End-to-end resource analysis for quantum interior point methods and portfolio optimization



A. Introduction
The practical utility of finding optimal solutions to well-posed optimization problems has been known since the days of antiquity, with Euclid considering the minimal distance between two points using a line. In the modern era, optimization algorithms for business and financial use cases continue to be ubiquitous. Partly as a result of this utility, algorithmic techniques for optimization problems have been well studied since even before the invention of the computer, including a famous dispute between Legendre and Gauss on who was responsible for the invention of least squares fitting [1]. With the advent of the quantum era, there has been great interest in developing quantum algorithms that solve optimization problems with provable speedups over classical algorithms. Some of the earliest proposals rely on quantum annealing [2] or more recent work in variational algorithms [3,4] to solve combinatorial optimization problems. Quantum algorithms have also been developed that allow for more efficient convex optimization, including algorithms for semidefinite, second-order cone, and linear programs [5][6][7][8][9][10][11][12][13][14], as well as algorithms for solving systems of linear equations [15][16][17][18][19], which can be used for quantum data fitting [20]. Using these techniques, specific financial use cases such as solving the portfolio optimization problem have been studied [21][22][23][24].
Unfortunately, it can be difficult to evaluate whether these quantum algorithms will be practically useful. In some cases, the algorithms are heuristic, and their performance can only be measured empirically once it is possible to run them on actual quantum hardware. In other cases, the difficulty in evaluating practicality stems from the inherent complexity of combining many distinct ingredients, each with their own caveats and bottlenecks.
To make an apples-to-apples comparison and quantify advantages of a quantum algorithm, a truly end-to-end resource analysis that accounts for all costs from problem input to problem output must be performed.
In this work, we perform such an end-to-end analysis for a quantum interior point method (QIPM) for solving second-order cone programs (SOCPs), which was originally proposed in Ref. [13], based on earlier QIPMs for semidefinite and linear programs [10]. In particular, we focus on a concrete use case with very broad application, but of primary interest in the financial services sector: portfolio optimization (PO). In general, PO is the task of determining the optimal resource allocation to a collection of possible classes, so as to optimize a given objective. In finance, one seeks to determine the optimal allocation of funds across a set of possible assets that maximizes returns and minimizes risk, subject to constraints. Importantly, many variants of the PO problem can be cast as an SOCP and subsequently solved with a classical or quantum interior point method. Indeed, classical interior point methods (CIPMs) are efficient not only in theory, but also in practice; they are the method of choice within fast numerical solvers for SOCPs and other conic programs (e.g., [25]), which encompass a large variety of optimization problems that appear in industry. Notably, QIPMs structurally mirror CIPMs, and seek improvements by replacing certain subroutines with quantum primitives. Thus, compared to other proposed quantum algorithms for conic programs not based on widely used classical techniques (e.g., solvers that leverage the multiplicative weights update method [5][6][7][8]), QIPMs are uniquely positioned to provide not only a theoretical asymptotic advantage, but also a practical quantum solution for this common class of problem.
However, the QIPM is a complex algorithm that delicately combines some purely classical steps with multiple distinct quantum subroutines. The runtime of the QIPM is stated in terms of several parameters that can only be evaluated once a particular use case has been specified; depending on how these parameters scale, an asymptotic speedup may or may not be achievable. Additionally, any speedup is contingent on access to a large quantum random access memory (QRAM), an ingredient that in prior asymptotic-focused analyses has typically been assumed to exist without much further justification or cost analysis.
Our resource analysis is detailed and takes care to study all aspects of the end-to-end pipeline, including the QRAM component. We report our results in terms of relevant problem parameters, and then we perform numerical experiments to determine the size and scaling of these parameters for actual randomly chosen instances of the PO problem, based on historical stock data. This approach allows us to estimate the exact resource cost of the QIPM for an example PO problem, including a detailed breakdown of costs by various subroutines. This estimate incorporates several optimizations to the underlying subroutines, and technical improvements to how they are integrated into the QIPM. Consequently, our analysis allows us to evaluate the prospect that the algorithm could exhibit a practical quantum advantage, and it clearly reveals the computational bottlenecks within the algorithm that are most in need of further improvement.
While we focus on the QIPM and in particular on its application to the PO problem, our work has more general takeaways for quantum algorithms and for quantum computing applications. Firstly, our results emphasize the importance of end-to-end analysis when evaluating a proposed application. Furthermore, our modular treatment of the underlying algorithmic primitives produces quantitative and qualitative takeaways that would be relevant for end-to-end treatments of a large number of other algorithms that also rely on these subroutines, especially those in the area of machine learning, where data access via QRAM and quantum linear algebra techniques are often required [26].

B. Results
Our resource analysis focuses on three central quantities that determine the overall cost of algorithms implemented on fault-tolerant quantum computers: the number of logical qubits, the total number of T gates ("T-count"), and the number of parallel layers of T gates ("T-depth") needed to construct quantum circuits for solving the problem. The T-depth acts as a proxy for the overall runtime of the algorithm, whereas the T-count and number of logical qubits are important for determining how many physical qubits would be required for a full, fault-tolerant implementation. We justify the focus on T gates by pointing out that, in many prominent approaches to fault-tolerant quantum computation (such as lattice surgery [27][28][29][30]), quantum circuits are decomposed into Clifford gates and T gates, and the cost of implementing the circuit is dominated by the number and depth of the T gates. The fault-tolerant Clifford gates can be performed transversally or even in software, whereas the T gates require the expensive process of magic state distillation [31,32]. We stop short of a full analysis of the algorithm at the physical level, as we believe the logical analysis already suffices to evaluate the overall outlook for the algorithm and identify its main bottlenecks.
At the core of any interior point method (IPM) is the solution of a linear system of equations. The QIPM performs this step using a quantum linear system solver (QLSS) together with pure state quantum tomography. The cost of the QLSS depends on the Frobenius condition number κ_F = ∥G∥_F ∥G^-1∥ of the matrix G that must be inverted (where ∥•∥_F denotes the Frobenius norm, and ∥•∥ denotes the spectral norm), while the cost of tomography depends on a precision parameter ξ. We evaluate these parameters empirically by simulating the QIPM on small instances of the PO problem.
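For concreteness, κ_F can be evaluated numerically for any invertible matrix; the following NumPy sketch is illustrative only and is not the implementation used in our experiments:

```python
import numpy as np

def frobenius_condition_number(G):
    """kappa_F = ||G||_F * ||G^{-1}||, where ||.||_F is the Frobenius norm
    and ||.|| the spectral norm; ||G^{-1}|| equals 1/sigma_min(G)."""
    sigma = np.linalg.svd(G, compute_uv=False)  # singular values, descending
    return np.linalg.norm(G, "fro") / sigma[-1]

G = np.array([[2.0, 0.0],
              [0.0, 1.0]])
# singular values are 2 and 1, so kappa_F = sqrt(5)/1, which upper-bounds
# the ordinary condition number kappa = ||G|| ||G^{-1}|| = 2
```

Since ∥G∥ ≤ ∥G∥_F, the Frobenius condition number always upper-bounds the usual condition number.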
In table I, we report a summary of our overall resource calculation, in which we show the asymptotically leading term (along with its constant prefactor) in terms of the parameters κ_F and ξ, as well as n, the number of assets in the PO instance, and ϵ, the desired precision to which the portfolio should be optimized. We find (numerically) that κ_F grows with n, and that ξ shrinks with n; we estimate that, at n = 100 and ϵ = 10^-7, our implementation of the QIPM would require 8 × 10^6 qubits and 7 × 10^29 total T gates spread out over 2 × 10^24 layers. Needless to say, these resource counts are decidedly out of reach both in the near and far term for quantum hardware, even for a problem of modest size by classical standards. Even if quantum computers one day match the gigahertz-level clock speeds of modern classical computers, 10^24 layers of T gates would take millions of years to execute. By contrast, the PO problem can be easily solved in a matter of seconds on a laptop for n = 100 stocks.
We caution that the numbers we report should not be interpreted as the final word on the cost of the QIPM for PO. We are certain that further examination of the algorithm could uncover many improvements and optimizations that would reduce the costs compared to our calculations. On the other hand, we note that our results already incorporate several innovations we made to reduce the resource cost, including a basic attempt at preconditioning the linear system. Moreover, the pessimistic outlook our results convey is robust in the sense that the resource counts would need to decrease by many orders of magnitude for the algorithm to be practical, suggesting that fundamental changes to multiple aspects of the algorithm are necessary, rather than merely superficial optimizations.
Besides the main resource calculation, we make several additional contributions and observations: 1. We provide explicit quantum circuits for the important subroutines of the QIPM, namely the state-of-the-art QLSS based on the discrete adiabatic theorem [18] and pure state tomography, which complement the explicit circuits for block-encoding (using QRAM) that a subset of the authors already reported separately in Ref. [33]. These circuits, and their precise resource calculations, could be useful elsewhere, as these subroutines are ubiquitous in quantum algorithms. See section IV F and section V for additional details.
2. We break down the resource calculation into its constituents to illustrate which parts of the algorithm are most costly. We find that many independent factors create significant challenges toward realizing quantum advantage with QIPMs, and our work underscores those aspects of the algorithm that must be improved for it to be useful. We also note that the conditions under which QIPMs would be most successful (namely, when κ_F is small) also allow classical IPMs based on iterative classical linear system solvers to be competitive. See section VII for additional details.
3. We numerically simulate several versions of the full QIPM solving the PO problem on portfolios as large as n = 120 stocks, and we report the empirical size and scaling of the relevant parameters κ_F and ξ. There is considerable variability in the trends we observe, depending on which version of the QIPM is chosen and when the QIPM is terminated, which makes it difficult to draw robust conclusions. However, we find that both κ_F and ξ^-1 appear to grow with n. Note that previous numerical experiments on a similar formulation of the PO problem [22] suggested κ_F does not grow with problem size, but those authors scaled the number of "time epochs" while keeping n constant. Additionally, we observe that the "infeasible" version of the QIPM originally proposed by [13] empirically performs similarly to more sophisticated "feasible" versions [14], despite not enjoying the same theoretical guarantees of fast convergence. Finally, contrary to theoretical expectation, we observe that κ_F and ξ^-1 do not diverge as ϵ → 0 in our examples. See section VI for additional details.
4. We make various technical improvements to the underlying ingredients of QIPMs. A subset of the authors previously reported [33] a quadratic improvement in the minimum depth required for the problem of preparing an arbitrary L-dimensional quantum state or block-encoding an arbitrary L × L matrix, along with explicit quantum circuits and exact resource expressions. In this manuscript, we additionally contribute the following:
• Tomographic precision: Performing tomography on the output of a QLSS necessarily causes the classical estimate of the solution to the linear system to be inexact. We illustrate how the allowable tomographic precision can be determined adaptively rather than relying on theoretical bounds. Nonetheless, we also improve the constant prefactor in the tomographic bounds. The total number of state-preparation queries needed to learn an unknown L-dimensional pure state to error ξ using the tomography method of Refs. [10, 13] is, to leading order, at most 115L ln(L)/ξ^2.¹
• Norm of the linear system: Since QLSSs output a normalized quantum state, tomography does not directly yield the norm of the solution to the linear system. The norm can be learned through more complicated protocols, but we observe that in the context of QIPMs, a sufficient estimate for the norm can be learned classically.
• Preconditioning: We propose a simple preconditioning method that is compatible with the QIPM, while reducing the parameter κ_F. Our numerical simulations suggest the reduction is more than an order of magnitude for the portfolio optimization problem.

TABLE I. Asymptotic, leading-order contributions to the total quantum resources for an end-to-end portfolio optimization (including constant factors), in terms of the number of assets in the portfolio (n), the desired precision to which the portfolio should be optimized (ϵ), the maximum Frobenius condition number of matrices encountered by the QIPM (κ_F), and the minimum tomographic precision necessary for the algorithm to succeed (ξ). The T-depth and T-count expressions represent the cumulative cost of O(ξ^-2 n^1.5 log(n) log(ϵ^-1)) individual quantum circuits performed serially, a quantity that we estimate evaluates to 6 × 10^12 circuits at n = 100; see table X for a detailed accounting. The right column uses a numerical simulation of the quantum algorithm (see section VI) to compute the instance-specific parameters in the resource expression and estimate the resource cost at n = 100 and ϵ = 10^-7. [Table body: columns "Resource", "QIPM complexity", and "Estimated at n = 100", with rows for the number of logical qubits, the T-depth, and the T-count; the numerical entries are not recoverable from this extraction.]
• Feasible QIPM: We implement a "feasible" version of the QIPM proposed by [14], which relies on finding a basis for the null space of the SOCP matrix. We identified an explicit basis for the PO problem, thereby avoiding the need for a costly QR decomposition. However, we observe that finding the basis via QR decomposition leads to more stable numerical results.
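For reference, the leading-order tomography bound quoted in the first bullet above can be evaluated directly; the helper below is a hypothetical convenience function for this sketch, not part of the algorithm itself:

```python
import math

def tomography_queries(L, xi):
    """Leading-order upper bound on the number of state-preparation queries
    needed to learn an L-dimensional pure state to precision xi:
    115 * L * ln(L) / xi^2 (the bound quoted in the text)."""
    return 115 * L * math.log(L) / xi ** 2

# Halving xi (i.e. demanding tighter precision) quadruples the query count,
# since the dependence on the precision is 1/xi^2.
```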
The outline for the remainder of the paper is as follows. In section II we describe and define the portfolio optimization problem in terms of Markowitz portfolio theory. In section III we describe second-order cone programming (SOCP) problems, illustrate how portfolio optimization can be represented as an instance of SOCP, and discuss how IPMs can be used for solving SOCPs. In section IV we review the quantum ingredients needed to turn an IPM into a QIPM. In particular, we review quantum linear system solvers, block-encoding for data loading, and quantum state tomography for data readout. We also present slightly better bounds on the required tomography procedure than were previously known. In section V we describe the full implementation of the QIPM for solving the portfolio optimization problem as an SOCP, including a detailed resource estimate for the end-to-end problem. In section VI we show numerical results from simulations of the full problem, and in section VII we reflect on the calculation we have performed, identifying the main bottlenecks and drawing conclusions about the outlook for quantum advantage with QIPMs.
The QIPM has many moving parts requiring several mathematical symbols. While all symbols are defined as they are introduced in the text, we also provide a full list of symbols for the reader's reference in appendix A. Throughout the paper, we denote all vectors in bold lowercase letters to contrast with scalar quantities (unbolded lowercase) and matrices (unbolded uppercase). The only exception to this rule will be the symbols N, K, and L, which are positive integers (despite being uppercase) and denote the number of rows or columns in certain matrices related to an SOCP instance.

A. Background
Portfolio optimization is the process widely used by financial analysts to assign allocations of capital across a set of assets within a portfolio, given optimization criteria such as maximizing the expected return and minimizing the financial risk. The creation of the mathematical framework for modern portfolio theory (MPT) is credited to Harry Markowitz [35,36], for which he received the 1990 Alfred Nobel Memorial Prize in Economic Sciences [37]. Markowitz describes the process of selecting a portfolio in two stages, where the first stage starts with "observation and experience" and ends with "beliefs about the future performances of available securities." The second stage starts with "the relevant beliefs about future performances" and ends with "the choice of portfolio." The theory is also known as mean-variance analysis. Markowitz's 1999 essay [38] recounts the early history of portfolio theory (1600-1960).
Typically, portfolio optimization strategies include diversification, which is the practice of investing in a wide array of asset types and classes as a risk mitigation strategy. Some popular asset classes are stocks, bonds, real estate, commodities, and cash. After building a portfolio, we expect a return (or profit) after a specific period of time. Risk is defined as the fluctuation of the asset values. MPT describes how high-variance assets can be combined with other uncorrelated assets through diversification to create portfolios with low variance in their return. Naturally, among equal-risk portfolios, investors prefer those with higher expected return, and among equal-return portfolios, they prefer those with lower risk.

B. Mathematical formulation
Within a portfolio, w_i represents the amount of an asset i we are holding over some period of time. Often, this amount is given as the asset's price in dollars at the start of the period. When this amount is positive (w_i > 0), we call it a long position; and when it is negative (w_i < 0), we call it a short position, with an obligation to buy the asset at the end of the period.² The optimization variable in our portfolio optimization problem is the vector of n assets w ∈ R^n in our portfolio.
The price of each asset i varies over time. We define u_i to be the relative change (positive or negative) during the period of interest. Then, we define the return of the portfolio for that period as r = u⊺w dollars. The relative changes u ∈ R^n follow a stochastic process, and we can model this with a random vector with mean û and covariance Σ. The return r is then a random variable with mean û⊺w and variance w⊺Σw.
To capture realistic problem formulations, we add one or more mathematical constraints to the optimization problem corresponding to the problem-specific considerations. For example, two common constraints in portfolio optimization problems are that we want no short positions (w_i ≥ 0 for all i, denoted by w ≥ 0) and that the total investment budget is limited (1⊺w = 1, where 1 denotes the vector of ones). This forms the classical portfolio optimization problem from Markowitz's mean-variance theory:

    min_w   w⊺Σw
    s.t.    û⊺w ≥ r_min,   1⊺w = 1,   w ≥ 0.                      (1)

This formulation is a quadratic optimization (QO) problem where we minimize the risk while achieving a target return of at least r_min with a fixed budget and no short positions. In practice, the portfolio optimization problem is often reformulated in other ways, for example, to maximize return subject to a fixed amount of risk, or to optimize an objective function that weighs risk against return. In our application, we follow the latter approach, formulated as follows, where q is a tunable risk-aversion coefficient:

    min_w   -û⊺w + q√(w⊺Σw)
    s.t.    1⊺w = 1,   w ≥ 0.                                     (2)

This optimization problem is no longer a QO problem, but it can be mapped to a conic problem, as described later in section III B. Depending on the problem, additional constraints can be added.³ To illustrate the flexibility of this analysis, we include a maximum transaction constraint and use the following problem formulation in our analysis in the rest of the paper:

    min_w   -û⊺w + q√(w⊺Σw)
    s.t.    1⊺w = 1,   w ≥ 0,   |w - w̄| ≤ ζ,                      (3)

where w̄ denotes the current portfolio, so that |w - w̄| is the vector of transaction quantities for each asset, which are constrained elementwise to be smaller than the maximum values contained in the vector ζ. Note that Ref. [22] chose a formulation more akin to eq. (1) for their numerical study of the quantum interior point method for portfolio optimization. For more information on the theory of convex optimization problems and algorithms for solving them, we direct the reader to Refs. [39,40]. For more information about optimization methods in finance, we refer to Refs. [41][42][43].
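The quantities above can be estimated from historical data; this small NumPy sketch (with synthetic returns standing in for real stock data) computes û, Σ, and the weighted risk-return objective for an equal-weight portfolio:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 250, 4                        # m time epochs, n assets (toy sizes)
U = rng.normal(0.001, 0.02, (m, n))  # synthetic relative price changes u^(t)

u_hat = U.mean(axis=0)               # estimated mean return vector
Sigma = np.cov(U, rowvar=False)      # estimated covariance matrix

w = np.full(n, 1.0 / n)              # equal-weight portfolio, 1^T w = 1
q = 0.5                              # risk-aversion coefficient

expected_return = u_hat @ w          # mean of the return r = u^T w
risk = np.sqrt(w @ Sigma @ w)        # standard deviation of the return
objective = expected_return - q * risk  # weighted risk-return objective
```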

III. SECOND ORDER CONE PROGRAMMING (SOCP) AND INTERIOR POINT METHODS (IPM)
A. Definitions

Second-order cone programming (SOCP) is a type of convex optimization that allows for a richer set of constraints than linear programming (LP), without many of the complications of semidefinite programming (SDP). Indeed, SOCP is a subset of SDP, but SOCP admits interior point methods (IPMs) that are essentially just as efficient as IPMs for LP [44]. Many real-world problems can be cast as SOCPs, including the portfolio optimization problem we are interested in.
For any k-dimensional vector v, we may write v = (v0; ṽ), where v0 is the first entry of v, and ṽ contains the remaining k - 1 entries. The second-order cone of dimension k is then defined as

    Q^k = {v ∈ R^k : v0 ≥ ∥ṽ∥},                                   (4)

where ∥•∥ denotes the vector two-norm (standard Euclidean norm). For k = 1, the cone Q^1 reduces to the set of nonnegative reals. In general, a second-order cone problem is formulated as

    min_x   c⊺x
    s.t.    Ax = b,   x ∈ Q,                                       (5)

where Q = Q^{N_1} × ... × Q^{N_r} is a Cartesian product of r second-order cones of total dimension N = N_1 + ... + N_r, and A is a full-rank K × N matrix encoding K linear equality constraints, with K ≤ N.
Note that the special case of linear programming is immediately recovered if N_i = 1 for all i. We say that a point x is primal feasible whenever Ax = b and x ∈ Q. It is strictly primal feasible if additionally it lies in the interior of Q.
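Membership in Q^k is a simple norm inequality; a minimal sketch, assuming the (v0; ṽ) partition used above:

```python
import numpy as np

def in_second_order_cone(v):
    """Membership test for Q^k = {v = (v0; v_tilde) : v0 >= ||v_tilde||}.
    For k = 1 the condition reduces to v0 >= 0, i.e. the LP cone."""
    return v[0] >= np.linalg.norm(v[1:])
```

For example, (5; 3; 4) lies on the boundary of Q^3 since ∥(3, 4)∥ = 5, while (4.9; 3; 4) lies outside it.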
The dual to the problem in eq. (5) is a maximization problem over a variable y ∈ R^K, given as follows:

    max_y   b⊺y
    s.t.    A⊺y + s = c,   s ∈ Q.                                  (6)

We say that a pair (s; y) is dual feasible whenever A⊺y + s = c and s ∈ Q. For any point (x; y; s) with x, s ∈ Q, we define the duality gap as

    µ(x, s) = (1/r) x⊺s = (1/r)(c⊺x - b⊺y),                        (7)

where r is the number of cones, as in definition 2, and the second equality holds under the additional assumption that the point is primal and dual feasible. The fact that x, s ∈ Q implies that µ(x, s) ≥ 0. Moreover, assuming that both the primal and dual problems have a strictly feasible point, the optimal primal solution x* and the optimal dual solution (y*; s*) are guaranteed to exist and satisfy c⊺x* = b⊺y*, and hence µ(x*, s*) = 0 [44]. Thus, the primal-dual condition of optimality can be expressed by the system

    Ax = b,   A⊺y + s = c,   x • s = 0,   x, s ∈ Q.                (8)

B. Portfolio optimization as SOCP
The portfolio optimization problem can be solved by reduction to SOCP [43], and this reduction is often made in practice. Here we describe one way of translating the portfolio optimization problem, as given in eq. (3), into a second-order cone program.
The objective function in eq. (3) has a non-linear term q√(w⊺Σw), which we linearize by introducing a new scalar variable t and a new constraint t ≥ √(w⊺Σw). We obtain the equivalent optimization problem

    min_{x=(w;t)}   -û⊺w + qt
    s.t.            1⊺w = 1,   w ≥ 0,   |w - w̄| ≤ ζ,   √(w⊺Σw) ≤ t.   (9)

Our goal now is to write the constraints in eq. (9) as second-order cone constraints. Given an m×n matrix M for which Σ = M⊺M, the constraint on t can be expressed by introducing an m-dimensional variable η subject to the equality constraint η = Mw and the second-order cone constraint (t; η) ∈ Q^{m+1}.
The matrix M can be determined from Σ via a Cholesky decomposition, although for large matrices Σ, this computation may be costly. Alternatively, if Σ and û are calculated from stock return vectors u^(1), ..., u^(m) during m independent time epochs (e.g. returns for each of m days or each of m months), then a valid matrix M⊺ is given by (u^(1) - û, ..., u^(m) - û), i.e. the columns of M⊺ are given by the deviation of the returns from the mean in each epoch. This was the approach taken in Ref. [22], and it is also the approach we take in our numerical experiments, presented later. The downside to this approach is that the number of time epochs must grow with the number of assets. We note that, in practice, computing the matrix Σ can be a research topic unto itself, which is beyond the scope of this paper [45].
The absolute value constraints are handled by introducing a pair of n-dimensional variables ϕ and ρ, subject to equality constraints ϕ = ζ - (w - w̄) and ρ = ζ + (w - w̄). The absolute value constraints are then imposed as positivity constraints ϕ_i ≥ 0, ρ_i ≥ 0, which we include as second-order cone constraints of dimension 1.⁴ In summary, we may write the portfolio optimization problem from eq. (3) as the following SOCP that minimizes over the variable x = (w; ϕ; ρ; t; η) ∈ R^{3n+m+1}:

    min_x   [-û; 0; 0; q; 0]⊺ (w; ϕ; ρ; t; η)
    s.t.    A (w; ϕ; ρ; t; η) = (ζ + w̄; ζ - w̄; 1; 0),
            w_i ≥ 0,  ϕ_i ≥ 0,  ρ_i ≥ 0  for all i,   (t; η) ∈ Q^{m+1},

    with    A = [  I   I   0   0   0 ]
                [ -I   0   I   0   0 ]
                [ 1⊺   0   0   0   0 ]
                [  M   0   0   0  -I ],                            (10)

where I denotes an identity block, 0 denotes a submatrix of all 0s, 0 is a vector of all 0s, 1 is a vector of all 1s, and the size of each block of A can be inferred from its location in the matrix. Thus, the total number of cones is r = 3n + 1, and the combined dimension is N = 3n + m + 1.⁵ The SOCP constraint matrix A is a K × N matrix, with K = 2n + m + 1. This SOCP is very similar to that considered by Kerenidis, Prakash, and Szilágyi [22]; however, rather than optimize a weighted combination of risk and return, they optimized risk subject to a fixed value for return, and they did not include the budget constraints.
Notice that many of the rows of the K × N matrix A are sparse, containing only one or two nonzero entries. However, the final m rows of A are dense, containing n + 1 nonzero entries each, due to the appearance of the matrix M of historical stock data; in total, a constant fraction of the matrix entries are nonzero, so sparse matrix techniques will provide only limited benefit.
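To make the dimensions concrete, the following sketch assembles a constraint matrix of the stated shape; the block layout is our plausible reading of the construction (the exact row ordering may differ):

```python
import numpy as np

def build_socp_matrix(M, n):
    """Assemble the K x N constraint matrix over x = (w; phi; rho; t; eta),
    with K = 2n + m + 1 and N = 3n + m + 1.  Rows: w + phi = zeta + w_bar,
    -w + rho = zeta - w_bar, the budget constraint 1^T w = 1, and
    M w - eta = 0.  (Block layout is a plausible reconstruction.)"""
    m = M.shape[0]
    I = np.eye(n)
    row_phi = np.hstack([I, I, np.zeros((n, n + 1 + m))])
    row_rho = np.hstack([-I, np.zeros((n, n)), I, np.zeros((n, 1 + m))])
    row_bud = np.hstack([np.ones((1, n)), np.zeros((1, 2 * n + 1 + m))])
    row_eta = np.hstack([M, np.zeros((m, 2 * n + 1)), -np.eye(m)])
    return np.vstack([row_phi, row_rho, row_bud, row_eta])

n, m = 5, 7
A = build_socp_matrix(np.random.default_rng(1).normal(size=(m, n)), n)
# A has K = 2n + m + 1 rows and N = 3n + m + 1 columns; each of the final
# m rows carries n + 1 nonzeros (a dense row of M plus one entry of -I)
```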
Finally, we can observe that the primal SOCP in eq. (10) has an interior feasible point as long as ζ has strictly positive entries. To see this, choose w̃ to be any strictly positive vector that satisfies |w̃ - w̄| < ζ, and let ϕ = ζ - (w̃ - w̄), ρ = ζ + (w̃ - w̄), η = M w̃, and t equal to any number strictly greater than ∥η∥. It can be verified that the dual program likewise has a strictly feasible point; this guarantees that the optimal primal-dual pair for the SOCP exists and satisfies eq. (8).

Introduction
Interior point methods (IPMs) are a class of efficient algorithms for solving convex optimization problems including LPs, SOCPs, and SDPs, where (in contrast to the simplex method) intermediate points generated by the method lie in the interior of the convex set, and they are guaranteed to approach the optimal point after a polynomial number of iterations of the method. Each iteration involves forming a linear system of equations that depends on the current intermediate point. The solution to this linear system determines the search direction, and the next intermediate point is formed by taking a small step in that direction. We will consider path-following primal-dual IPMs, where, if the step size is sufficiently small, the intermediate points are guaranteed to approximately follow the central path, which ends at the optimal point for the convex optimization problem.

Central path
To define the central path, we first establish some notation related to the algebraic properties of the second-order cone. Following formulations in prior literature [13,44], we let the product u • v of two vectors u = (u0; ũ), v = (v0; ṽ) ∈ Q^k be defined as

    u • v = (u⊺v; u0 ṽ + v0 ũ),                                    (11)

and we denote the identity element for this product by the vector e = (1; 0) ∈ Q^k. For the Cartesian product Q = Q^{N_1} × ... × Q^{N_r} of multiple second-order cones, the vector e is defined as the concatenation of the identity element for each cone, and the circle product of two vectors is given by the concatenation of the circle product of each constituent. A consequence of this definition is that e⊺e is equal to the number of cones r. Now, for the SOCP problem of eq. (5), the central path (x(ν); y(ν); s(ν)) is the one-dimensional set of central points, parameterized by ν ∈ [0, ∞), which satisfies the conditions

    Ax = b,   A⊺y + s = c,   x • s = νe,   x, s ∈ Q.               (12)

We can immediately see that the central path point (x(ν); y(ν); s(ν)) has a duality gap that satisfies µ(x(ν), s(ν)) = ν, and that when ν = 0, eq. (12) recovers eq. (8).
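The circle product and its identity element can be sketched directly (a single cone is shown; products of cones concatenate blockwise):

```python
import numpy as np

def circle(u, v):
    """Jordan (circle) product on a single second-order cone:
    u o v = (u^T v ; u0 * v_tilde + v0 * u_tilde)."""
    return np.concatenate([[u @ v], u[0] * v[1:] + v[0] * u[1:]])

k = 4
e = np.zeros(k)
e[0] = 1.0                      # identity element e = (1; 0) of Q^k
u = np.array([3.0, 1.0, -2.0, 0.5])
# e o u = u o e = u, and for a product of r cones the concatenated identity
# satisfies e^T e = r (here a single cone, so e^T e = 1)
```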

Finding an initial point on the central path via self-dual embedding
Path-following primal-dual interior point methods find the optimal point by beginning at a central point with ν > 0 and following the central path to a very small value of ν, which is taken to be a good approximation of the optimal point. For a given SOCP, finding an initial point on the central path is non-trivial and, in general, can be just as hard as solving the SOCP itself. One solution to this problem is the homogeneous self-dual embedding [46,47], where one forms a slightly larger self-dual SOCP with the properties that (i) the optimal point for the original SOCP can be determined from the optimal point for the self-dual SOCP and (ii) the self-dual SOCP has a trivial central point that can be used to initialize the IPM.
To do this, we introduce new scalar variables τ, θ, and κ, which are used to give more flexibility to the constraints. Previously, we required Ax = b. In the larger program, we relax this constraint to read Ax = bτ - (b - Ae)θ, such that the original constraint is recovered when τ = 1 and θ = 0, but x = e is a trivial solution when τ = 1 and θ = 1. Similarly, we relax the constraint A⊺y + s = c to read A⊺y + s = cτ - (c - e)θ, which has the trivial solution y = 0, s = e when τ = θ = 1. We complement these with two additional linear constraints to form the program

    min_{(x;y;τ;θ;s;κ)}   (r + 1)θ
    s.t.    Ax - bτ + b̄θ = 0,
            -A⊺y + cτ - c̄θ - s = 0,
            b⊺y - c⊺x + z̄θ - κ = 0,
            -b̄⊺y + c̄⊺x - z̄τ = -(r + 1),
            x, s ∈ Q,   τ, κ ≥ 0,                                  (13)

where b̄ = b - Ae, c̄ = c - e, z̄ = c⊺e + 1, and r = e⊺e is the number of cones in the original SOCP. While eq. (13) is not exactly of the form given in eq. (5), we may still think of it as a primal SOCP. Since the block matrix in eq. (13) is skew-symmetric and the objective function coefficients are equal to the right-hand side of the equality constraints, when we compute the dual program (c.f. eq. (6)), we arrive at an equivalent program; we conclude that eq. (13) is self-dual [46]. Thus, when applying path-following primal-dual IPMs to eq. (13), we need only keep track of the primal variables, that is, x, y, τ, θ, s, κ. Taking into account the addition of τ and κ, which are effectively an extra pair of primal-dual variables, we redefine the duality gap (c.f. eq. (7)) as

    µ(x, τ, s, κ) = (x⊺s + κτ)/(r + 1).                            (14)

Note that if the point (x; y; τ; θ; s; κ) is feasible, i.e. if it satisfies the four linear constraints in eq. (13), then we have the identity

    x⊺s + κτ = (r + 1)θ,                                           (15)

which follows by invoking the first, second, third, and fourth rows of eq. (13). This equality justifies the redefinition in eq. (14): noting that the primal objective function in eq. (13) is (r + 1)θ, and (since the program is self-dual) the associated dual objective function is -(r + 1)θ, we see that the gap between primal and dual objective functions, divided by the number of conic constraints (2r + 2), is exactly equal to θ.
The central path for the augmented SOCP in eq. (13) is defined by the feasibility conditions for the SOCP combined with the relaxed complementarity conditions x • s = νe and κτ = ν. Thus, we see that the point (x = e; y = 0; τ = 1; θ = 1; s = e; κ = 1) is not only a feasible point for the SOCP in eq. (13), but also a central point with ν = 1.
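One can check the trivial central point numerically. The sketch below uses a toy instance in which every cone has dimension 1 (so e is the all-ones vector and • is the elementwise product); the four residuals follow our reading of the self-dual constraints, with b̄, c̄, z̄ defined as in the text:

```python
import numpy as np

rng = np.random.default_rng(2)
K, N = 2, 3
A = rng.normal(size=(K, N))
b = rng.normal(size=K)
c = rng.normal(size=N)

e = np.ones(N)                      # identity element: all cones have dim 1
r = int(e @ e)                      # r = number of cones
b_bar, c_bar, z_bar = b - A @ e, c - e, c @ e + 1

# the trivial initial point (x; y; tau; theta; s; kappa) = (e; 0; 1; 1; e; 1)
x, y, tau, theta, s, kappa = e, np.zeros(K), 1.0, 1.0, e, 1.0

res1 = A @ x - b * tau + b_bar * theta                  # should vanish
res2 = -A.T @ y + c * tau - c_bar * theta - s           # should vanish
res3 = b @ y - c @ x + z_bar * theta - kappa            # should vanish
res4 = -b_bar @ y + c_bar @ x - z_bar * tau + (r + 1)   # should vanish

nu = 1.0  # x o s = nu * e and kappa * tau = nu: a central point with nu = 1
```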
What if we only have a point that is approximately optimal for the self-dual SOCP? We can still deduce an approximately optimal point for the original SOCP. Suppose we have a feasible point for which µ(x, τ, s, κ) = ϵ. The point (x/τ; y/τ; s/τ) is O(ϵ)-close to feasible for the original SOCP, in the sense that the equality constraints are satisfied up to O(ϵ) error. Moreover, since κ > 0 and θ = ϵ, we can assert using the third row of eq. (13) that the difference in objective function achieved by the primal and dual solutions is also O(ϵ). In summary, by using the self-dual SOCP of eq. (13), we obtain a trivial point from which to start the IPM, and given an (approximately) optimal point we obtain either an (approximately) optimal point to the original SOCP or a certificate that the original SOCP was not feasible to begin with.

Iterating the IPM
Each iteration of the IPM takes as input an intermediate point (x; y; τ; θ; s; κ) that is feasible (or in some formulations, nearly feasible), has duality gap (x⊺s + κτ)/(r + 1) equal to µ, and is close to the central path with parameter ν = µ. The output of the iteration is a new intermediate point (x + ∆x; y + ∆y; τ + ∆τ; θ + ∆θ; s + ∆s; κ + ∆κ) that is also feasible and close to the central path, with a reduced value of the duality gap. Thus, many iterations lead to a solution with duality gap arbitrarily close to zero.
One additional input is the step size, governed by a parameter σ < 1. The IPM iteration aims to bring the next intermediate point onto the central path with parameter ν = σµ. This is accomplished by taking one step of Newton's method, where the vector (∆x; ∆y; ∆τ; ∆θ; ∆s; ∆κ) is uniquely determined by solving a linear system of equations called the Newton system. The first part of the Newton system consists of the conditions that must be met for the new point to be feasible, given by the system of N + K + 2 linear equations in eq. (19). Note that if the point is already feasible, the right-hand side of this system is equal to zero.
The second part of the Newton system is the linearized conditions for arriving at the point on the central path with duality gap σµ. That is, we aim for (x + ∆x) • (s + ∆s) = σµe and (κ + ∆κ)(τ + ∆τ) = σµ. By ignoring second-order terms (i.e. the O(∆x • ∆s) and O(∆κ∆τ) terms), these become linear in the update. The resulting expression can be rewritten as a matrix equation by first defining the arrowhead matrix U for a vector u = (u_0; ũ) ∈ Q^k, whose first row is (u_0, ũ⊺), whose first column is (u_0; ũ), and whose remaining diagonal entries are u_0 (all other entries zero). When u ∈ Q lies in the direct product of multiple second-order cones, the arrowhead matrix is formed by placing the appropriate matrices of the above form on the block diagonal. The arrowhead matrix has the property that for any vector v, Uv = u • v. Using this notation, the Newton equations in eq. (20) can be written as eq. (22), where X and S are the arrowhead matrices for the vectors x and s. Equations (19) and (22) together form the Newton system. We can see that there are 2N + K + 3 constraints to match the 2N + K + 3 variables in the vector (∆x; ∆y; ∆τ; ∆θ; ∆s; ∆κ). In Ref. [48], it is shown that, as long as the duality gap is positive and (x; y; τ; θ; s; κ) is not too far from the central path (which will be the case as long as σ is chosen sufficiently close to 1 in every iteration), the Newton system has a unique solution. Note that one can choose search directions other than the one that arises from solving the Newton system presented here; doing so consists of first applying a scaling transformation to the product of second-order cones, then forming and solving the Newton system that results, and finally applying the inverse scaling transformation. Alternate search directions are explained in appendix D, but in the main text we stick to the basic search direction illustrated above, since in our numerical simulations the simple search direction gave equal or better results than more complex alternatives, and it enjoys the same theoretical guarantee of convergence [48].
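The arrowhead-matrix property Uv = u • v can be checked numerically for a single second-order cone; the matrix form below is the standard SOCP arrowhead construction, assumed to match the form referenced as eq. (21):

```python
import numpy as np

def arrowhead(u):
    # Arrowhead matrix for u = (u0; u_tilde) in a single second-order cone:
    # first row (u0, u_tilde^T), first column (u0; u_tilde), u0 on the
    # remaining diagonal (assumed standard SOCP form).
    u0, ut = u[0], u[1:]
    U = u0 * np.eye(len(u))
    U[0, 1:] = ut
    U[1:, 0] = ut
    return U

def jordan_product(u, v):
    # Jordan product u ∘ v for the second-order cone:
    # (u^T v; u0 * v_tilde + v0 * u_tilde)
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

u = np.array([2.0, 0.5, -1.0])
v = np.array([1.0, 3.0, 0.25])
# Key property used in eq. (22): Arw(u) v equals the Jordan product u ∘ v.
assert np.allclose(arrowhead(u) @ v, jordan_product(u, v))
```

For a product of cones, one would apply `arrowhead` blockwise, matching the block-diagonal description in the text.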

Solving the Newton system
The Newton system formed by combining eqs. (19) and (22) is an L × L linear system of the form Gu = h, where L = 2N + K + 3. Classically, this can be solved exactly in a number of ways, the most straightforward being Gaussian elimination, which scales as O(L³). Using Strassen-like tricks [49], this can be asymptotically accelerated to O(L^ω) where ω < 2.38 [50], although in practice the runtime is closer to O(L³). Meanwhile, the linear system can be approximately solved using a variety of iterative solvers, such as conjugate gradient descent or the randomized Kaczmarz method [51]. The complexity of these approaches depends on the condition number of the Newton matrix. Section IV discusses quantum approaches to solving the Newton system.
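As an illustration of the direct-versus-iterative tradeoff, the sketch below compares Gaussian elimination (via `np.linalg.solve`) with a hand-rolled conjugate gradient loop on a random symmetric positive definite system. The actual Newton matrix is not symmetric positive definite, so this is a stand-in for the general idea, not the paper's system:

```python
import numpy as np

def conjugate_gradient(G, h, tol=1e-12, max_iter=1000):
    # Textbook CG for symmetric positive definite G; iteration count
    # (and hence cost) grows with the condition number of G.
    u = np.zeros_like(h)
    r = h - G @ u
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Gp = G @ p
        alpha = rs / (p @ Gp)
        u = u + alpha * p
        r = r - alpha * Gp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return u

rng = np.random.default_rng(1)
L = 50
M = rng.standard_normal((L, L))
G = M @ M.T + L * np.eye(L)   # well-conditioned SPD stand-in
h = rng.standard_normal(L)

u_direct = np.linalg.solve(G, h)   # direct, O(L^3)
u_iter = conjugate_gradient(G, h)  # iterative, condition-number dependent
assert np.linalg.norm(u_direct - u_iter) < 1e-8
```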
It is important to distinguish methods that solve the Newton system exactly from methods that solve it inexactly, because inexact solutions typically lead to infeasible intermediate points. As presented above, the Newton system in eqs. (19) and (22) can tolerate infeasible intermediate points; the main consequence is that the right-hand side of eq. (19) becomes non-zero. This inexact formulation was the one pursued by Kerenidis, Prakash, and Szilágyi [13], who first examined QIPMs for SOCP (although they did not implement the self-dual embedding as we have done). However, it was pointed out in Refs. [11, 14] that the theoretical convergence analysis that Ref. [13] relies upon requires intermediate points to be exactly feasible (i.e. the right-hand side of eq. (19) is always zero), and that analyses allowing for infeasibility generally have poorer guaranteed convergence time (although in practice they can be just as fast [40]). As discussed in section IV, exact feasibility is difficult to maintain in quantum IPMs, since the Newton system cannot be solved exactly.
Ref. [14] proposed a workaround by which exact feasibility can be maintained despite an inexact linear system solver, which they call an inexact-feasible IPM (IF-IPM). For the IF-IPM, we assume we have access to a basis for the null space of the feasibility constraint equations, that is, a linearly independent set of solutions to eq. (19) when the right-hand side is zero. We arrange these basis vectors as the columns of a matrix B; since there are N + K + 2 linear feasibility constraints and 2N + K + 3 variables, the matrix B should have N + 1 columns. In the case of portfolio optimization, a matrix B satisfying this criterion can be deduced by inspection, as discussed in appendix C; however, this choice does not yield a B with orthogonal columns. Generating a B with orthonormal columns can be done by performing a QR decomposition of the matrix in eq. (19), which would incur a large one-time classical cost of O((N + K)³) operations⁶. In either case, since B is a basis for the null space of the constraint equations, there is a one-to-one correspondence between vectors ∆z ∈ R^(N+1) and vectors that satisfy eq. (19), via the relation (∆x; ∆y; ∆τ; ∆θ; ∆s; ∆κ) = B∆z. Thus, our Newton system can be reduced to eqs. (23) and (24), with the feasible update recovered as (∆x; ∆y; ∆τ; ∆θ; ∆s; ∆κ) = B∆z.
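The feasibility-preserving mechanism of the IF-IPM can be demonstrated on a toy constraint matrix: any step of the form B∆z stays exactly in the null space even when ∆z itself is computed only approximately. The null-space basis is obtained here via SVD, which is equivalent in effect to the QR route described above; the sizes are illustrative, not N + K + 2 and 2N + K + 3:

```python
import numpy as np

rng = np.random.default_rng(2)
n_con, n_var = 4, 7                 # illustrative constraint/variable counts
M = rng.standard_normal((n_con, n_var))

# Orthonormal null-space basis: right singular vectors beyond the rank.
_, _, Vt = np.linalg.svd(M)
B = Vt[n_con:].T                    # shape (n_var, n_var - n_con)

# Even a deliberately "inexact" dz yields an exactly feasible step:
dz = rng.standard_normal(B.shape[1]) + 0.1   # arbitrary perturbed solution
step = B @ dz
assert np.allclose(M @ step, 0.0)   # feasibility residual remains zero
```

This is exactly why the IF-IPM tolerates an inexact solver for ∆z: the error stays inside the null space of the constraints.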
The Newton system above can be solved by first computing ∆z, by inverting the quantity in brackets in the first line and applying the inverse to the right-hand side, and then computing (∆x; ∆y; ∆τ; ∆θ; ∆s; ∆κ) by performing the multiplication B∆z. This matrix-vector product can be accomplished classically in O(N²) operations. Note that matrix-matrix products where one of the matrices is an arrowhead matrix (S or X) can also be carried out in O(N²) classical time, as the form of arrowhead matrices given in eq. (21) implies that the product can be computed by summing several matrix-vector products. Finally, note that since the second and fourth block columns of the first matrix in eq. (22) are zero, the second and fourth block rows of B (e.g. in eq. (C1)) can be completely omitted from the calculation. Thus, we see three main choices for how to run the IPM when the solution to linear systems is inexact: first, by solving eqs. (19) and (22) directly and allowing intermediate solutions to be infeasible; second, by finding a matrix B by inspection as described in appendix C and then solving eqs. (23) and (24); third, by finding a matrix B via QR decomposition and then solving eqs. (23) and (24). When the linear system is solved using a quantum algorithm, as discussed in section IV, we refer to the algorithms that result from these three options as the II-QIPM, IF-QIPM, and IF-QIPM-QR, respectively. The pros and cons of each method are summarized in table II.

Neighborhood of the central path and polynomial convergence
Prior literature establishes that if sufficiently small steps are taken (i.e., if σ is sufficiently close to 1), then each intermediate point stays within a small neighborhood of the central path. We now review these conclusions. Following Ref. [48], for a vector u = (u_0; ũ) ∈ Q^k, we define the matrix T_u which, as for the arrowhead matrix, generalizes to the product of multiple cones by forming a block diagonal of matrices of the above form. We use the distance metric defined in Ref. [48]. The distance metric induces a neighborhood N, which includes both feasible and infeasible points, as well as the neighborhood N_F, which includes only feasible points, where P_F denotes the set of feasible points for the self-dual SOCP. Note that the vector T_x s can be computed classically in O(N) time given access to the entries of x and s. Thus, whether or not a point lies in N(γ) can be determined in O(N) time. Corollary 1 of Ref. [48] then implies that, so long as 0 ≤ γ ≤ 1/3 and (x; y; τ; θ; s; κ) ∈ N_F(γ), a bound Γ on the distance of the updated point from the central path holds.
TABLE II. Choices on which version of the Newton system to solve lead to different versions of the QIPM, even with the same underlying quantum subroutines.

Caveats:
II-QIPM: theoretical convergence guarantee requires O(r²) (rather than O(√r)) iterations.
IF-QIPM: the ill-conditioned null-space basis leads to a large condition number of the Newton system.
IF-QIPM-QR: requires a classical QR decomposition, which could dominate the overall runtime.

Thus, if Γ ≤ γ, and assuming the Newton system is solved exactly, every intermediate point will lie in N_F(γ). This condition is met, for example, if γ = 1/10 and σ = 1 − (20√(2(r + 1)))⁻¹. Since each iteration reduces the duality gap by a factor σ, the duality gap can be reduced to ϵ after only roughly 20√(2(r + 1)) ln(1/ϵ) iterations. If the Newton system is solved inexactly, but such that feasibility is preserved (e.g., by solving inexactly for ∆z and then multiplying by B, as described above), then an error δ on the vector (x; τ; s; κ) can be tolerated, and the resulting vector can still be within the neighborhood at each iteration.
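The iteration-count arithmetic can be checked by directly simulating the geometric decrease of the duality gap with the step size σ = 1 − (20√(2(r + 1)))⁻¹ quoted in the text:

```python
import math

def iterations_to_gap(r, eps):
    # Count IPM iterations until the duality gap drops below eps, with
    # sigma = 1 - 1/(20*sqrt(2*(r+1))) and initial gap mu = 1 (the
    # self-dual embedding starts at the central point with nu = 1).
    sigma = 1.0 - 1.0 / (20.0 * math.sqrt(2.0 * (r + 1)))
    mu, n = 1.0, 0
    while mu > eps:
        mu *= sigma
        n += 1
    return n

r, eps = 100, 1e-6
n = iterations_to_gap(r, eps)
bound = 20 * math.sqrt(2 * (r + 1)) * math.log(1 / eps)
# Matches the quoted ~20*sqrt(2(r+1))*ln(1/eps) iteration count:
assert n <= math.ceil(bound) + 1
```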
On the other hand, if the Newton system is not solved exactly, then the resulting vector may not be feasible. Since N_F(γ) is defined as a subset of the feasible space, the analysis of Ref. [48] breaks down (as pointed out in Refs. [11, 14]). Thus, the II-QIPM version of the QIPM does not enjoy the theoretical guarantee of convergence in O(√r) iterations that the IF-QIPM and IF-QIPM-QR versions do (see table II). The best guarantees for the II-QIPM would imply convergence only after O(r²) iterations [11, 14]. Nevertheless, it is unclear whether a small amount of infeasibility makes a substantial difference in practice: we simulated multiple versions of the QIPM and observed similar overall performance when intermediate solutions were allowed to be infeasible, despite the inferior theoretical guarantee of success. Thus, in sections V and VI, where we present the full QIPM implementation, resource count, and numerical analysis, we focus on the II-QIPM. We present some of the results of our numerical simulations of the IF-QIPM and IF-QIPM-QR in the appendix.

IV. QUANTUM INTERIOR POINT METHODS (QIPM)
A. Basic idea of QIPM

As discussed in section III, each iteration of an IPM SOCP solver involves forming and solving a linear system of equations that depends on the intermediate point at the current iteration. In classical IPM implementations for SOCP, the linear systems of equations are typically solved exactly; for example, the numerical SOCP solving package ECOS solves linear systems with a sparse LDL (Cholesky) factorization [25]. For arbitrary dense systems, the runtime of solving an L × L system this way is O(L³) [53], but by exploiting sparsity the actual runtime in practice could be much faster, by an amount that is hard to assess. Alternatively, it would, in principle, be possible to employ classical iterative approximate linear system solvers such as conjugate gradient descent or the randomized Kaczmarz method. The choice of the linear system solver thereby determines the overall complexity of the IPM SOCP solver. The idea of a QIPM, as pioneered in Refs. [10, 11], is to use a quantum subroutine to solve the linear system of equations [15]. Notably, all other steps of the IPM stay classical and remain the same as described in section III. As a quantum linear system solver (QLSS) does not solve the exact same mathematical problem as classical linear system solvers and, moreover, a QLSS needs coherent (quantum) access to the classical data as given by the entries of the relevant matrices, there are various additional tools we will discuss that allow us to embed QLSS subroutines as a step of IPM SOCP solvers. First, we discuss in section IV B the input and output model of QLSSs and present the complexity of state-of-the-art QLSSs. Then, in section IV C, we give constructions based on quantum random access memory (QRAM) to load classical data as input into a QLSS and discuss the complexity overhead arising from that step. Subsequently, in section IV D, we present so-called pure-state quantum tomography, which allows us to convert the output of the QLSS into an estimate of the
classical solution vector of the linear system of equations. Finally, in section IV E, we put all the steps together and state the overall classical and quantum complexities of using QLSSs as a subroutine in IPM SOCP solvers. As described in previous work [22], the ultimate idea is to compare these costs to the complexities of classical IPM SOCP solvers and point out regimes where quantum methods can potentially scale better than any purely classical method (e.g., in terms of the SOCP size N, the matrix condition number κ, etc.). We note that the content of this section largely corresponds to collecting various state-of-the-art results from the prior literature. These ingredients are used together with the conceptual framework of [10, 11, 14, 22] to lift the QIPMs presented there to superior efficiency. In section V, we present a few novel enhancements to the implementation of the QIPM and fully explicit, end-to-end quantum circuits with corresponding novel finite-size complexities.

B. Quantum linear system solvers
For our purposes, a linear system of equations is given by a real invertible L × L matrix G together with a real vector h = (h_1, . . ., h_L), and one is looking to give an estimate of the unknown solution vector u = (u_1, . . ., u_L) defined by Gu = h. We define the (Frobenius) condition number κ_F(G) := ∥G∥_F ∥G⁻¹∥, where ∥ • ∥_F denotes the Frobenius norm and ∥ • ∥ for a matrix argument denotes the spectral norm. For this setting, the input to a QLSS is comprised of: (i) a preparation unitary U_h that creates the ℓ := ⌈log L⌉-qubit quantum state |h⟩ proportional to Σ_i h_i |i⟩, where ∥ • ∥ for a vector argument denotes the vector two-norm (standard Euclidean norm), (ii) a block-encoding unitary U_G of the form in eq. (33) on ℓ + ℓ_G qubits for some ℓ_G ∈ N, and (iii) an approximation parameter ε_QLSP ∈ (0, 1]. The quantum linear system problem (QLSP) is stated as follows: for a triple (G, h, ε_QLSP) as above, the goal is to create an ℓ-qubit quantum state |ṽ⟩ that is ε_QLSP-close to the normalized state |v⟩ proportional to Σ_i u_i |i⟩ with u = (u_1, . . ., u_L) defined by Gu = h, by employing as few times as possible the unitary operators U_G, U_h, their inverses U_G†, U_h†, controlled versions of U_G, U_h, and additional quantum gates on potentially additional ancilla qubits. The QLSP together with the first QLSS was introduced in [15] and then gradually improved in [16, 17, 19, 54, 55]. The state-of-the-art QLSS [18], using the fewest calls to U_G, U_h and their variants, is based on ideas from discrete adiabatic evolution [56]. We note the following explicit complexities from [18, Theorem 9], adapted to our setting.

Proposition 1. The QLSP for (G, h, ε_1) can be solved with a quantum algorithm on ⌈log_2(L)⌉ + 4 qubits, for some constant C ≤ 44864, using Q ≥ κ_F(G) controlled queries to each of U_G and U_G†, and 2Q queries to each of U_h and U_h†, with constant quantum gate overhead.
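A quick numerical illustration of the Frobenius condition number and its relation to the regular condition number: since ∥G∥ ≤ ∥G∥_F ≤ √L ∥G∥, we have κ(G) ≤ κ_F(G) ≤ √L κ(G):

```python
import numpy as np

rng = np.random.default_rng(3)
L = 32
# Diagonally shifted random matrix: invertible and moderately conditioned.
G = rng.standard_normal((L, L)) + L * np.eye(L)

# ||G^-1|| equals 1/sigma_min(G); computed from the singular values.
G_inv_norm = 1.0 / np.linalg.svd(G, compute_uv=False)[-1]
kappa = np.linalg.norm(G, 2) * G_inv_norm        # regular condition number
kappa_F = np.linalg.norm(G, 'fro') * G_inv_norm  # Frobenius condition number

# kappa <= kappa_F <= sqrt(L) * kappa, from ||G|| <= ||G||_F <= sqrt(L)||G||.
assert kappa <= kappa_F <= np.sqrt(L) * kappa + 1e-9
```

The QLSS query counts below scale with κ_F rather than κ, so this gap of up to √L matters for the end-to-end cost.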
We note that a stronger version of the above proposition works with the (regular) condition number κ(G) := ∥G∥∥G⁻¹∥, but it requires a block-encoding of the form of eq. (33) in which the normalization factor is ∥G∥ rather than ∥G∥_F. For general matrices of classical data, we do not know of a method to produce such a block-encoding. In our case, we work with the Frobenius version κ_F(G), since we do have a straightforward method to perform U_G with normalization factor ∥G∥_F, described in section IV C. It is then sufficient to give upper bounds for the remaining κ_F(G) to run the algorithm from proposition 1. In practice, we will give such upper bounds by using appropriate heuristics (cf. section V on implementations).
Note that proposition 1 implies a solution to the QLSP in eq. (34) with an asymptotic query complexity of O(κ_F/ε_QLSP) to U_G, U_h and their variants; under standard complexity-theoretic assumptions this is optimal in terms of the O(κ) scaling [15], but not in terms of the O(ε_QLSP) scaling. To get to an improved O(log(1/ε_QLSP)) scaling, the authors of [18] further rely on the eigenstate filtering method of [55, Sec. 3], which additionally invokes a quantum singular value transform based on a minimax polynomial. We note the following overall complexities from [18, Theorem 11], adapted to our setting.

Proposition 2. The QLSP for (G, h, ε_2) can be solved with a quantum algorithm on ⌈log_2(L)⌉ + 5 qubits that produces a quantum state with ⟨0⁵|⊥⟩ = 0 and success probability p ≥ 1/2. With that, the sought-after ε_2-approximate solution quantum state |ṽ⟩ can be prepared using Q + d controlled queries to each of U_G and U_G†, and 2Q + 2d queries to each of U_h and U_h†. Here, C ≤ 44864 is the same constant as in proposition 1.
This version of the algorithm essentially uses proposition 1 with a constant choice of ε_1 ≤ 2 − √2, which ensures that the state prepared has overlap at least 1/√2 with the ideal state |v⟩. Then, it uses eigenstate filtering to measure whether the final state is the correct solution state. On average we need to repeat the algorithm no more than twice to produce the desired state |ṽ⟩. The resulting scaling that proposition 2 implies for the QLSP in eq. (34) is O(κ log(1/ε_QLSP)). Following the findings from [18, Sec. V], we note that in practice the term Q ≈ 1.31Cκ_F(G) dominates over d, and all other terms can be safely neglected for typical settings, even for finite-size analyses. Moreover, the constant C is typically an order of magnitude smaller than the estimates given in [18, Sec. IV.E]; numerical estimates produced a smaller value of 2305. No direct estimates for general matrices G are available from [18], but we will henceforth assume C = 2305 for our numerical estimates. Additionally, note that for the eigenstate filtering step via QSVT, the minimax polynomial from [55, Sec. 3] and its corresponding quantum signal processing angles have to be computed. This is done as part of classical pre-processing [57, Sec. III].⁸ Note that the implementations of the QLSS in propositions 1 and 2 assume perfect implementation of the underlying circuits, without additional gate synthesis errors. In practice, however, these circuits will not be implemented perfectly, and hence we will later include additional sources of error (e.g., block-encoding error, imperfect rotation gates, etc.) that also contribute to ε_QLSP. We include these additional contributions in section IV D, for example.
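Taking the quoted constants at face value, a back-of-envelope query count can be sketched as follows. The prefactor 1.31 and C = 2305 are the values quoted above; the target precision ε_qsp and the value of κ_F are hypothetical inputs:

```python
import math

def qlss_queries(kappa_F, eps_qsp=1e-10, C=2305):
    # Dominant query count Q ~ 1.31 * C * kappa_F from [18, Sec. V],
    # plus the eigenstate-filtering degree d = 2*ceil(kappa_F*ln(2/eps)).
    Q = 1.31 * C * kappa_F
    d = 2 * math.ceil(kappa_F * math.log(2 / eps_qsp))
    return Q, d

Q, d = qlss_queries(kappa_F=1e3)
# In typical regimes Q dominates over d, as noted in the text:
assert Q > d
```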
In the following, we continue by laying out the additional classical and quantum resources needed to employ a QLSS for estimating, in an end-to-end fashion, the classical solution vector v = (v_1, . . ., v_L) instead of the quantum state |v⟩.
C. Block-encoding classical data via QRAM
In many quantum algorithms (and in particular for our use case), one needs coherent access to classical data for use in the algorithm. Block-encodings of matrices provide a commonly used access model for the classical data by encoding matrices into unitary operators, thereby providing oracular access to the data. As mentioned above, for a matrix G ∈ R^(L×L), a unitary matrix U_G block-encodes G when the top-left block of U_G is proportional to G, i.e. ⟨0|^⊗ℓ_G U_G |0⟩^⊗ℓ_G = G/α,
where α ≥ ∥G∥ is a normalization constant, chosen as α = ∥G∥_F for our use case. The other blocks of U_G are irrelevant, but they must be encoded such that U_G is unitary. For our purposes, we focus on real matrices G, but the extension to complex matrices is straightforward.
A block-encoding makes use of unitaries that implement (controlled) state preparation, as well as quantum random access memory (QRAM) data structures for loading the classical data. Specifically, we refer to QRAM as the quantum circuit that allows query access to classical data in superposition, Σ_j ψ_j |j⟩ |0⟩ ↦ Σ_j ψ_j |j⟩ |a_j⟩, where j is the address in superposition with amplitude ψ_j and |a_j⟩ is the classical data loaded into a quantum state. There are several models of QRAM one can use, differing in the way the data is loaded. The two most notable QRAM models are the select-swap (SS) model, which is particularly efficient in terms of T-gate utilization [60], and the bucket-brigade (BB) model [61], which has reduced susceptibility to errors when operated on potentially faulty hardware [62].
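A toy state-vector model of the QRAM query map |j⟩|0⟩ → |j⟩|a_j⟩ can make the access model concrete. The sketch below builds the query as an XOR-load permutation and applies it to an address register in superposition; this models the abstract query only, not the SS or BB circuit implementations:

```python
import numpy as np

data = [3, 1, 2, 0]          # classical data a_j, 2-bit values (illustrative)
n_addr = len(data)
n_data = 4                   # data-register dimension (2 qubits)

# QRAM query as a permutation on address ⊗ data: |j>|d> -> |j>|d XOR a_j>.
U = np.zeros((n_addr * n_data, n_addr * n_data))
for j in range(n_addr):
    for d in range(n_data):
        U[j * n_data + (d ^ data[j]), j * n_data + d] = 1.0

psi = np.ones(n_addr) / np.sqrt(n_addr)   # uniform address superposition
state = np.kron(psi, np.eye(n_data)[0])   # data register initialized to |0>
out = U @ state

# Each branch |j>|0> is mapped to |j>|a_j>, preserving the amplitudes psi_j:
for j in range(n_addr):
    assert np.isclose(out[j * n_data + data[j]], psi[j])
assert np.allclose(U @ U.T, np.eye(n_addr * n_data))  # unitary (permutation)
```

The XOR convention makes the query self-inverse, which is the usual reversible-computing convention for data loading.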
The block-encoding unitary U_G acts on ℓ + ℓ_G qubits, where ℓ = ⌈log_2(L)⌉ and, in our construction, ℓ_G = ℓ. To build it, we follow the prescription of [59, 63, 64], in which one forms U_G as the product of a pair of controlled-state-preparation unitaries U_L and U_R, where the ℓ-qubit states |ψ_j⟩ and |ϕ_k⟩ are determined from the matrix elements G_jk of G, with G_j,• denoting the jth row of G. That is, controlled on the second ℓ-qubit register in the state |j⟩, U_R prepares the ℓ-qubit state |ψ_j⟩ in the first ℓ-qubit register, and U_L performs the same operation for the states |ϕ_k⟩, modulo a swap of the two registers. Both U_L and U_R utilize an additional ℓ′ QRAM ancilla qubits that begin and end in the state |0⟩. These controlled-state-preparation unitaries U_R and U_L are implemented by combining a QRAM-like data-loading step with a protocol for preparation of ℓ-qubit states. There are several combinations of state preparation procedure and QRAM model one can choose, with varying benefits and resource requirements. In [33], a subset of the authors of the present work studied the resources required to implement these block-encodings and provided explicit circuits for their implementation. For our immediate purposes, we simply import the relevant resource estimates from that work in table III, and we refer the interested reader to [33] for further details.⁹ For our purposes, we will work with the minimum-depth circuits that achieve a T-gate depth of O(log L), at the price of using a total of O(L²) qubits for the data structure implementing the block-encoding unitary U_G. Finally, the ℓ-qubit unitary U_h defined by |h⟩ = U_h |0⟩^⊗ℓ corresponds to the special case of quantum state preparation and is directly treated by the methods outlined in [33, Sec. III.C]. The resources required to synthesize U_h up to error ε_h are also reported in table III.
The minimum-depth block-encodings of [33] also incur some classical costs. Specifically, the quoted depth values are only achievable assuming a number of angles have been classically pre-computed, along with, for each angle, a gate sequence of single-qubit Clifford and T gates that synthesizes a single-qubit rotation by that angle up to small error. Calculating one of the angles can be done by summing a subset of the entries of G and computing an arcsine. Meanwhile, circuit synthesis requires applying a version of the Solovay-Kitaev algorithm [66, 67]. For the block-encoding procedure, L(L − 1) angles and their corresponding gate sequences must be computed, which requires a total runtime of O(L² polylog(1/ε_G)) [67], although this computation is amenable to parallelization. For the state preparation procedure, L − 1 angles and their sequences are needed.

D. Quantum state tomography
We have described how we can produce a quantum state |ṽ⟩ approximating the (real-valued) solution |v⟩ of a linear system up to precision ε_QLSP. As mentioned in section IV B, in the actual circuit implementation, the approximation error ε_QLSP accounts for both the inherent error from eigenstate filtering captured in proposition 2 and additional gate synthesis error arising from imperfect implementation of block-encoding unitaries and single-qubit rotations. The next step is to approximately read out the amplitudes of |ṽ⟩ into classical form. To start, we prove the following proposition, which tells us how many copies of a quantum state are needed to provide a good enough classical description of it, up to a phase on each amplitude. This proposition and its proof are adapted from [34, Proposition 13], with somewhat sharpened constant factors. We give the proof in appendix B 1. Recall that proposition 2 gives a unitary U preparing |ṽ⟩ = Σ_i ṽ_i |i⟩ with ⟨0⁵|⊥⟩ = 0 and p ≥ 1/2. The vector ṽ may have complex coefficients, but it approximates a real vector v up to some error ε_QLSP in ℓ_2 norm. Our goal is to obtain an estimate ṽ′ = (v′_1, . . ., v′_N) such that ∥v − ṽ′∥ ≤ ξ for an error parameter ξ ∈ [0, 1], where ξ captures all sources of error. Proposition 3 is not quite sufficient because it only gives us an estimate of the absolute values of the entries of ṽ. However, the following procedure, adapted from [10, Sec. 4], will be sufficient: prepare many copies of the quantum state and measure them all in the computational basis to give empirical estimates {p_i} of the probabilities |ṽ_i|²; then prepare a superposition of |ṽ⟩ with the state Σ_i √(p′_i) |i⟩, to which applying a Hadamard gate produces interference between the two branches (here |⊥′⟩ is an arbitrary state orthogonal to the desired subspace, and the state Σ_i √(p′_i) |i⟩ can only be prepared up to some error). Next, measure this state in the computational basis, denoting the measurement count of the result 0⁶i as k⁺_i and of the result 0⁵1i as k⁻_i.
We give the proof in appendix B 1. The statement is used to bound the total error parameter ξ by the quantity ε + 1.58√L ε_tsp + 1.58 ε_QLSP. We note that a similar procedure in [10, Sec. 4] has already been proven to work, with somewhat worse success-probability guarantees and worse constants. Ref. [34, Proposition 16] shows a similar result for complex-valued states, but we use a sharper proof for input states close to real-valued. Proposition 4, together with proposition 2, produces with high probability an O(ε)-good estimate ṽ′ of v by using O(L ln(L)/ε²) samples.¹⁰ There are other methods in the literature that allow one to perform pure-state quantum tomography with comparable query complexities (e.g. [68]), but we favor the above method because of its computational simplicity and the fact that it does not require us to solve any potentially costly additional optimization problems.

¹⁰ If our goal is to resolve the initial linear system Gu = h, then the vector ṽ′ produced as in section IV D as an estimate for the normalized vector v = u/∥u∥ gives an estimate for u via ũ := ṽ′ · ∥h∥/∥Gṽ′∥. Notice that, as a worst-case guarantee, this picks up an additional factor κ(G) in error scaling. However, for our purposes it will be sufficient to work directly with the normalized estimate ṽ′ for v, the reason being that only the direction of the solution vector is important to us and not its exact normalization.
Very recently, the sample complexity has been improved to O(L ln(L)/ε), which comes at the cost of more complicated quantum circuits and higher constant overheads [34, Theorem 23]. It would be interesting to work out the more involved finite complexity of this result, and we comment further on its potential impact in section VII.
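The sign-recovery idea behind the tomography procedure can be simulated classically: computational-basis samples estimate |ṽ_i|², and interference counts k⁺_i, k⁻_i fix the sign of each real amplitude. The branch probabilities below are a simplified stand-in for those of the actual procedure (using a fixed reference amplitude instead of √(p′_i)), and the shot counts are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
L, shots = 8, 200_000

v = rng.standard_normal(L)
v /= np.linalg.norm(v)          # real target amplitudes, ||v|| = 1

# Step 1: computational-basis sampling estimates the probabilities v_i^2.
counts = np.bincount(rng.choice(L, size=shots, p=v**2), minlength=L)
p_hat = counts / shots

# Step 2 (simplified interference): per index i, the "+" and "-" branches
# occur with probabilities proportional to (v_i ± 1)^2 / 4; the majority
# branch reveals sign(v_i).
signs = np.empty(L)
for i in range(L):
    p_plus = (v[i] + 1.0) ** 2 / 4.0
    p_minus = (v[i] - 1.0) ** 2 / 4.0
    k_plus = rng.binomial(shots, p_plus / (p_plus + p_minus))
    signs[i] = 1.0 if 2 * k_plus > shots else -1.0

v_est = signs * np.sqrt(p_hat)
assert np.linalg.norm(v_est - v) < 0.05
```

Sign errors can only occur for amplitudes near zero, where they contribute negligibly to the ℓ_2 error, which is why the procedure tolerates them.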

E. Asymptotic quantum complexity
Putting everything together, the steps of our QLSS for a given real L × L matrix G and real vector h of size L are:

1. Construct the circuits that implement the block-encoding unitaries U_G and U_h up to errors ε_G and ε_h via quantum state preparation and QRAM, which involves a classical pre-processing cost scaling as O(L² polylog(1/ε_G)).

2. Employ the QLSS unitary from proposition 2 to approximately solve the corresponding QLSP, leading to the quantum state |ṽ⟩. The query complexity to U_G, U_h, their controlled versions, and their inverses is O(κ_F(G) log(1/ε)). The number of qubits needed is ⌈log L⌉ + 5.
3. Repeat the previous step O(L ln(L/δ) ε⁻²) times to implement the pure-state quantum tomography scheme from section IV D, which also requires the use of an O(L)-qubit QRAM structure and one ancilla qubit. Tomography leads to the sought-after classical vector estimate ṽ′ with ∥ṽ′ − v∥ ≤ ε.
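Multiplying the tomography repetition count by the per-run QLSS query count gives a rough per-solve scaling; the constants below are placeholders rather than the paper's calibrated values:

```python
import math

def qlss_total_queries(L, kappa_F, eps, delta=1e-3):
    # Tomography repetitions O(L ln(L/delta) / eps^2) times the per-run
    # QLSS query count O(kappa_F log(1/eps)); constants set to 1 here.
    tomography_runs = L * math.log(L / delta) / eps**2
    per_run_queries = kappa_F * math.log(1 / eps)
    return tomography_runs * per_run_queries

q_small = qlss_total_queries(L=100, kappa_F=1e3, eps=1e-2)
q_large = qlss_total_queries(L=400, kappa_F=1e3, eps=1e-2)
# Cost grows superlinearly in L due to the L ln(L/delta) tomography factor:
assert q_large > 4 * q_small
```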
The QLSS can then be used in each iteration of an IPM SOCP solver, which involves forming and solving a linear system of equations, resulting in the QIPM SOCP solver. We provide the quantum circuits needed to implement the solver in section IV F. However, we emphasize that we have not yet considered the various practical aspects and difficulties of setting up an end-to-end QIPM SOCP solver, which are discussed further in section V.

F. Quantum circuits
The following are the quantum circuits needed for the QLSS of proposition 1. The QLSS requires applying a unitary U[s] for many different values of s, where U[s] is a block-encoding of a certain Hamiltonian related to G and h, as specified below. The unitary acts on 4 + ℓ + ℓ_G total qubits, where the final ℓ_G qubits are ancillas associated with U_G. The four single-qubit registers are referred to with labels a_1, a_2, a_3, a_4, the ℓ-qubit register with label L, and the ℓ_G-qubit register with label ℓ_G. These labels are used as subscripts on bras, kets, and operators to clarify the register to which they apply. The circuit for U[s] is depicted in fig. 1 and is described in [18, appendix E]. Specifically, the unitary U[s] is a block-encoding of the (2 + ℓ)-qubit Hamiltonian c(s) • H[s] := (1 − f(s))H_0 + f(s)H_1 on registers a_4 a_1 L, where c(s) is a normalization factor (defined later in eq. (60)), where I_L denotes the identity operation on subsystem L, and where the four rows and columns correspond to the sectors with qubits a_4, a_1 set to (0, 0), (0, 1), (1, 0), (1, 1). Figure 1 features the expressions in which H denotes the single-qubit Hadamard gate and R(s) is a rotation given below. The normalization factor of R(s) combines with a factor of 1/√2 introduced by the Hadamard gate to give an overall normalization factor c(s) for H(s), with scheduling function f(s) satisfying f(0) = 0 and f(1) = 1. Note that we have the self-inverse property U[s] = U[s]†. The overall quantum circuit U for the quantum algorithm of proposition 1 is then given by eq. (61) (cf. [69]).

[Figure 1 caption: circuit for U[s], with circuit elements defined in eqs. (53) to (59), except for sub-circuit U_{Q_h}, which is depicted in fig. 4. The unitary U[s] is then used in eq. (61) to define the overall quantum circuit U for proposition 1.]
[Figure 2 caption: Quantum singular value transform (QSVT) circuit, described in Ref. [59], acting on the block-encoding U[1] of H(1) = H_1/√2, as defined in eq. (54). The circuit features one additional ancilla qubit and depends on the classically precomputed rotation angles {ϕ_1, . . ., ϕ_d}.]

[Figure 3 caption: Controlled version of the quantum circuit in fig. 1, controlled on qubit c. Note that not all gates need to be controlled on c, as their inverses follow in the circuit.]
with the walk operator W, which acts as identity on registers a_4 a_1 L (which host the Hamiltonian H[s]) while performing the reflection (2|0⟩⟨0|_(a_2 a_3 ℓ_G) − I_(a_2 a_3 ℓ_G)) on the remaining qubits. The unitary U makes Q controlled queries to each of U_G and U_G†, and 2Q queries to each of U_h and U_h†, and it has constant quantum gate overhead. Next, we give the remaining QSVT eigenstate-filtering quantum circuit for the refined quantum linear system solver of proposition 2. We are interested in the null space of c(1) • H[1], which has ground-state energy equal to zero and spectral gap at least c(1)κ_F(G)⁻¹. As such, we employ the Chebyshev minimax polynomial R_l, where T_l(•) is the l-th Chebyshev polynomial of the first kind, as part of the corresponding QSVT quantum circuit. From [55, Lemma 2], R_l has even degree d equal to

d := 2l = 2⌈κ_F(G) ln(2/ε_qsp)⌉ for some ε_qsp ∈ (0, 1], (63)

where ε_qsp is the precision to which R_l approximates the optimal filter operator (the subscript stands for "quantum signal processing"). The circuit for the eigenstate filtering step is depicted in fig. 2. To implement it, one has to classically precompute the corresponding QSP angles {ϕ_1, . . ., ϕ_d}, which is best done by the methods of [57] (see also [59] and [58]). The query complexity to the block-encoding U[1] is given by d, the additional gate overhead is as in fig. 2, and the total number of qubits is 1 + 4 + ℓ. Finally, using the overall quantum circuit U from proposition 1 with constant approximation parameter ε_1 = 2 − √2 therein (to produce an input state to the quantum circuit of fig. 2) gives the overall quantum circuit of the QLSS from proposition 2, which then solves the QLSP to the stated error. The tomography routine also requires the ability to perform controlled versions of the above circuits, as described in eq. (49) and illustrated in fig. 3 (which replaces fig. 1). The controlled circuits can be accomplished by rather simple modifications to the circuits in figs. 1 and 2, as follows.
Any QSVT circuit can be made controlled by simply controlling the application of the z-rotation gates, since the rest of the circuit contains only symmetric applications of unitary gates and their inverses. Thus, we can create a controlled version of fig. 2 by simply performing controlled-σz rotations, each of which requires two CNOT gates and an extra single-qubit σz rotation gate. Controlling the linear-system portion alone is not enough to implement eq. (49); one must also follow it with a state-preparation routine controlled on the qubit c being in the |1⟩ state. The full resource analysis for controlled state-preparation was reported in Ref. [33], and we refer the reader there for further details. We report the resource counts here in table IV.
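As a concrete check of eq. (63), the even degree of the eigenstate filtering polynomial can be computed directly from the condition number and filter precision. A minimal sketch in Python (the function name is ours):

```python
import math

def filtering_degree(kappa_f: float, eps_qsp: float) -> int:
    """Even QSVT degree d := 2l = 2*ceil(kappa_F(G) * ln(2/eps_qsp)), eq. (63)."""
    l = math.ceil(kappa_f * math.log(2.0 / eps_qsp))
    return 2 * l

# e.g. condition number 1e3 at filter precision 1e-10
d = filtering_degree(1e3, 1e-10)
```

The degree, and hence the query count to the block-encoding U[1], grows linearly with κF(G) and only logarithmically with 1/εqsp.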

V. IPM IMPLEMENTATION AND RESOURCE ESTIMATES FOR PO
The previous section reviewed the ingredients needed to implement the QIPM, namely the QLSS, block-encoding, and tomography. Here, we combine those ingredients to describe how the QIPM is actually implemented, making several observations that go beyond the prior literature. We also perform a full resource analysis of the entire protocol and report the resources needed to run the algorithm.

A. Main IPM loop and full pseudo-code
A QIPM is formed from an IPM by performing the step of solving a linear system with a quantum algorithm; the rest of the steps are classical. In Algorithm 1, we present pseudocode for the interior point method, where the single quantum subroutine (approximately solving a linear system) appears in blue text. The input to Algorithm 1 is an SOCP instance with N variables, K linear constraints, and r second-order cone constraints, along with a tolerance parameter ϵ. Here we note that K = O(N) for the formulation of the PO problem we simulate in section VI. The output of the QIPM is a vector x that is O(ϵ) close to feasible and O(ϵ) close to optimal.
The structure of the QIPM is in essence the same as that proposed by Ref. [13], but we give a more complete specification of the algorithm and make several new observations:
• Classical costs: The IPM requires O(√r log(1/ϵ)) iterations. In the classical case, when solving the PO problem via SOCP with an IPM, the cost of an iteration is dominated by the time needed to solve a linear system of size L × L, which is O(N³) if done via Gaussian elimination, since L ∼ O(N) in the PO problem. In the quantum case, this step is performed quantumly. However, even in the quantum case, some classical costs are incurred: one must classically compute the left-hand and right-hand sides of the Newton system in eqs. (19) and (22) in order to load this classical data into the quantum circuits that perform the QLSS and the tomography required to obtain a classical estimate of the solution to the linear system. In particular, constructing the linear system requires classical matrix-vector multiplication to compute the residuals on the right-hand side of the Newton system in eq. (19). If the SOCP constraint matrix A is O(N) × N and the number of cones is r = O(N), then this classical matrix-vector multiplication takes O(N²) time in each of the O(√N) iterations. Thus, the QIPM requires at least O(N^2.5) classical time. Additionally, in our resource counts we use the minimal-depth block-encoding circuits from Ref. [33], which require N² polylog(1/ε) classical time per iteration (although this can be parallelized) to compute angles and corresponding gate sequences to precision ε. These classical costs limit the maximum possible speedup of the QIPM over the classical IPM, but if the quantum subroutine is fast enough that classical matrix-vector multiplication and angle computation become the bottleneck steps, that is a good signal for the utility of the QIPM.
• Preconditioning: Since the runtime of the QLSS depends on the condition number of the matrix G that appears in the linear system Gu = h, it is worth examining preconditioning techniques [70] for reducing the condition number. In the implementation we propose, we perform a very simple form of preconditioning. Let D be a diagonal matrix whose entry Dii equals the norm of row i of the matrix G. Instead of solving the linear system Gu = h, we solve the equivalent system (D⁻¹G)u = D⁻¹h. Note that D⁻¹G and D⁻¹h can each be classically computed in O(N²) time, roughly equal to the time required to compute h in the first place (see previous bullet), so this step is unlikely to be a bottleneck in the algorithm. In our numerical experiments, we observe that the condition number of D⁻¹G is typically more than an order of magnitude smaller than that of G, and sometimes several orders of magnitude smaller (see fig. 9 in section VI).
• Norm of linear system and step length: As discussed in section IV B, QLSSs produce a normalized state |u⟩, where u is the solution to Gu = h, and quantum state tomography on |u⟩ can reveal only the direction of the solution u and not its norm. The norm can be estimated separately with a comparable amount of resources, but we observe that in the context of QIPMs it is not necessary to learn the norm of the solution. If the direction of the solution is known, the amount by which to update the vector in that direction can be determined classically in O(N) time as follows. If (∆x; ∆y; ∆τ; ∆θ; ∆s; ∆κ) is the normalized solution to the Newton linear system in eqs. (19) and (22), then the amount to step in that direction is given by a closed-form expression, chosen such that the duality gap of the new point is exactly a factor of σ smaller than that of the old point, up to deviations that are second order in the step length. Note that if the old point is feasible and the solution to the linear system is exact, the second- and higher-order contributions vanish anyway.
• Adaptive tomographic precision and neighborhood detection: In Ref. [13], the choice of the tomography precision parameter ξ was determined by a formula that aimed to guarantee staying within the neighborhood of the central path under a worst-case outcome. We observe that, since determining whether a point is within the neighborhood of the central path can be done in classical O(N) time (see section III C 6), the precision parameter can instead be determined adaptively: start with ξ = 1/2, solve the linear system to precision ξ, and check whether the resulting point is within the neighborhood of the central path. If yes, continue to the next iteration; if no, repeat the tomography with ξ ← ξ/2. Since the complexity of tomography is O(1/ξ²), the cost of this adaptive scheme is proportional to a geometric series 4 + 16 + 64 + · · · + O(1/ξ²), of which the final term makes up most of the cost (accordingly, for simplicity, in our resource calculation we account only for the final term). This cost could be much lower than the theoretical value if typical errors are not as adverse for the IPM as a worst-case error of the same size.
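The preconditioning and adaptive-precision ideas above can be sketched in a few lines of Python. The helpers `approx_solve` and `in_neighborhood` are hypothetical stand-ins for the quantum linear-system subroutine and the O(N)-time classical neighborhood check; the preconditioner is the row-norm rescaling described in the text:

```python
import numpy as np

def precondition(G: np.ndarray, h: np.ndarray):
    """Row-norm preconditioning: replace G u = h by (D^-1 G) u = D^-1 h,
    where D_ii = ||row i of G||. Costs O(N^2) classical time and leaves
    the solution u unchanged."""
    d = np.linalg.norm(G, axis=1)
    return G / d[:, None], h / d

def adaptive_solve(G, h, approx_solve, in_neighborhood):
    """Adaptive tomographic precision: start at xi = 1/2 and halve xi until
    the resulting point stays in the central-path neighborhood."""
    xi = 0.5
    while True:
        u_hat = approx_solve(G, h, xi)  # unit-norm estimate of G^{-1}h / ||G^{-1}h||
        if in_neighborhood(u_hat):
            return u_hat, xi
        xi /= 2.0
```

Because the rescaling multiplies both sides by the same invertible diagonal matrix, the solution is exactly preserved while the condition number of a badly row-scaled matrix typically drops sharply.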
The pseudocode in Algorithm 1 illustrates the infeasible version of the algorithm (II-QIPM from table II).To implement the feasible versions (IF-QIPM and IF-QIPM-QR), minor modifications are made to reflect the process described in section III.

B. End-to-end quantum resource estimates
The QIPM described in the pseudocode takes 20√2 √r ln(ϵ⁻¹) iterations to reduce the duality gap to ϵ, where r is the number of second-order cone constraints. For the portfolio optimization problem we study, r = 3n + 1, where n is the number of stocks in the portfolio. Choosing the constant prefactor to be 20√2 allows us to utilize theoretical guarantees of convergence (modulo the issue of infeasibility discussed in section III C 5); however, it would not be surprising if additional optimization of the parameters or heuristic changes to the implementation of the algorithm (e.g., an adaptive step size within each iteration) led to constant-factor reductions in the number of iterations. Since the number of iterations would be the same for both the quantum and the classical IPM, these sorts of improvements would not affect the performance of the QIPM relative to its classical counterpart.
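The iteration count above can be evaluated directly. A minimal sketch (the function name is ours), using r = 3n + 1 for an n-stock PO instance:

```python
import math

def qipm_iterations(n_stocks: int, eps: float) -> int:
    """Iterations needed to reduce the duality gap to eps:
    20 * sqrt(2) * sqrt(r) * ln(1/eps), with r = 3n + 1 second-order cones."""
    r = 3 * n_stocks + 1
    return math.ceil(20 * math.sqrt(2) * math.sqrt(r) * math.log(1.0 / eps))
```

For n = 100 stocks and a final duality gap of 10⁻⁷ this gives several thousand IPM iterations, each of which contains a full QLSS-plus-tomography subroutine.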

Quantum circuit compilation and resource estimate for quantum circuits appearing within QIPM
The QIPM consists of repeatedly performing a quantum circuit associated with the QLSS and measuring in the computational basis. Here we account for all the costs of each of these individual quantum circuits. Two kinds of circuits are needed: first, the circuit that creates the output of the QLSS subroutine, given by the state in eq. (36); and second, the circuit that creates the state needed to determine the signs of the amplitudes during the tomography subroutine, corresponding to a controlled-QLSS subroutine, given in eq. (49).
To simplify the analysis, we first compile the circuits from the previous section into a primitive gate set consisting of Toffoli gates (and multi-controlled versions of them), rotation gates, block-encoding unitaries, and state-preparation and controlled state-preparation unitaries. This compilation allows us to combine our previous in-depth resource analysis of these primitive routines [33] with the additional circuits shown here.
From left to right in the U[s] circuit shown in fig. 1, we show the circuits for U Q h, CR0(s) (and equivalently CR1(s)), and VG in figs. 4 to 6, respectively. In addition to these circuits, we must also perform controlled versions of them within the tomography routine to estimate the signs of the amplitudes. The controlled-U[s] gate is given in fig. 3. The implementations of the controlled versions of CR0(s) (and equivalently CR1(s)) and VG are also depicted in figs. 5 and 6, respectively.

Algorithm 1: Quantum Interior Point Method
Input: SOCP instance (A, b, c), list of cone sizes (N1, . . ., Nr), and tolerance ϵ
Output: Vector x that optimizes the objective function (eq. (5)) to precision ϵ
/* For portfolio optimization, A, b, c are given in eq. (10) */
/* ... from eqs. (19) and (22) */
Run tomography as described in section IV D using k applications and k controlled-applications of the QLSS algorithm on the system (G, h)
return Vector ṽ′ for which ∥ṽ′∥ = 1 and ∥ṽ′ − v∥ ≤ ξ with probability at least 1 − δ, where v ∝ G⁻¹h
With these decompositions in place, we report in table V the resources required to perform each of the two kinds of quantum circuits involved in the QIPM (each of which is performed many times over the course of the whole algorithm). The resource quantities are reported in terms of the number of calls Q to the block-encoding (which scales linearly with the condition number), as well as the controlled-block-encoding and state-preparation resources given previously in tables III and IV. The expressions also depend on various error parameters, which must be specified to obtain a concrete numerical value. In section VI, after observing the empirical scaling of certain algorithmic parameters, we make choices for all error parameters and arrive at a concrete number for a specific problem size.
FIG. 5. Circuit implementing the CR0(s) gate, where f(s) is given in eq. (59). The CR1(s) gate is identical but with the control-bit sign flipped. Note that the Ry(±π/4) gates are Clifford-conjugate to a single T or T† gate.
FIG. 6. Decomposition of the VG unitary (top) and the controlled-VG unitary (bottom), as defined in eq. (57), into calls to a standard block-encoding unitary UG [33] and other elementary gates, using a single ancilla qubit initialized to the |0⟩ state. Not pictured are additional ancillas that begin and end in |0⟩ and are utilized to implement the unitary UG in shallower depth.

Resource estimate for producing classical approximation to linear system solution
The resource estimates described above capture the quantum resources required for a single coherent quantum circuit that appears during the algorithm. The output of this quantum circuit is a quantum state, but the QIPM requires a classical estimate of the amplitudes of this quantum state. This classical estimate is produced through tomography, as described in section IV D, by performing k = 57.5 L ln(6L/δ)/(ε²(1 − ε²/4)) repetitions each of the QLSS and controlled-QLSS circuits, where ε is the desired tomography precision and δ is the probability that the tomography succeeds. In the implementation given in Algorithm 1, we fix δ = 0.1. Thus, to estimate the quantum resources of a single iteration of the QIPM, the resource estimates reported in table V should each be multiplied by k. We note that with P processors, each large enough to prepare the output of the QLSS, these k copies could be prepared in k/P parallel steps, saving a factor of P in the runtime at the expense of a factor of P additional space. Our resource and scaling estimates do not account for any parallelization; we assume completely serial execution. After multiplication by k, these expressions give the quantum resources required to perform the single quantum line of the QIPM, ApprSolve. This subroutine has both classical input and output and can thus be compared to classical approaches for approximately solving linear systems.

TABLE V. Quantum resources required to create the state output by the quantum linear system solver, given in eq. (36) (QLSS, left), or the state needed to compute the signs during the tomography subroutine, given in eq. (49) (controlled QLSS, right), for a square linear system of size L = 2ℓ. Note that these resource quantities do not yet account for the k repetitions needed in order to perform tomography on the output state. The parameters Q and d each scale linearly with the condition number of the linear system, as defined in proposition 2. The symbols NQcbe, TDcbe, and TCcbe denote the number of logical qubits, the T-depth, and the T-count, respectively, for performing a controlled block-encoding, as reported in table III. The symbols TDsp and TCsp are analogous quantities for state-preparation, as reported in table IV. The parameters εar, εtsp, and εz ∈ (0, 1] are error parameters corresponding to the gate-synthesis precision required for the CR0(s) and CR1(s) rotations, the controlled state-preparation step required by tomography, and the QSVT phases, respectively.
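The repetition count k quoted above can be evaluated directly. A minimal sketch (the function name is ours); as an illustration we use L = 14n + 6 for an n = 100 stock instance and ε = 0.9ξ with ξ ≈ 10⁻², values that appear later in the text:

```python
import math

def tomography_repetitions(L: int, eps: float, delta: float = 0.1) -> int:
    """k = 57.5 * L * ln(6L/delta) / (eps^2 * (1 - eps^2/4)): repetitions of each
    of the QLSS and controlled-QLSS circuits needed by the tomography routine."""
    return math.ceil(57.5 * L * math.log(6 * L / delta)
                     / (eps**2 * (1 - eps**2 / 4)))

# n = 100 stocks: L = 14*100 + 6 = 1406, eps = 0.9 * 0.01
k = tomography_repetitions(1406, 0.009)
```

At these parameters k is on the order of 10¹⁰, which is the dominant multiplicative factor in the per-iteration cost.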

Estimate for end-to-end portfolio optimization problem
Recall that the full QIPM algorithm is iterative, where each iteration involves approximately solving a linear system by preparing many copies of the same quantum state. The duality gap µ, which measures the proximity of the current interior point to the optimal point, begins at 1 and decreases by a constant factor σ with each iteration. Thus, the required number of iterations to reach a final duality gap ϵ is given by ln(1/ϵ)/ln(1/σ). Recall from the discussion in section III C 3 that the output of the QIPM will achieve an O(ϵ) approximation to the optimal value of the objective function.
Pulling this all together, we now estimate the resources required to perform the full QIPM algorithm, including the multiplicative factors needed for tomography as well as the number of iterations needed to converge to the optimal solution. Note that the relevant condition number κF(G) and the required linear-system precision ξ vary from iteration to iteration as the Newton matrix G changes. The overall runtime can be upper bounded using the maximum observed value of κF(G), which we denote by κF, and the minimum observed value of ξ across all iterations. At each iteration, to achieve overall precision ξ, the tomography precision ε is chosen to be just smaller than ξ (we choose ε = 0.9ξ), while all other error parameters (εar, εtsp, εz, etc.) are chosen to be small constant fractions of ξ, such that a total error budget of ξ is not exceeded. As the non-tomographic error parameters all appear underneath logarithms, these small constant factors drop out of a leading-order analysis, and it suffices to replace all of these error parameters with ξ.
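The budgeting just described can be made concrete. The 0.9ξ share for tomography is from the text; the equal split of the remaining 0.1ξ among the logarithmic error parameters is our own illustrative assumption (the text only requires them to be small constant fractions of ξ):

```python
def error_budget(xi: float) -> dict:
    """Illustrative per-iteration error budget summing to xi. Tomography gets
    eps = 0.9*xi (as in the text); the equal fractions below are an assumption."""
    return {
        "eps_tomography": 0.9 * xi,
        "eps_ar":  0.025 * xi,  # CR0(s)/CR1(s) gate synthesis (assumed fraction)
        "eps_tsp": 0.025 * xi,  # controlled state-preparation (assumed fraction)
        "eps_z":   0.025 * xi,  # QSVT phase synthesis (assumed fraction)
        "eps_qsp": 0.025 * xi,  # eigenstate filtering (assumed fraction)
    }
```

Because the last four parameters appear only inside logarithms, the exact fractions barely affect the leading-order cost; only the 0.9ξ tomography share matters quantitatively.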
We may then express the overall runtime in terms of κF, ξ, L (the size of the Newton system), and r (the number of second-order cone constraints), up to leading order and including all constant factors; we report the result in table VI. Recall that for the infeasible version of the QIPM acting on the self-dual embedding, we have L = 2N + K + 3, where N is the number of SOCP variables and K is the number of linear constraints. Note that in our leading-order expression, we have assumed that the contributions proportional to Q = O(κF) dominate over the terms proportional to d = O(κF log(1/ξ)) at practical choices of ξ, due to the large constant prefactor in the definition of Q (see proposition 2 and the surrounding discussion). The left column of table I in the introduction is formed using the expressions in table VI, substituting the corresponding relations between L and n, where n is the number of stocks in the portfolio optimization problem given in eq. (10). That is, we substitute r = 3n + 1 and L = 2N + K + 3 = 8n + 3m + 6 = 14n + 6 when we take m = 2n, where N is the number of SOCP variables, K is the number of SOCP constraints, n is the number of stocks, and m is the number of time epochs used to create the matrix M, as described in section II.

TABLE VI. Leading-order contributions to the logical qubit count, T-depth, and T-count for the entire QIPM, including constant factors. The parameter L denotes the size of the Newton linear system and r the number of second-order cone constraints, while ϵ denotes the final duality gap that determines when the algorithm is terminated. For the infeasible QIPM running on an n-asset instance of portfolio optimization, as given in eq. (10), we have L = 14n + 6 and r = 3n + 1; these substitutions yield the results in table I. The parameter κF denotes the maximum observed Frobenius condition number and ξ the minimum observed tomographic precision parameter across all iterations.

VI. NUMERICAL EXPERIMENTS WITH HISTORICAL STOCK DATA
The resource expressions in table VI include constant factors but leave the parameters κF and ξ unspecified. These parameters depend on the specific SOCP being solved. As a final step, we use numerical simulations of small PO problems to study the size of these parameters for different PO problem sizes. This information enables us to give concrete estimates of the resources needed to solve realistic PO problems with our implementation of the QIPM, and it sheds light on whether there could be an asymptotic quantum advantage.
Our numerical experiments simulate the entirety of Algorithm 1. The only quantum part of the algorithm is the subroutine ApprSolve(G, h, ξ). We simulate the quantum algorithm for this subroutine by solving the linear system exactly with a classical solver and then adding noise to the resulting estimated values to mimic the output of tomography. Since the tomography scheme illustrated in section IV D repeatedly prepares the same state and draws k samples from measurements in the computational basis, the result is a sample from a multinomial distribution. In our numerical simulation, we draw samples from this same multinomial distribution, thus capturing tomographic noise more faithfully than by simply adding uniform Gaussian noise, as was done in Ref. [22]. For simplicity, we assume that the part of the tomography protocol that calculates the signs of the amplitudes computes each sign correctly. To numerically estimate resource counts, we must ultimately understand what level of precision ξ is required to stay close enough to the central path throughout the algorithm, as well as how large the Frobenius condition number κF of the Newton system is. Importantly, we would like to know how these quantities scale with the system size and the duality gap µ, which decreases by a constant factor with each iteration of the QIPM.
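The multinomial noise model just described is easy to reproduce. A minimal sketch (the function name is ours), which, as in the simulations, assumes the sign-estimation step is perfect:

```python
import numpy as np

def simulate_tomography(v: np.ndarray, k: int,
                        rng: np.random.Generator) -> np.ndarray:
    """Simulate the tomography output for a real unit vector v: k computational-
    basis shots give multinomial counts over the probabilities |v_i|^2; the
    amplitude estimate is sqrt(counts/k) with the true signs attached."""
    p = np.abs(v) ** 2
    counts = rng.multinomial(k, p / p.sum())  # renormalize to guard rounding
    return np.sign(v) * np.sqrt(counts / k)
```

The estimate is itself a unit vector, and its ℓ₂ error shrinks roughly as √(L/k), which is why the repetition count k must scale as L/ε² to reach precision ε.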
In section III C 5, we discussed three formulations of the QIPM (see table II). The first (II-QIPM) is closely related to the original formulation from Ref. [13], which does not guarantee that the intermediate points generated by the IPM are feasible. The other two are instantiations of the inexact-feasible formulation proposed in Ref. [14], which requires precomputing a basis for the null space of the SOCP constraint matrix. The first of these computes a valid basis by hand (IF-QIPM), while the second uses a QR decomposition to find the basis (IF-QIPM-QR). We simulated all three versions and found that the II-QIPM was always able to stay close to the central path, despite the lack of a theoretical guarantee that this would be the case. Here we present the results of the II-QIPM. For comparison, in appendix E we present some numerical results for the feasible QIPMs, which do benefit from a theoretical convergence guarantee but have other drawbacks.
As discussed in section V A, we also implemented a very simple preconditioner that we find reduces the condition number by at least an order of magnitude with negligible additional classical cost. In all cases, we report resource estimates assuming a preconditioned matrix.

A. Example instance
In fig. 7, we present as an example the results of one of our simulations. We construct a portfolio optimization instance of eq. (3) by randomly choosing n = 30 stocks from the Dow Jones U.S. Total Stock Market Index (DWCF). We (arbitrarily) set the parameters q = 1 and ζ = 0.05 · 1, and we assume our previous portfolio w allocates weight to each stock in proportion to its market capitalization. The returns of the 30 stocks on the first m = 2n = 60 days in our dataset were used to construct an average return vector û and an m × n matrix M for which M⊺M = Σ, the covariance matrix of the stock returns, as described in section III B.
We simulate the infeasible QIPM acting on the corresponding SOCP in eq. (10). The figure illustrates how the simulation successfully follows the central path to the optimal solution over many iterations. The duality gap decreases with each step, and, crucially, the infeasibility and the distance to the central path also decrease (exponentially) with the iteration number. Also plotted is the tomography precision ξ that was required to ensure that each iteration stayed sufficiently close to the central path (determined adaptively as described in the pseudocode in Algorithm 1). The plot exemplifies how, despite the lack of theoretical convergence guarantees, our simulations suggest that in practice the II-QIPM acting on the PO SOCP will yield valid solutions.

FIG. 7. Simulation of the QIPM on an SOCP instance corresponding to portfolio optimization on n = 30 randomly chosen stocks using m = 60 time epochs. The duality gap µ (defined in eq. (14)), the distance to the central path dF (defined in eq. (26)), and the infeasibility (defined as the norm of the residual on the right-hand side in eq. (19)) each decrease exponentially with the number of iterations. The tomography precision ξ required to stay near the central path (defined adaptively as outlined in Algorithm 1) initially decreases and then plateaus at about 10⁻².
Remarkably, for this instance, we also observe that both the Frobenius condition number κF and the inverse tomography precision ξ⁻¹ initially increase but ultimately plateau with the iteration number, even as the duality gap becomes arbitrarily small (see fig. 8 for data on κF). This behavior was a generic feature across all the instances we simulated. It contrasts with the worst-case expectation that the condition number can increase as κF = O(1/µ) or κF = O(1/µ²) (depending on the formulation of the Newton system) [13, 14]. The prior literature says little about whether the quantity ξ⁻¹ should be expected to diverge. One might expect that, since the neighborhood of the central path shrinks as µ decreases (e.g., its radius is proportional to µ in eq. (27)), the precision required to stay close to the central path would become more stringent in proportion to µ. However, it is important to recall that the step size from one iteration to the next also shrinks with µ, and that ξ represents the size of the error on the normalized Newton system solution; thus the neighborhood does not shrink relative to the distance to the optimum and the length of the next step, and there is no immediate reason that ξ⁻¹, as we have defined it, must diverge as µ → 0. One does expect, however, that in the worst case, if the condition number κ diverges, then ξ⁻¹ should also diverge, as errors of constant size ξ on the estimate of u/∥u∥ can lead to residual errors of divergent size κξ on the normalized product Gu/∥Gu∥. We hope that future work can better elucidate why κF and ξ⁻¹ do not diverge on these instances.

B. Scaling of condition number
To understand how the problem scales with portfolio size, we generate example problem instances by randomly sampling n stocks from the DWCF, using returns over m = 2n time epochs (days) to construct our SOCP as in eq. (10). The parameters q, ζ, w, û, and M are all chosen as described above. We plot the Frobenius condition number of the Newton matrix, as well as that of the preconditioned Newton matrix, as a function of the duality gap in fig. 8 for portfolios of size n ∈ {60, 80, 100, 120}. Here we confirm our previous remark that the condition number appears to plateau beyond a certain value of the duality gap, especially for the preconditioned matrix.
Key to understanding the asymptotic scaling of the quantum algorithm is determining how the condition number scales with the number of assets, as the runtime of the QLSS grows linearly with the condition number. In fig. 9, we plot the Frobenius condition number κF as a function of n, the number of stocks, observed at duality gaps µ ∈ {10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷}. At duality gaps of 10⁻⁵ and 10⁻⁷, the condition number κF has plateaued, as observed in fig. 8. We perform a non-linear fit to the data using a power-law model κF = a n^b, where a and b are fit parameters, and we report the exponents b in table VII. All exponents appear to be near or less than unity: in all four cases the exponent is less than 1, and in the latter three cases it is greater than 0.9, suggesting a nearly linear-in-n trend.
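A power-law exponent of this kind can be extracted by least squares on log-log data; this is a stand-in for the non-linear fits used in the text, not necessarily the exact fitting procedure employed there:

```python
import numpy as np

def power_law_exponent(n_values, y_values) -> float:
    """Fit y = a * n^b by linear least squares on log-log data; return b."""
    b, _log_a = np.polyfit(np.log(np.asarray(n_values, dtype=float)),
                           np.log(np.asarray(y_values, dtype=float)), 1)
    return float(b)
```

On noiseless synthetic data the fit recovers the exponent exactly; on the empirical κF data it yields the exponents reported in table VII.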

C. Scaling of tomography precision
While the depth of the individual quantum circuits that compose the QIPM scales only with the Frobenius condition number, the QIPM also requires a number of repetitions of this circuit for tomography that scales as 1/ξ², the inverse of the tomography precision squared. To see how this scales with problem size, we perform an analysis for ξ⁻² similar to the one we previously performed for κF. These results are presented in fig. 10 for the same four duality gaps {10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷}. To reduce the iteration-to-iteration variation in the tomography precision (which results from our adaptive approach to tomography in Algorithm 1), in calculating ξ⁻² at duality gap µ, we take the average over the value of ξ⁻² at the five iterations with duality gap nearest to µ. We fit the median of ξ⁻² at each value of n to a linear model on a log-log plot, corresponding to a relationship ξ⁻² = a n^b, and we report the implied exponent b in table VIII. In this case, it is hard to draw robust conclusions from the fits. The fit suggests that the median of ξ⁻² is increasing with n on the interval n ∈ [10, 120]. However, the most striking feature of the data is that the instance-to-instance variation of ξ⁻² is significantly larger than that of κF. In fact, at µ = 10⁻⁷, the 84th percentile of instances at n = 10, the smallest size we simulated, had a larger value of ξ⁻² than the 50th percentile of instances at n = 120, the largest size we simulated.

TABLE VII. Estimated exponent parameters for the Frobenius condition number κF obtained from the fits that are plotted in fig. 9.

FIG. 10. Scaling of ξ⁻² with the number of stocks n. To reduce iteration-to-iteration variation, an artifact of the adaptive approach to tomography, we average over the observed value of ξ⁻² at the five iterations for which the duality gap is nearest the indicated value. The shaded regions correspond to the 16th to 84th percentile. Here logarithmic axes are used since (unlike for κF) instance-to-instance variation covers multiple orders of magnitude even for a fixed value of n. The dashed lines correspond to a linear fit to the log-log data, where the slope is reported in table VIII.

TABLE VIII. Estimated exponent parameters for 1/ξ² obtained from the fits that are plotted in fig. 10.

D. Asymptotic scaling of overall runtime
Above we provided fits for κF and ξ⁻² as functions of n on the range n ∈ [10, 120]. Here we study the quantity n^1.5 κF/ξ², which determines the asymptotic scaling of the runtime of the QIPM. In fig. 11, we plot this quantity at the same four duality-gap values µ ∈ {10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷}. The implied exponents arising from linear fits on log-log axes are reported in table IX. They are generally consistent with summing the exponents from the previously reported fits. The data inherit from ξ⁻² the feature that the instance-to-instance variation is orders of magnitude larger than the median. Taken at face value, the fits suggest that the scaling of the median algorithmic runtime on the interval n ∈ [10, 120] is similar to the n^3.5 scaling of classical IPMs using Gaussian elimination, and worse than the asymptotic n^2.87 arising from classical IPMs using fast matrix-multiplication techniques to solve linear systems [49, 50] (note that the latter scaling does not set in until n becomes very large, so it is not a good practical comparator). However, the large variance and imperfect fits do not give us confidence that these trends can be reliably extrapolated to larger n. Accordingly, when we compute actual resource counts in the next subsection, we stick to n = 100 and do not speculate on precise estimates for larger (more industrially relevant) n. Our numerical experiments fail to provide significant evidence for an asymptotic polynomial quantum speedup, but neither do they definitively rule it out. Toward that end, note that if the version of tomography we have studied were replaced with the more advanced, recently proposed tomography scheme of Ref. [34], the runtime of the QIPM would instead grow as n^1.5 κF/ξ, while introducing some additional gate overhead. Our fits from table VIII suggest this could reduce the asymptotic exponent, but by no more than about 0.6.

FIG. 11. Median value of the estimated algorithm scaling factor, computed as the median of n^1.5 κF/ξ² for 128 randomly sampled stock portfolios from the DWCF index, as a function of portfolio size for duality gaps of 10⁻¹, 10⁻³, 10⁻⁵, and 10⁻⁷. As in fig. 10, we average over 5 consecutive points to reduce iteration-to-iteration variance deriving from adaptive tomography. Here we also use the actual number of observed samples that were required to achieve sufficient tomographic precision in place of the tomographic factor n/ξ². The shaded regions correspond to the 16th to 84th percentiles. The lines correspond to a linear fit to the log-log data, where the slope is reported in table IX.
Ultimately, we do not believe it is essential to pin down the asymptotic scaling of the algorithm, because the main finding of our work is that, even if a slight asymptotic polynomial speedup exists, the size of the constant prefactors involved precludes an actual practical speedup, barring significant improvements to multiple aspects of the algorithm. In the next subsection, we elaborate on this point in a more quantitative fashion.
TABLE IX. Exponent parameter estimates from the linear fits to n^1.5 κF/ξ² plotted in fig. 11, which determines the overall scaling of the runtime of the QIPM. For comparison, CIPMs using Gaussian elimination have runtime O(n^3.5), and CIPMs using faster methods for solving linear systems have runtime O(n^2.87).

Duality Gap
Algorithm Scaling

E. Numerical resource estimates
Rather than examine algorithmic scaling, we now compute actual resource counts for the QIPM applied to PO. Ultimately, it is these resource counts that matter most from a practical perspective. We estimate the total circuit size in terms of the number of qubits, T-depth, and T-count for a portfolio of 100 assets. We chose this size because it is small enough that we can simulate the entire quantum algorithm classically. However, at this size, solving the PO problem is not classically hard; generally speaking, the PO problem becomes challenging to solve with classical methods only once n is on the order of 10³ to 10⁴. A similar concrete calculation could be performed at larger n by extrapolating trends observed in our numerical simulations, but we are not confident that the fits on n ∈ [10, 120] reported above are reliable predictors for larger n.
Recall that the only step in the QIPM performed by a quantum computer is the task of producing a classical estimate of the solution of a linear system to error ξ. The complexity of this task as it is performed within the QIPM depends on ξ as well as the Frobenius condition number κF. The first step of our calculation is to fix values for ξ and κF at n = 100. We choose them by taking the median over the 128 samples in our numerical simulation at duality gap µ = 10⁻⁷.
Once κF and ξ are fixed, we must determine concrete values for the various other error parameters that appear in the algorithm such that overall error ξ can be achieved. Tomography dominates the complexity and overall error, but there are a number of other factors that contribute to the error in the final solution. For completeness, we enumerate and label the sources of error here:
• εG: block-encoding error for the matrix G
• εh: state-preparation error for the vector h
• εar: gate-synthesis error for the rotations needed by CR0(s) and CR1(s)
• εz: gate-synthesis error for the rotations needed by the QSP phases
• εqsp: error due to the polynomial approximation in eigenstate filtering
• εtsp: error in preparing the state Σi √pi |i⟩ needed for computing the signs in the tomography routine
In section IV, we described a quantum circuit that prepares a state |ṽ⟩ (after postselection) for which ∥|ṽ⟩ − |v⟩∥ ≤ εQLSP. If the block-encoding unitaries, state-preparation unitaries, and single-qubit rotations were perfect, then the only contribution to εQLSP would be from eigenstate filtering, and we would have εQLSP ≤ εqsp. Note the relationship d = 2κF ln(2/εqsp) from proposition 2. Since the block-encoding unitary UG, the state-preparation unitary Uh, and the single-qubit rotations are implemented imperfectly, there is additional error. In preparing the state, the unitary UG is called 2Q + 2d times and the unitary Uh is called 4Q + 4d times, where Q is given in proposition 2.
Additionally, there are 2Q combined appearances of CR0(s) and CR1(s) gates, where each appearance requires two single-qubit rotations. Note that the appearances of CR0(s) and CR1(s) within the eigenstate-filtering portion of the circuit do not contribute to the error, because at s = 1 these gates can be implemented exactly. Finally, there are another d single-qubit rotations required to implement the eigenstate-filtering step. Since operator-norm errors add sublinearly, we can thus say that

εQLSP ≤ εqsp + (2Q + 2d)εG + (4Q + 4d)εh + 4Q εar + d εz.    (66)

Now, the result of proposition 4 implies that, in order to assert that the classical estimate ṽ′ output by tomography satisfies ∥ṽ′ − v∥ ≤ ξ, it suffices that the tomography precision ε, the five error contributions bounding εQLSP in eq. (66), and εtsp together be sufficiently small, as quantified in eq. (67), where for convenience we recall the definitions (ignoring the O(√κF) term) of Q and d as Q = CκF and d = 2κF ln(2/εqsp). Recalling that the dominant term in the complexity of the algorithm scales as ε⁻² but only logarithmically in the other error parameters, to minimize the complexity we assign the majority of the error budget to ε: we let ε = 0.9ξ, and we split the remaining 0.1ξ across the remaining six terms of eq. (67). There is room for optimizing this error-budget allocation, but the savings would be at most a constant factor in the overall complexity.
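A minimal sketch of this error-budget allocation follows. The even split of the 0.1ξ budget over six terms and the division of each share by its multiplicity are illustrative assumptions; the value of ξ is also illustrative, while κF is the n = 100 median from the text and C is the empirically motivated QLSS prefactor quoted later:

```python
import math

xi = 1e-3            # assumed overall linear-system precision (illustrative)
kappa_F = 1.6e4      # median Frobenius condition number at mu = 1e-7
C = 1.31 * 2305      # empirically motivated QLSS prefactor

eps = 0.9 * xi       # tomography precision gets the bulk of the budget
share = 0.1 * xi / 6 # even split of the remainder over six terms (assumed)

eps_qsp = share
d = 2 * kappa_F * math.log(2 / eps_qsp)   # eigenstate-filtering degree
Q = C * kappa_F                           # queries, ignoring O(sqrt(kappa_F))

# Per-use errors: each share divided by its multiplicity in eq. (66).
eps_G   = share / (2 * Q + 2 * d)  # block-encoding error per use
eps_h   = share / (4 * Q + 4 * d)  # state-preparation error per use
eps_ar  = share / (4 * Q)          # per-rotation error in CR0(s), CR1(s)
eps_z   = share / d                # per-rotation error in QSP phases
eps_tsp = share                    # tomography state-preparation error
print(f"eps = {eps:.1e}, eps_G = {eps_G:.1e}, eps_h = {eps_h:.1e}")
```

The point of the sketch is qualitative: because the block-encoding is used O(κF) times, its per-use error εG must be taken many orders of magnitude below ξ.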
Note that elsewhere in the draft we have referred to ξ as the "tomography precision," since ε dominates the contribution to ξ. Here, the resource calculation requires that we differentiate ε from ξ, but when speaking conceptually about the algorithm we focus on ξ, as it is the more fundamental parameter: it represents the precision at which the classical-input, classical-output linear-system problem is solved, allowing apples-to-apples comparisons between classical and quantum approaches.
With values for κF, εG, εh, εar, εqsp, εz, and εtsp now fixed, we can proceed to complete the resource count using the expressions in table V. Note that for gate-synthesis error, we use the formula Ry = 3 log₂(1/εr), where Ry is the number of T gates needed to achieve an εr-precise Clifford+T decomposition of a rotation gate [67]. Putting this all together yields the resource estimates for a single run of the (uncontrolled) quantum linear system solver in table X at n = 100. We report these estimates both in terms of primitive block-encoding and state-preparation resources and as raw numerical estimates. For the total runtime, we must also estimate the resources required for the controlled state-preparation routine. We have estimated these quantities, but to the precision of the estimates we report, the numbers are the same as for the uncontrolled version, so we exclude them for brevity.
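The gate-synthesis formula above translates a rotation-accuracy target directly into a per-rotation T-count; for example:

```python
import math

def t_count_per_rotation(eps_r: float) -> float:
    # R_y = 3 log2(1/eps_r): T gates needed for an eps_r-precise
    # Clifford+T decomposition of a single-qubit rotation [67].
    return 3 * math.log2(1 / eps_r)

# Synthesizing rotations at the eps ~ 1e-10 scale used for the
# block-encoding costs on the order of a hundred T gates each.
print(t_count_per_rotation(1e-10))
```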
To estimate the total runtime, our estimates must be multiplied by the tomography factor k (for both the controlled and uncontrolled circuits) as well as the number of iterations Nit = ⌈ln(ϵ)/ln(σ)⌉, where ϵ is the target duality gap (which we take to be ϵ = 10⁻⁷) and σ = 1 − 1/(20√(2r)). While k will vary from iteration to iteration, in our calculation we assume the total number of repetitions is given by the simple product (2k)Nit, which, noting that the value of ξ plateaus after a certain number of iterations, will give a roughly accurate estimate. Note that these 2kNit repetitions need not be done coherently, in the sense that the entire system is measured and reprepared in between each repetition. One can bound the tomography factor as k ≤ 57.5 L ln(L)/ξ², where ξ is determined empirically. However, our numerical simulations of the algorithm yield an associated value of k needed to generate the estimate to precision ξ, so we can use this numerically determined value directly. We find that the observed median value of k = 3.3 × 10⁸ from simulation is multiple orders of magnitude smaller than the theoretical bound. Using this substitution for k and Nit, we find the results shown in the right column of table I in the introduction.
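The iteration and repetition counts above can be checked with a few lines of arithmetic (parameter values are those quoted in the text, with r = 3n + 1 for the PO formulation):

```python
import math

n = 100
r = 3 * n + 1                                # PO formulation: r = 3n + 1
eps_gap = 1e-7                               # target duality gap
sigma = 1.0 - 1.0 / (20 * math.sqrt(2 * r))  # per-iteration gap reduction
N_it = math.ceil(math.log(eps_gap) / math.log(sigma))

k = 3.3e8                  # observed median tomography factor
total_reps = 2 * k * N_it  # (2k) * N_it repetitions of the QLSS circuit
print(f"N_it = {N_it}, total repetitions ~ {total_reps:.1e}")
```

This reproduces the roughly 8 × 10³ iterations quoted below and a total repetition count of order 10¹² to 10¹³.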
To aid in understanding which portions of the algorithm dominate the complexity, we show a breakdown of the resources in fig. 12. The width of the boxes is representative of the T-depth, while the height of the boxes represents the T-count. The number of classical repetitions, composed of tomography samples as well as IPM iterations needed to reach a target duality gap, contributes the largest factor to the algorithmic runtime. Of these two, quantum state tomography contributes more than the iterations needed to reach the target duality gap. Our exact calculation confirms that, for the individual quantum circuits involved in the QLSS, the discrete adiabatic portion of the algorithm dominates over the eigenstate-filtering step in its contribution to the overall quantum circuit T-depth. Within the adiabatic subroutine, the primary driver of the T-depth and T-count is the need to apply the block-encoding operator Q times (see, e.g., eq. (61)), where Q is proportional to the Frobenius condition number. An additional source of a large T-count arises from the need to block-encode the linear system, which causes the T-count to scale as O(L²).

TABLE X. Estimated number of logical qubits NQ, T-depth TD, and T-count TC required to perform the quantum linear system solver (QLSS) subroutine within the QIPM running on a PO instance with n = 100 stocks. This calculation uses the empirically observed median value for the condition number at duality gap µ = 10⁻⁷, which was κF = 1.6 × 10⁴. The full QIPM repeats this circuit k = O(n ln(n)ξ⁻²) times in each iteration to generate a classical estimate of the output of the QLSS, and also performs Nit = O(n^0.5) iterations, where the linear system being solved changes from iteration to iteration. In the left column, we write the resources as numerical prefactors times the resources required to perform the controlled block-encoding of the matrix G (denoted by a subscript cbe) and the state preparation of the vector |h⟩ (denoted by a subscript sp), defined in tables III and IV. Written this way, one can see the large prefactors arising from the linear-system-solver portion of the algorithm. In the right column we compute the exact resources, including those coming from the block-encoding. The notation AeB is short for A × 10^B.

The resource quantities we report are prohibitively large, even for the classically easy problem size of n = 100 assets in the portfolio optimization instance. Our detailed analysis allows us to see exactly how this large number arises, which is essential for understanding how best to improve it. We outline the several independent factors leading to the large resource estimates.

• The block-encoding of the classical data is called many times by the QLSS. This data is arranged in an L × L matrix (note that for a PO instance of size n with m = 2n, the Newton linear system has size roughly L ≈ 14n). These block-encodings can be implemented up to error εG in O(log(L/εG)) T-depth using circuits for quantum random access memory (QRAM) as a subroutine [33]. While the asymptotic scaling is favorable, after close examination of the circuits for block-encoding, we find that in practice the T-depth can be quite large: at n = 100 and εG = 10⁻¹⁰ (it is necessary to take εG very small since the condition number of G is quite large), block-encoding to precision εG has a T-depth of nearly 1000. Importantly, this T-depth arises even after implementing several new ideas to minimize the circuit depth, presented by a subset of the authors separately in Ref. [33].
• The condition number κF determines how many calls to the block-encoding must be made, and we observe that κF is quite large for the application of portfolio optimization. Even after an attempt at preconditioning, κF is on the order of 10⁴ already for small SOCP instances corresponding to n = 100 stocks, and empirical trends suggest it grows nearly linearly with n. However, we believe that additional preconditioning could significantly reduce the effective value of κF in this algorithm.
• The constant factor in front of the O(κF) in state-of-the-art QLSSs is also quite large: the theoretical analysis proves an upper bound on the prefactor of 1.2 × 10⁵. Numerical simulations performed in [18] suggested that, in practice, it can be one order of magnitude smaller than the theoretical value. Following these numerics, we take the constant prefactor to be 1.31 × 2305 in our numerical estimates, which still contributes significantly to the estimate. Future work should aim to reduce this constant or, alternatively, investigate whether other approaches, such as those based on variable-time amplitude amplification (VTAA) [54, 71], could achieve better performance despite being asymptotically suboptimal.
• Pure state tomography requires preparing many copies of the output |v⟩ of the QLSS. We improved the constant prefactors in the theoretical analysis beyond what was known, but even with this improvement, the number of queries needed to produce an estimate v′ of the amplitudes of |v⟩ up to error ε in ℓ₂ norm is 115 L ln(L)/ε², which for n = 100 and ε = 10⁻³ is on the order of 10¹¹ (although our simulations suggest 2k = 7 × 10⁸ repetitions suffice in practice). We note that this is another avenue for substantial improvement. For instance, the results of [34] could be used (see footnote 1 for more details).
• QIPMs, like CIPMs, are iterative algorithms; the number of iterations in our implementation is roughly 20√(2r) ln(ϵ⁻¹), a number chosen to utilize theoretical guarantees of convergence (note that r ≈ 3n). Taking n = 100 and ϵ = 10⁻⁷, our implementation would require 8 × 10³ iterations. We suspect that the number of iterations could be significantly decreased if more aggressive choices were made for the step size. For example, similar to our adaptive approach to tomographic precision, one could try longer step sizes first and shorten the step size when the iteration does not succeed. This sort of optimization would apply equally to CIPMs and QIPMs.
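The adaptive step-size idea suggested above can be sketched as a backtracking loop. Here `ipm_step` and `toy_step` are hypothetical stand-ins for a single IPM iteration, not our actual implementation:

```python
def adaptive_ipm(ipm_step, mu_target, sigma_try=0.5, sigma_safe=0.998):
    """Backtracking step-size sketch: attempt an aggressive duality-gap
    reduction factor sigma_try first; on failure, move sigma toward the
    conservative, provably convergent value sigma_safe and retry."""
    mu, iters = 1.0, 0
    while mu > mu_target:
        sigma = sigma_try
        while not ipm_step(sigma):
            sigma = 0.5 * (sigma + sigma_safe)  # shorten the step
        mu *= sigma
        iters += 1
    return iters

# Toy stand-in: an iteration "succeeds" only when the step is not too long.
toy_step = lambda sigma: sigma >= 0.6

iters = adaptive_ipm(toy_step, 1e-7)
print(iters)
```

In this toy setting the aggressive loop converges in tens of iterations rather than the roughly 8 × 10³ required by the conservative schedule; whether a real IPM instance tolerates such steps is exactly the empirical question raised above.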
Remarkably, the five factors described above contribute roughly equally to the overall T-depth calculation, with the exception of the number of copies needed for tomography, which is a much larger number than the others. Tomography would therefore be the obvious place to begin trying to reduce the resource depth, perhaps by implementing the scheme recently proposed in Ref. [34], by making modifications to the QIPM that might allow the parameter ξ to be larger in practice, or by using an iterative refinement method [14]. Another comment regarding tomography is that, in principle, the k tomographic samples can be taken in parallel rather than in series. Running in parallel leads to a huge overhead in memory: one can reduce the tomographic depth by a multiplicative factor P at the cost of a multiplicative factor P additional qubits. Note that even preparing a single copy requires a daunting number of nearly ten million logical qubits at n = 100. Moreover, it is unlikely that improvements to tomography alone could make the algorithm practical, as the other four factors still contribute roughly 10¹⁶ to the T-depth.
Besides the rather large constant factors pointed out above for tomography and especially for the QLSS, we also note that the multiplicative "log factors" that are typically hidden underneath Õ notation in asymptotic analyses contribute meaningfully here. For instance, the entire block-encoding depth is O(log(n/εG)), which in practice is as large as 1000. Moreover, there is an additional factor of ln(ϵ⁻¹) ≈ 16 coming from the iteration count, and ln(L) ≈ 7 from tomography.
This quantitative analysis of bottlenecks for QIPMs can inform likely bottlenecks in other applications where QLSS, tomography, and QRAM subroutines are required. While some parameters such as κF and ξ are specific to the application we considered here, other observations, such as the numerical size of various constant and logarithmic factors (e.g., the block-encoding depth), would apply more generally in other situations.

B. Resource estimate given dedicated QRAM hardware
The bottlenecks above focused mainly on the T-depth and did not take into account the total T-count or the number of logical qubits, which are also large. Indeed, our estimate of 8 million logical qubits, as reported in table I, is drastically larger than estimates for other quantum algorithms, such as Shor's algorithm [73] and algorithms for quantum chemistry (e.g., [74]), both of which can be on the order of 10³ logical qubits. By contrast, the current generation of quantum processors has tens to hundreds of physical qubits and no logical qubits, a long way from the resources required for this QIPM.
However, it is important to note that, as for other algorithms requiring repeated access to classical data, the vast majority of the gates and qubits in the QIPM arise in the block-encoding circuits, which are themselves dominated by QRAM-like data-loading sub-circuits [33]. These QRAM-like sub-circuits have several special features. Firstly, they are largely composed of controlled-swap gates, each of which can be decomposed into four T gates that can even be performed in a single layer, given one additional ancilla and classical feed-forward capability [75]. Furthermore, in some cases the ancilla qubits can be "dirty" [60, 62], i.e., initialized to any quantum state, and, if designed correctly, the QRAM circuits can possess a natural noise resilience that may reduce the resources required for error correction [62]. Implementing these circuits with full-blown universal and fault-tolerant hardware could be unnecessary given their special structure. Just as classical computers have dedicated hardware for RAM, quantum computers may have dedicated hardware optimized for performing the QRAM operation. Preliminary work on hardware-based QRAM data structures (as opposed to QRAM implemented via quantum circuits acting on logical qubits) shows promise in this direction [76, 77].
Our estimates suggest that the size of the QRAM needed to solve an n = 100 instance of PO is one megabyte, and the QRAM size for n = 10⁴ (i.e., sufficiently large to potentially be challenging by classical standards) is roughly 10 gigabytes, which is comparable to the size of classical RAM one might find on a modern laptop. These numbers could perhaps be reduced by exploiting the structure of the Newton matrix, as certain blocks are repeated multiple times in the matrix and many of the entries are zero (see eqs. (10) and (19)).
With this in mind, we can ask the following hypothetical question: suppose that we had access to a sufficiently large dedicated QRAM element in our quantum computer, and furthermore that the QRAM ran at a 4 GHz clock speed (comparable to modern classical RAM); would the algorithm become more practical in this case? Under the crude, conservative simplifying assumption that each block-encoding and state-preparation unitary requires just a single call to QRAM and the rest of the gates are free, we can give a rough answer by referring to the expression in table X, which states that 3 × 10⁸ total block-encoding and state-preparation queries are needed. Thus, even if the rest of our estimates stay the same, the number of QRAM calls involved in just a single QLSS circuit for n = 100 would be 3 × 10⁸. Accounting for the fact that the QIPM involves an estimated 6 × 10¹² repetitions of similarly sized circuits, the overall number of QRAM calls needed to solve the PO problem would be larger than 10²¹, and the total evaluation time would be on the order of ten thousand years. Thus, even at 4 GHz QRAM speed, the problem remains decidedly intractable. Nonetheless, we believe that if the QIPM were to be made practical, it would need to involve specialized QRAM hardware in combination with fundamental improvements to the algorithm itself.
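The ten-thousand-year figure follows directly from the numbers quoted above:

```python
# Back-of-the-envelope evaluation time assuming dedicated 4 GHz QRAM,
# counting one QRAM call per block-encoding/state-preparation query
# and treating all other gates as free.
calls_per_qlss = 3e8   # queries per QLSS circuit (table X)
repetitions = 6e12     # estimated QIPM repetitions of similar circuits
clock_hz = 4e9         # assumed dedicated-QRAM clock speed

total_calls = calls_per_qlss * repetitions
years = total_calls / clock_hz / (3600 * 24 * 365)
print(f"{total_calls:.1e} QRAM calls, ~{years:.0f} years at 4 GHz")
```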

C. Comparison between QIPMs and CIPMs and comments on asymptotic speedup
The discussion above suggests that the current outlook for practicality of QIPMs is pessimistic, but it simultaneously highlights several avenues by which to improve the results. Even with such improvements, if QIPMs are to one day be practical, they need to at least have an asymptotic speedup over CIPMs. Here we comment on this possibility. The core step of both QIPMs and CIPMs is computing a classical estimate of the solution to a linear system, a task that is also of broad use beyond interior point methods. Thus, we need only compare different approaches to solving linear systems, and our conclusions are relevant in any application where linear systems must be solved. Accordingly, in table XI we give the asymptotic runtime of several approaches to solving an L × L linear system to precision ξ, including the (QLSS + tomography) approach utilized by QIPMs, as well as two classical approaches. Whereas prior literature (e.g., [13]) primarily compared against Gaussian elimination (which scales as O(L³)), we also compare against the randomized Kaczmarz method [51], which scales as O(LκF² ln(ξ⁻¹)). This scaling comes from the fact that 2κF² ln(ξ⁻¹) iterations are needed, and each iteration involves computing several inner products at cost O(L). We observe that the worst-case cost of an iteration is 4L floating-point multiplications, meaning all the constant prefactors involved are more or less mild. Thus, the asymptotic quantum advantage of the QIPM is limited to an amount equal to O(min(ξ²κF, ξ²L²/κF)), which is at most O(L) when κF ∝ L and ξ = O(1). Encouragingly, our numerical results are consistent with κF ∝ L. However, our results are not consistent with ξ = O(1), suggesting instead that ξ decreases with L.
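For concreteness, a minimal sketch of the randomized Kaczmarz method is below; the test system (size, noise scale, seed) is illustrative and deliberately well-conditioned, not a PO Newton system:

```python
import numpy as np

def randomized_kaczmarz(A, b, iters, rng):
    """Randomized Kaczmarz (Strohmer-Vershynin): sample row i with
    probability ||a_i||^2 / ||A||_F^2 and project the iterate onto the
    hyperplane {x : a_i . x = b_i}. Each iteration costs O(L), giving
    the O(L kappa_F^2 ln(1/xi)) total runtime quoted in the text."""
    row_norms_sq = np.einsum('ij,ij->i', A, A)
    probs = row_norms_sq / row_norms_sq.sum()
    rows = rng.choice(A.shape[0], size=iters, p=probs)
    x = np.zeros(A.shape[1])
    for i in rows:
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

rng = np.random.default_rng(1)
L = 50
A = np.eye(L) + 0.05 * rng.standard_normal((L, L))  # well-conditioned
x_true = rng.standard_normal(L)
b = A @ x_true

x = randomized_kaczmarz(A, b, iters=20000, rng=rng)
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.1e}")
```

Note that each update touches a single row, which is why the per-iteration cost is a few inner products rather than a matrix factorization; the κF² dependence enters only through the number of iterations.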
If κF ∝ L and ξ = O(1), we would find a total QIPM runtime of O(n^2.5), improving over the classical O(n^3.5) for a portfolio with n stocks. This speedup would be a material asymptotic improvement over the classical complexity, but leveraging it for a practical advantage might still be difficult. Firstly, the difference in the constant prefactor between the quantum and classical algorithms would likely negate the speedup unless n is taken to be very large. Secondly, the speedup would necessarily be sub-quadratic. In the context of combinatorial optimization, where quadratic speedups can be obtained easily via Grover's algorithm, even a quadratic speedup is unlikely to exhibit actual quantum advantage after factoring in slower quantum clock speeds and error-correction overheads [78].
Our results suggest that finding a practical quantum advantage for portfolio optimization might require structural improvements to the QIPM itself. In particular, it may be necessary to explore whether additional components of the IPM can be quantized and whether the costly contribution of quantum state tomography could be completely circumvented. Naively, circumventing tomography entirely is challenging, as it is vitally important to retrieve a classical estimate of the solution to the linear system at each iteration in order to update the interior point and construct the linear system at the next iteration. Nevertheless, tomography represents a formidable bottleneck that must be addressed.
While our results are pessimistic on the question of whether quantum interior point methods will deliver quantum advantage for portfolio optimization (and other applications), it is our hope that by highlighting the precise issues leading to daunting resource counts, our work can inspire innovations that render quantum algorithms for optimization more practical. Finally, we conclude by noting that detailed, end-to-end resource estimations of the kind we performed here are vitally important for assessing the commercial viability of quantum algorithms and quantum applications. While it is essential to discover and prove asymptotic speedups of quantum algorithms over classical ones, an asymptotic speedup alone does not imply practicality: a detailed, end-to-end resource estimate is required, as the quantum algorithm may nevertheless be far from practical to implement. As we have seen, the devil is in the details, and there are many details behind which the devil can hide.
• Symbols related to the interior point method
- σ: step-length parameter
- L: size of the (square) Newton matrix
- ϵ: input to the IPM specifying the error tolerance; the algorithm terminates once the duality gap falls beneath ϵ
• Important relations between parameters
- The self-dual embedding has 2N + K + 3 parameters and N + K + 2 linear constraints
- The Newton matrix has size L = 2N + K + 3 for the infeasible approach and L = N + 1 for the feasible approach
- For the PO formulation in eq. (10), N = 3n + m + 1, r = 3n + 1, K = 2n + m + 1
- In our numerical experiments, we choose m = 2n
• Symbols related to quantum linear system solvers
- G: L × L matrix encoding the linear constraints
- h: length-L vector encoding the right-hand side of the linear constraints
- u: solution to the linear system Gu = h
• Symbols related to tomography
- k: number of measurements on independent copies of the state
- δ: probability of failure
- ε: guaranteed error of the tomographic estimate
- ξ: overall precision of the solution to the linear system, dominated by the tomographic error

and the estimates p
for all i.
3. The actual amplitudes p′_i of the state created in the second step satisfy |p′_i − √p_i| ≤ εtsp.
From proposition 3, we know that Assertion 1 holds with probability at least 1 − δ/3, and Assertion 2 holds with probability at least 1 − 2δ/3. Therefore both assertions hold with probability at least 1 − δ. Moreover, Assertion 3 holds by assumption. From here on we will assume that all three assertions hold. Let a_i be the real part and b_i the imaginary part of the quantity √p ṽ_i. Let r⁺_i = |√p ṽ_i + √p_i| and r⁻_i = |√p ṽ_i − √p_i|. Note that r⁺_i and r⁻_i are proportional to the absolute values of the ideal amplitudes of the state created in eq. (50). Define f_i(x, y) = (x² − y²)/√p_i; then a_i = f_i(r⁺_i/2, r⁻_i/2). We first note that the estimates p±_i give us good approximations of r⁺_i/2 and r⁻_i/2, which follows from Assertions 2 and 3. The amplitudes ã_i that define the estimate output by the tomography algorithm are given in eq. (52). We prove that the ã_i values approximate the a_i values, proceeding by a case-by-case analysis. Assume that a_i ≥ 0; the case a_i < 0 proceeds similarly.
First, consider the case √p_i ≤ (2/3)ε′ + εtsp. In this case ã_i = 0 and a_i ≤ √p|ṽ_i|. Second, consider the case f_i(p⁺_i, p⁻_i) ≥ a_i. From the definition of ã_i and Assertion 1, we have ã_i ≤ √p_i ≤ √p|ṽ_i| + ε′/3. We also have a further bound (again invoking Assertion 1), where in the second line we used eq. (B9) and the fact that r⁺_i ≥ √p_i ≥ (2/3)ε′ + εtsp. We then upper bound r⁺_i + r⁻_i, where in the fourth line we used a_i ≤ √(a_i² + b_i²) = √p|ṽ_i| ≤ √p_i + ε′/3 (Assertion 1), and in the following step we used ε/√p_i ≤ 1. This implies the desired bound, where we used a_i − √p_i ≤ √p|ṽ_i| − √p_i ≤ ε′/3.
We have shown that |ã_i − a_i| ≤ ε′ + εtsp + |b_i| in all cases, which yields the claimed bound.

The QR decomposition must be classically pre-computed to implement this method. However, the advantage of the IF-QIPM-QR method is not large enough for any of the qualitative conclusions in section VII to change. The IF-QIPM method has the worst performance, which we believe is due to the fact that the null-space basis found by inspection turns out to be a very ill-conditioned matrix (its condition number was observed to be in the vicinity of 1000). Additionally, the IF-QIPM appears to have the largest instance-to-instance variation of any of the methods, leading to lower-quality numerical fits.

FIG. 1. Main component of the quantum circuit for proposition 1, described in [18, appendix E], enacting the unitary U[s] on registers a3 a4 a2 a1 L ℓG of the scaled Hamiltonian c(s)·H[s], where H[s] = (1 − f(s))H0 + f(s)H1, on registers a4 a1 L. The necessary quantum gates and functions are defined in eqs. (53) to (59), except for the sub-circuit UQh, which is depicted in fig. 4. The unitary U[s] is then used in eq. (61) to define the overall quantum circuit U for proposition 1.
FIG. 7. Simulation of the QIPM on an SOCP instance corresponding to portfolio optimization on n = 30 randomly chosen stocks using m = 60 time epochs. The duality gap µ (defined in eq. (14)), the distance to the central path dF (defined in eq. (26)), and the infeasibility (defined as the norm of the residual on the right-hand side of eq. (19)) each decrease exponentially with the number of iterations. The tomography precision ξ required to stay near the central path (defined adaptively as outlined in Algorithm 1) initially decreases and then plateaus at about 10⁻².

FIG. 8. Median Frobenius condition number κF for 128 randomly sampled stock portfolios from the DWCF index as a function of the duality gap for portfolios of size 60, 80, 100, and 120 stocks. The shaded regions indicate the 16th to 84th percentiles. We observe that the condition number appears to plateau at small values of the duality gap.

FIG. 9. Median Frobenius condition number κF for 128 randomly sampled stock portfolios from the DWCF index as a function of portfolio size for duality gaps of 10⁻¹, 10⁻³, 10⁻⁵, and 10⁻⁷. The shaded regions correspond to the 16th to 84th percentiles. The lines represent power-law fits of the form a·n^b, where the values of b are reported in table VII. In all four cases the exponent is less than 1, and in the latter three cases it is greater than 0.9, suggesting a nearly linear-in-n trend.

FIG. 10. Median value of the squared inverse tomography precision ξ⁻² required to remain in the neighborhood of the central path for 128 randomly sampled stock portfolios from the DWCF index, as a function of portfolio size for duality gaps of 10⁻¹, 10⁻³, 10⁻⁵, and 10⁻⁷. To reduce iteration-to-iteration variation, an artifact of the adaptive approach to tomography, we average over the observed value of ξ⁻² at the five iterations for which the duality gap is nearest the indicated value. The shaded regions correspond to the 16th to 84th percentiles. Logarithmic axes are used since (unlike for κF) instance-to-instance variation covers multiple orders of magnitude even for a fixed value of n. The dashed lines correspond to a linear fit to the log-log data, where the slope is reported in table VIII.

FIG. 12. Breakdown of the quantum resources required for a single coherent run of the uncontrolled version of the quantum algorithm needed to produce the state in eq. (36). As in table X, we take the final duality gap to be µ = 10⁻⁷ and the number of assets to be n = 100. Choices for the Frobenius condition number κF = 1.6 × 10⁴ and the number of tomographic repetitions k = 3.3 × 10⁸ are informed by our numerical experiments, as discussed in section VI. A similar breakdown for the controlled version needed to produce the state in eq. (49) would be essentially the same. The eigenstate-filtering sub-circuit follows a very similar alternating structure to the adiabatic evolution, with the U[j] block-encodings replaced with either U[1] or U[1]†, the reflection operator W replaced with phase rotations, and only d ≪ Q total iterations (refer to fig. 2 for details).

- v: normalized solution to the linear system, u/∥u∥
- εQLSP: error in the solution to the linear system
- ṽ: normalized output of the QLSS, which should satisfy ∥v − ṽ∥ ≤ εQLSP
- ℓ: ⌈log₂ L⌉
- UG: block-encoding unitary for G
- ℓG: number of ancilla qubits used by UG
- Uh: state-preparation unitary for |h⟩
- κF(G): Frobenius condition number ∥G∥F ∥G⁻¹∥ of G
- Q: number of queries to UG and Uh (proposition 1)
- C: constant prefactor of κF (proposition 1)
- d: degree of the polynomial used in eigenstate filtering (proposition 2)
• Symbols related to block-encoding and state preparation
- εG: block-encoding error for the matrix G
- εh: state-preparation error for the vector h
- εar: gate-synthesis error for the rotations needed by CR0(s) and CR1(s)
- εz: gate-synthesis error for the rotations needed by the QSP phases
- εqsp: error due to the polynomial approximation in eigenstate filtering
- εtsp: error in preparing the state Σ_{i=1}^{L} √p_i |i⟩ needed for the tomography routine
- NQbe, TDbe, and TCbe: number of logical qubits, T-depth, and T-count required for block-encoding
- NQcbe, TDcbe, and TCcbe: number of logical qubits, T-depth, and T-count required for controlled block-encoding
- NQsp, TDsp, and TCsp: number of logical qubits, T-depth, and T-count required for state preparation
- NQcsp, TDcsp, and TCcsp: number of logical qubits, T-depth, and T-count required for controlled state preparation

TABLE IV. Logical quantum resources required to prepare an arbitrary ℓ-qubit quantum state |h⟩ from classical data (left column) and a single-qubit-controlled version (right column) to precision εh ∈ (0, 1] [33]. Here we have suppressed terms doubly and triply logarithmic in L and 1/εh (see [33]). For a single-qubit control, there are no additional Clifford gates required, which can be observed by examining the state-preparation procedure in [33, Sec. IIID] and noting that we can prepare the state |0⟩|0⟩^⊗ℓ + |1⟩|ψ⟩ with minor modifications to the procedure that prepares |ψ⟩. First, we use the "flag" qubits to control both the angle-loading and unloading steps (rather than just the unloading steps), and second, we control every flip of the flag qubits in that procedure with the first single-qubit control, thus turning NOT gates into CNOT gates, which are also Clifford. When the control is ON, the procedure works as before, and when the control is OFF, none of the qubits leave the |0⟩ state.


TABLE XI. Comparison of time complexities of different approaches for exactly or approximately solving an L × L linear system with Frobenius condition number κF to precision ξ. The comparison highlights how a quantum advantage only persists when κF is neither too large nor too small. The constant prefactor roughly captures the T-depth that we found in the quantum case (the same prefactor from table VI after discounting the 20√2 IPM iteration factor) and the number of multiplications in the classical case.

TABLE XII. Fit parameters for the Frobenius condition number for the four horizontal-axis locations considered in the scaling plot of fig. 13. The uncertainties correspond to one-standard-deviation errors on the parameter estimates from the fit. We note that both versions have similar empirical scaling, although the fits are better for IF-QIPM-QR. The constant prefactors are superior for the IF-QIPM-QR version, but calculating the QR decomposition requires a one-time classical cost proportional to O(L³).
Fit parameters for the Frobenius condition number for the four horizontal-axis locations considered on the scaling plot of fig.13.The uncertainties correspond to one standard deviation errors on the parameter estimates from the fit.We note that both versions have similar empirical scaling, although the fits are better for IF-QIPM-QR.The constant prefactors are superior for the IF-QIPM-QR version, but calculating the QR decomposition requires a one-time classical cost proportional to O(L 3 ).