Noisy intermediate-scale quantum computing algorithm for solving an $n$-vertex MaxCut problem with log($n$) qubits

Quantum computers are devices that allow more efficient solutions of problems than their classical counterparts. As the timeline for developing a quantum error-corrected computer is unclear, the quantum computing community has dedicated much attention to developing algorithms for currently available noisy intermediate-scale quantum (NISQ) computers. Thus far, within NISQ, optimization problems are among the most commonly studied and are quite often tackled with the quantum approximate optimization algorithm (QAOA). This algorithm is best known for computing graph partitions with a maximal separation of edges (MaxCut), but can easily be applied to other graph problems. Here, I present a novel quantum optimization algorithm which uses exponentially fewer qubits than the QAOA while requiring a significantly reduced number of quantum operations to solve the MaxCut problem. This improved performance allowed me to partition graphs with 32 nodes on publicly available 5-qubit gate-based quantum computers without any preprocessing such as division of the graph into smaller subgraphs. These results represent a 40% increase in graph size compared to state-of-the-art experiments on gate-based quantum computers such as Google Sycamore. The obtained lower bound on the solution is 54.9% for actual hardware benchmarks and 77.6% for ideal simulators of quantum computers. Furthermore, large-scale optimization problems represented by graphs of 128 nodes are tackled with simulators of quantum computers, again without any predivision into smaller subproblems, and a lower solution bound of 67.9% is achieved. The study presented here paves the way to using powerful genetic optimizers in synergy with quantum computers.


INTRODUCTION
A universal quantum computer has been the holy grail of quantum technology [1]. Such a device would allow more efficient searching through databases [2], prime number factorization [3], and more efficient solutions of systems of linear equations [4], to name a few. However, universal quantum computers require millions of qubits with quantum error correction implemented, and their implementation timeline is difficult to predict [5]. On the other hand, devices with up to 127 noisy qubits are readily available. This has steered the scientific community towards exploring the potential computational advantages which such devices could bring. In the rapidly expanding field of Noisy Intermediate-Scale Quantum (NISQ) computing [6,7] two algorithms stand out: the Variational Quantum Eigensolver (VQE) [8,9] and the aforementioned Quantum Approximate Optimization Algorithm (QAOA) [10][11][12]. The VQE is mainly applied to problems in chemistry and materials science, while the QAOA is best known for computing graph partitions with a maximal separation of edges (MaxCut), but can easily be applied to other graph problems, such as Maximum Independent Set and the partition problem. Given a graph $G = (V, E)$ comprising $|V|$ vertices and $|E|$ edges, the QAOA requires $n = |V|$ qubits and $p(|E| + |V|)$ quantum operations to implement the ansatz for the MaxCut problem [11,12]. Here, $p$ is the phenomenological depth parameter.
In this manuscript a novel variational MaxCut algorithm requiring $n = \lceil \log_2 |V| \rceil$ qubits is introduced, where $\lceil \cdot \rceil$ stands for the ceiling function. For example, if $x = 2.1$, $\lceil x \rceil = 3$; if $x = 2.7$, $\lceil x \rceil = 3$. Like all other NISQ algorithms, the algorithm presented here iteratively improves a trial solution (ansatz) in a hybrid quantum-classical optimization loop. The ansatz is implemented with at most $2^n - 2n + 5$ single-qubit gates and at most $2^n - 2$ two-qubit CNOT gates (in total up to $2^{n+1} - 2n + 3$ gates). Exploiting the fact that large graphs can be treated with the algorithm at a low resource overhead, I demonstrate the calculation of a MaxCut of a graph of 32 nodes on a publicly available device of only 5 qubits. This is a 40% increase in graph size compared to state-of-the-art experiments with QAOA on gate-based quantum computers such as the Google Sycamore [13][14]. The algorithm presented here opens a perspective for immediate quantum speedup with contemporary quantum processors, given that the quantum hardware community is still some years away from producing processors with the hundreds of qubits required for quantum speedup with QAOA [15]. Furthermore, a graph of 128 nodes is partitioned on contemporary simulators of quantum computers. With this methodology, simulators of quantum computers become a powerful tool for graph partitioning, able to tackle graphs of hundreds of nodes without dividing the problem into smaller subgraphs, as done in Ref. [16], or focusing on correlations between pairs of classical independent variables, as done in Ref. [17].
The work presented here maps the MaxCut cost function to a multi-modal, multi-dimensional cost function which can be evaluated on Quantum Processing Units (QPUs). In the subsequent step it relies on the powerful class of global optimizers called "genetic optimizers", which are proven to excel in finding minima of multi-dimensional, multi-modal black-box cost functions [18][19][20].

METHODOLOGY

Variable reduction
In this subsection a variable reduction technique compatible with the MaxCut problem is presented. This is done in order to make the problem amenable to classical optimizers which cannot easily treat multi-dimensional data, such as the surrogate-model EGO optimizer [21]. Also, as shown in Supplementary Material Fig. S2, variable reduction yields better results for lower graph densities and lower result variance for all graph densities. Because the binary optimization problem of finding a MaxCut is NP-hard, a first approach to approximately solving the problem would be to linearly relax it: instead of assuming that the binary optimization variables in the MaxCut problem are integers 0 or 1, one assumes that they are continuous variables in [0, 1]. In the field of semi-definite programming [22] a different, more efficient approach is taken: binary variables are substituted with vectors. This classical method of approximately solving the MaxCut problem is state-of-the-art, has a performance guarantee of α = 0.87856 as proved by Goemans and Williamson, and can be performed in polynomial time [22].
Here an alternative approach compatible with quantum computing is presented. Let me now introduce a continuous, differentiable function of the following form

$$R_f(x, q, m) = \exp\left[-\exp\left(2^{m-q} \sin\left(2^q x + x_0(q, m)\right)\right)\right], \qquad (1)$$

where $x_0(q, m) = \arcsin\left(\ln(-\ln(0.5))/2^{m-q}\right)$, the integer $m \geq |V|$ and $0 \leq q \leq |V| - 2$. $R_f$ is defined in such a way as to be n-times differentiable and to have a minimum (maximum) of 0 (1), with rapidly changing multi-oscillatory behavior between the extrema. It should be noted that this definition is by no means unique. Assume a graph where $|V| \gg 1$; consequently $m$ is a large number. $R_f(x, 0, m)$ is then a function which is mostly 0 in the region $0 \leq x \leq \pi$ and rapidly changes to 1 in the region $\pi < x \leq 2\pi$ (see the red curve in Fig. 1).
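As an illustration, Eq. (1) can be evaluated numerically with the following minimal Python sketch. The function names `x0` and `r_f` are chosen here for illustration and are not taken from the implementation used in this study; the overflow guard is likewise an assumption about how one might handle large $m$.

```python
import math

def x0(q, m):
    """Phase offset that places the half-height crossing R_f = 0.5 at x = 0."""
    return math.asin(math.log(-math.log(0.5)) / 2 ** (m - q))

def r_f(x, q, m):
    """Eq. (1): exp(-exp(2**(m-q) * sin(2**q * x + x0(q, m))))."""
    exponent = 2 ** (m - q) * math.sin(2 ** q * x + x0(q, m))
    try:
        inner = math.exp(exponent)
    except OverflowError:
        # The inner exponential overflows for large m; the outer
        # exponential of minus infinity is then exactly 0.
        return 0.0
    return math.exp(-inner)

# Values for m = 4, the setting used in Fig. 1.
half = r_f(0.0, 0, 4)               # crossing point between the plateaus
low = r_f(math.pi / 2, 0, 4)        # inside 0 <= x <= pi
high = r_f(3 * math.pi / 2, 0, 4)   # inside pi < x <= 2*pi
```

For $q = 0$ the function sits near 0 on the first half of the period and near 1 on the second half, while larger $q$ doubles the oscillation frequency with each increment.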

The algorithm
The Laplacian $L(G)$ of a graph $G = (V, E)$ is a $|V| \times |V|$ matrix with $|V|$ positive terms on the diagonal and $2|E|$ nonzero off-diagonal terms. The $i$th diagonal term of the graph Laplacian corresponds to the number of connections the node $i$ has with the remaining nodes in the graph, and the $ij$th off-diagonal term of the matrix is the negative weight between the $i$th and the $j$th node.
Similarly to a real-valued Hamiltonian in quantum mechanics, the graph Laplacian is symmetric; furthermore, its spectrum (eigenvalue range) lies between 0 and its largest eigenvalue. Now, I introduce a partition vector $V$ of length $|V|$ whose $i$th term equals $+1$ if the $i$th vertex of $G$ belongs to the first sub-graph in the graph partition and $-1$ if the $i$th vertex belongs to the second sub-graph. Then, the number of cuts in the graph bi-partition is $N_{\mathrm{cuts}} = V^T L V/4$ [23], and this formula is a central piece of the algorithm presented here. By finding the vector $V$ which maximizes $N_{\mathrm{cuts}}$, a MaxCut of the graph is found. The vector $V$ has $2^{|V|}$ possible values and there is no known algorithm which can exactly find the $V$ which maximizes $N_{\mathrm{cuts}}$ with a computational complexity which is polynomial in $|V|$.
Now I present the structure of the algorithm for a preselection of $r$ optimization variables $x_1 ... x_r$.
1. Trivial unconnected vertices are added to the graph so that it has a dimension which is a power of two.
2. The graph Laplacian $L(G)$ is represented as a sum of tensor products of unitary matrices, and denoted as L(G) in such form. The decomposition is a prerequisite for measuring the expectation value of the graph Laplacian on a QPU (step 5) and also follows the logic of the implementation in IBMQ's Qiskit.
where the number of variables $r$ satisfies $1 \leq r \leq 2^n$ and $r \bmod 2 = 0$, and $m_r \geq 2^n/r + 2$.

The number of cuts is calculated as
6. Variational parameter x is adjusted with a classical optimizer and steps 3-5 are repeated until a maximum is reached.
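The classical ingredients of the procedure, namely the graph Laplacian, the padding to a power-of-two dimension (step 1) and the cut count $N_{\mathrm{cuts}} = V^T L V/4$, can be sketched in plain Python. This is a minimal sketch under stated assumptions: the helper names and the 5-vertex ring are hypothetical and serve only as an example; the quantum evaluation and the classical optimizer are not reproduced here.

```python
def laplacian(num_vertices, weighted_edges):
    """Dense graph Laplacian: weighted degree on the diagonal,
    negative edge weight on the off-diagonal."""
    L = [[0.0] * num_vertices for _ in range(num_vertices)]
    for i, j, w in weighted_edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L

def pad_to_power_of_two(L):
    """Step 1: append isolated vertices until the dimension is 2**n."""
    size = len(L)
    n_qubits = max(1, (size - 1).bit_length())  # integer ceil(log2(size))
    dim = 2 ** n_qubits
    padded = [row + [0.0] * (dim - size) for row in L]
    padded += [[0.0] * dim for _ in range(dim - size)]
    return padded, n_qubits

def n_cuts(L, partition):
    """N_cuts = V^T L V / 4 for a partition vector with entries +1 / -1."""
    dim = len(L)
    total = sum(partition[i] * L[i][j] * partition[j]
                for i in range(dim) for j in range(dim))
    return total / 4

# Hypothetical 5-vertex ring with unit weights (not a graph from this study).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 0, 1.0)]
L, n_qubits = pad_to_power_of_two(laplacian(5, edges))
partition = [+1, -1, +1, -1, +1, +1, +1, +1]  # last three entries are padding
cut_value = n_cuts(L, partition)
```

The padding vertices are isolated, so their partition labels do not change the cut value; only the four ring edges joining vertices with opposite labels contribute.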
The algorithm presented here maps the MaxCut problem of a graph $G = (V, E)$ comprising $|V|$ vertices and $|E|$ edges to a problem of $|V|$ energy levels coupled with $|E|$ coupling terms described by a Hamiltonian $L(G)$. The weight $w_{ij}$ between the nodes becomes the coupling strength between energy levels $i$ and $j$. Energy levels $i$ and $j$ reside at energies equal to the connectivity of nodes $i$ and $j$, respectively.
In Figure 2 I present a simple example of a graph with four nodes. For instance, node 1 is connected with two other nodes (Figure 2 (a)). Therefore, energy level 1 lies at an energy $E = 2$ in Figure 2 (b). Node 1 is connected with nodes 2 and 3, so level 1 is coupled with levels 2 and 3 in the energy scheme. The same logic applies to all nodes of any graph. The algorithm searches for a unitary transformation of the Hamiltonian which maximizes the number of cuts.

Circuit depth, computational complexity and quantum advantage
Given a graph with $|V|$ nodes, $n = \lceil \log_2 |V| \rceil$ qubits are required to implement the algorithm. The multi-control multi-target qubit gate on $n$ qubits required to realize the diagonal gate $U$ in Eq. (3) can be straightforwardly realized with Gray codes [24] or, in the context of follow-up work [25], at a cost of $(23/48) \times 4^n - (3/2) \times 2^n + 4/3$ CNOT gates. However, exploiting the fact that $U$ is a diagonal gate and following the works of [25] and [26], it can be realized with $2^n - 2$ CNOT gates. This means that the ansatz is implemented with fewer than $2|V|$ two-qubit CNOT gates, which is in stark contrast with the QAOA ansatz, which requires $p|E|$ two-qubit gates, where $p$ is the depth parameter. Given that $|E| \gg |V|$, the algorithm presented here is much more efficient in the number of two-qubit gates than even the lowest-depth $p = 1$ QAOA.
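To make the resource counts concrete, the upper bounds quoted in the text ($2^n - 2n + 5$ single-qubit gates, $2^n - 2$ CNOT gates, and $p|E|$ two-qubit gates for QAOA) can be tabulated with a short Python sketch. The function names are hypothetical, and the numbers are the stated upper bounds rather than gate counts of a compiled circuit.

```python
def ansatz_gate_counts(num_vertices):
    """Upper bounds on the ansatz gate counts for n = ceil(log2 |V|) qubits."""
    n = max(1, (num_vertices - 1).bit_length())  # integer ceil(log2 |V|)
    single = 2 ** n - 2 * n + 5
    cnot = 2 ** n - 2
    return {"qubits": n, "single_qubit": single, "cnot": cnot,
            "total": single + cnot}

def qaoa_two_qubit_gates(num_edges, p=1):
    """Two-qubit gate count p|E| of the QAOA ansatz, as quoted in the text."""
    return p * num_edges

# A 3-regular graph with 32 nodes has 3 * 32 / 2 = 48 edges.
counts = ansatz_gate_counts(32)
qaoa = qaoa_two_qubit_gates(num_edges=3 * 32 // 2, p=1)
```

For 32 vertices this gives 5 qubits and at most 30 CNOT gates, compared with 48 two-qubit gates for depth-one QAOA on a 3-regular graph; the gap widens as the graph becomes denser.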
The algorithm presented here is a heuristic, meaning that its depth is case dependent. However, the quantum implementation of the heuristic can be compared with its classical counterpart for every step of the evaluation. A classical computing variant of Eq. (3) is a vector-matrix-vector multiplication. For a $|V| \times |V|$ matrix the computational complexity of such an evaluation is $O(|V|^2)$.
In the course of this study the graph Laplacian is decomposed into a sum of Pauli matrices with PennyLane's qml.Hermitian function [27]; afterwards a conversion to Qiskit was performed. Mathematically, the Pauli basis decomposition of a general Hermitian matrix $H$ is conducted in the following fashion

$$H = \frac{1}{2^n} \sum_{i=1}^{4^n} \mathrm{Tr}(P_i H)\, P_i, \qquad (4)$$

where the $P_i$ are tensor products of Pauli matrices acting on $n$ qubits. It should be noted that all tensor products which are complex in nature (involving an odd number of Y Pauli matrices per tensor product) yield a zero trace due to the real nature of the graph Laplacian in this study. Furthermore, the graph Laplacian is a real Hermitian matrix and is thus symmetric, so the decomposition needs to be performed only for the lower or upper triangular part. Finally, such decompositions are easily parallelized on HPC architectures, as all computations are independent of one another.
On a quantum computer the computational complexity of evaluating Eq. (3) is $O(|V|^3)$. However, simulating a $d$-sparse Hamiltonian ($d$-regular graph) is done in at most $O(d^2(d + \log^* n))$ queries with the so-called star decomposition of the Hamiltonian [28,29], and efficiently simulating a Hamiltonian is equivalent to calculating an expectation value of a Hamiltonian [30]. So for $d$-regular graphs every step in the heuristic algorithm executes in $O(|V| d^2 (d + \log^* \lceil \log |V| \rceil))$, which is smaller than the classical $O(|V|^2)$ for small $d$. Further reductions of the number of measurements could be achieved with the option group_paulis=True in the PauliExpectation class of Qiskit. This allows commuting Pauli strings to be grouped into the same measurements. Also, with larger processors the tasks could be parallelized, with certain qubits estimating only certain sets of Pauli strings.
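The Pauli-basis decomposition described above can be illustrated with a brute-force NumPy sketch. This is not the PennyLane or Qiskit routine used in this study; the function names and the 4-vertex path graph are assumptions made for the example, and the loop over all $4^n$ Pauli strings is only practical for a handful of qubits.

```python
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string_matrix(label):
    """Kronecker product of single-qubit Paulis, e.g. 'XZ' -> X (x) Z."""
    matrix = np.array([[1.0 + 0.0j]])
    for letter in label:
        matrix = np.kron(matrix, PAULIS[letter])
    return matrix

def pauli_decompose(H):
    """Coefficients c_i = Tr(P_i H) / 2**n of a Hermitian matrix H."""
    dim = H.shape[0]
    n_qubits = dim.bit_length() - 1  # dim is assumed to be a power of two
    coeffs = {}
    for letters in itertools.product("IXYZ", repeat=n_qubits):
        label = "".join(letters)
        c = np.trace(pauli_string_matrix(label) @ H) / dim
        if abs(c) > 1e-12:           # keep only non-vanishing terms
            coeffs[label] = c.real   # real for Hermitian H
    return coeffs

# Laplacian of a hypothetical 4-vertex path graph 0-1-2-3 (2 qubits).
H = np.array([[1, -1, 0, 0],
              [-1, 2, -1, 0],
              [0, -1, 2, -1],
              [0, 0, -1, 1]], dtype=float)
coeffs = pauli_decompose(H)
rebuilt = sum(c * pauli_string_matrix(label) for label, c in coeffs.items())
```

Summing the retained terms reproduces the Laplacian, and no string with an odd number of Y matrices survives, in line with the remark about real symmetric matrices.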
In Table I, I summarize the main differences between the algorithm presented here and the QAOA. The diagonal gates required to perform the algorithm presented here require all-to-all connectivity between qubits for optimal performance. On the other hand, QAOA performs best when the qubits are connected in the same way as the nodes of the graph [13].
As of 2021 the second most powerful supercomputer in the world is Fugaku [31] with 158,967 nodes, each having 32 GB of RAM. In total, this supercomputer can store 158,967 × 32 gigabytes of data, i.e. about $5 \times 10^{15}$ bytes. Commonly one requires 8 bytes to store a real number [32,33], indicating that $0.64 \times 10^{15}$ double-precision numbers can be stored in the RAM of such a device. Given that a graph Laplacian is a square matrix, the largest weighted graph Laplacian which such a supercomputer could handle has dimensions $\sqrt{0.64 \times 10^{15}} \times \sqrt{0.64 \times 10^{15}} = 25.2 \times 10^6 \times 25.2 \times 10^6$. On the other hand, a 25-qubit device (such as those already available at IBM and Google) could handle a $2^{25} \times 2^{25} = 33.5 \times 10^6 \times 33.5 \times 10^6$ graph Laplacian. Of course, noise would be a limiting factor in handling optimization problems of such sizes in the pre-error-correction era. However, large-scale error mitigation is going to be implemented on IBMQ systems next year [5]. This will increase the depth of circuits which can be executed on IBM QPUs by an amount which is difficult to estimate at this point. It should also be noted that the way of calculating the number of cuts of a graph on a quantum computer given by Eq. (3) is also applicable to a plethora of algorithms handling different aspects of graph cuts, not only MaxCut with the genetic optimizer as done in this study.
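The memory estimate above is simple arithmetic and can be reproduced in a few lines of Python. The sketch assumes decimal gigabytes ($10^9$ bytes) and a dense matrix of 8-byte doubles, as in the text; the variable names are illustrative.

```python
import math

nodes = 158_967
bytes_per_node = 32 * 10**9   # 32 GB per node, decimal gigabytes assumed
bytes_per_double = 8

total_bytes = nodes * bytes_per_node            # about 5e15 bytes
n_doubles = total_bytes // bytes_per_double     # about 0.64e15 numbers
classical_dim = math.isqrt(n_doubles)           # side of the largest dense matrix
quantum_dim = 2 ** 25                           # Laplacian dimension on 25 qubits
```

The side length of the largest dense matrix that fits in memory comes out at roughly 25 million, below the $2^{25} \approx 33.5$ million reachable with 25 qubits.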

$|V|$   $n$   $n$
Connectivity   graph-inspired   all-to-all   all-to-all

RESULTS
In Figure 3 I compare the output of a simulator with the output of the publicly available IBMQ Santiago. Although pure dephasing, shot noise and relaxation may distort the optimization landscape, the maxima remain clearly visible, even though equal local maxima become unequal. $N_{\mathrm{cuts}}$ is estimated for 100 equidistant values of $x$.
I further present results obtained by benchmarking randomly generated 3-regular graphs of 32 nodes on actual quantum computers and simulators, and randomly generated 3-regular graphs of 128 nodes on a simulator of quantum computers (Qiskit). Intensive testing showed that the algorithm performs best when the number of variables is kept at 8 or 16 for graphs of 32-128 nodes. Intensive numerical testing also showed that a genetic optimizer is best suited for finding the maximum of the function, which is not too surprising as genetic optimizers are well suited to multi-modal cost functions. In addition to the genetic algorithm, a number of classical optimizers were tried (COBYLA, Nelder-Mead, Basin-Hopping, Particle Swarm, EGO). Further details about the settings of the genetic optimizer can be found in Supplementary Material S1.
For the case of 3-regular graphs of 32 nodes, variable reduction is performed so that the optimization landscape has 8 variables. The MaxCut is calculated with 5 qubits. The Goemans-Williamson algorithm (GW) is a classical approximate algorithm for the MaxCut problem which has a performance guarantee of 0.87856 [22]. I define the approximation ratio $r$ of the algorithm presented here with respect to the exact solution as

$$0.87856\, \frac{\mathrm{MaxCut}}{\mathrm{MaxCut}_{GW}} \leq r \leq \frac{\mathrm{MaxCut}}{\mathrm{MaxCut}_{GW}},$$

where MaxCut is the value obtained with the algorithm presented here and MaxCut$_{GW}$ is the value obtained with Goemans-Williamson. For the first graph, Figure 4 (a-c), the algorithm performed on a simulator of quantum computers yields $0.776 \leq r \leq 0.884$ and the algorithm executed on IBMQ Quito yields $0.552 \leq r \leq 0.628$. For the second graph, Figure 4 (d-f), the algorithm performed on a simulator of quantum computers yields $0.857 \leq r \leq 0.975$ and the algorithm executed on IBMQ Quito yields $0.549 \leq r \leq 0.625$. For both realizations there is a clear difference between actual hardware benchmarks and ideal simulation. I assume that the main reason for this is the distortion of the optimization landscape due to pure dephasing and relaxation. I expect that shot noise contributed less, as the expectation value was estimated with the maximally allowed 8192 shots.
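The reported intervals for $r$ are consistent with bracketing the unknown exact MaxCut between the Goemans-Williamson value and that value divided by the guarantee 0.87856. A minimal Python sketch of this bookkeeping follows; the function name and the cut values 38 and 43 are hypothetical inputs chosen for illustration, not values read from Fig. 4.

```python
GW_GUARANTEE = 0.87856  # Goemans-Williamson performance guarantee [22]

def approximation_ratio_bounds(cut_value, gw_cut_value):
    """Bounds on r = cut / exact, using gw <= exact <= gw / GW_GUARANTEE."""
    upper = cut_value / gw_cut_value   # attained if GW found the exact MaxCut
    lower = GW_GUARANTEE * upper       # attained if GW sits at its guarantee
    return lower, upper

# Hypothetical example: the algorithm cuts 38 edges, GW cuts 43.
lower, upper = approximation_ratio_bounds(cut_value=38, gw_cut_value=43)
```

The ratio of the lower to the upper bound is always 0.87856, which is the pattern seen in every interval quoted in this section.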
For the case of 3-regular graphs of 128 nodes, variable reduction is performed so that the optimization landscape has 16 variables. The MaxCut is calculated with 7 simulated qubits under the assumption of no noise processes. Given that devices larger than 5 qubits are unavailable to the author, for these 3 graphs I stayed in the domain of quantum simulators. For the first graph (Tab. II, seed 7) $0.679 \leq r \leq 0.773$, for the second graph (Tab. II, seed 8) $0.743 \leq r \leq 0.846$, and for the third graph (Tab. II, seed 9) $0.709 \leq r \leq 0.807$. Values do not converge as nicely as for smaller graphs, likely because the genetic algorithm gets trapped in a local minimum with increasing system size. These results are visually represented in Supplementary Material S3.
A $d$-regular graph with $|V|$ nodes has $d \times |V|/2$ edges [34]. An average random bi-partition of such a graph cuts $d \times |V|/4$ edges [35]; in the case of 3-regular graphs with 32 nodes the average random bi-partition cut is 24. So both the quantum hardware results and the simulator results stay above the average random bi-partition value. The average random bi-partition cut of a 3-regular graph of 128 nodes is $3 \times 128/4 = 96$.
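The random-partition baseline $d \times |V|/4$ can be checked by exhaustive enumeration on a small example. The sketch below uses the complete graph on four vertices, the smallest 3-regular graph, purely as an illustration; the function name is hypothetical and the enumeration over all $2^{|V|}$ assignments is feasible only for tiny graphs.

```python
from itertools import product

def average_random_cut(num_vertices, edges):
    """Mean cut size over all 2**|V| equally likely vertex assignments."""
    total = 0
    for assignment in product((0, 1), repeat=num_vertices):
        total += sum(1 for i, j in edges if assignment[i] != assignment[j])
    return total / 2 ** num_vertices

# Complete graph on 4 vertices: 3-regular with 3 * 4 / 2 = 6 edges.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
avg = average_random_cut(4, k4_edges)
```

Each edge is cut in exactly half of the assignments, so the average equals $|E|/2 = d|V|/4$, which is 3 for this graph.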

CONCLUSION
In conclusion, I have presented a novel algorithm for noisy intermediate-scale quantum computers requiring exponentially fewer qubits and significantly fewer quantum gates than the contemporary state-of-the-art algorithm, QAOA. I calculated the MaxCut of a randomly generated 3-regular graph of 32 nodes, a 40% increase compared to state-of-the-art experiments on gate-based quantum computers (Google Sycamore). I did so with publicly available IBM hardware, and obtained a lower bound on the solution of 54.9% for actual hardware benchmarks and 77.6% for ideal simulators of quantum computers. Furthermore, I calculated the MaxCut of a 3-regular graph of 128 nodes with a quantum simulator, obtaining a lower bound of 67.9% on the solution with no pre-processing of the graph whatsoever.
In Figure S3 I show how the genetic optimizer improves the cost function as a function of iteration. The value is negative as the optimizer is tuned to search for the minimum of the negative graph Laplacian under a unitary transformation.

FIG. 1. Eq. (1) for q = 0 (red), q = 1 (dashed green) and q = 2 (dotted black) for m = 4.

FIG. 2. (a) A simple graph where $w_{ij}$ are the off-diagonal elements of a graph Laplacian. (b) The mapping of the graph to a set of coupled energy levels.
where $P_i$ are all possible combinations of tensor products of the Pauli group composed of {I, X, Y, Z} acting on $n$ qubits. It should be noted that the process of decomposing a Hermitian matrix into a sum of Paulis (see Eq. 4) is completed in $O(|V|^2)$: up to $|V|^2$ summands, $|V|^3$ for the matrix-matrix product of $P_i$ and $H$, and $|V|$ for computing the trace. For example, for $n = 2$ qubits (up to a $|V| = 4$ vertex graph) $4^n = |V|^2 = 16$ possible products exist: $P_i \in$ {II, IX, IY, IZ, XI, XX, XY, XZ, YI, YX, YY, YZ, ZI, ZX, ZY, ZZ}.
(3) is $O(|V|^3)$ (one power of $|V|$ coming from the ansatz and up to $|V|^2$ summands yielding $L(G)$). This number, although polynomial in $|V|$, can still be quite high for large graphs and thus would require an estimation of the expectation value of a large number of summands on a quantum computer. However, simulating a $d$-sparse Hamiltonian ($d$-regular graph) is done in at most $O(d^2(d + \log^* n))$

FIG. 4. Randomly generated 3-regular graphs of 32 nodes. Nodes belonging to different partitions are marked in green and orange respectively, and the MaxCut value is written on top of each graph. Graphs are randomly initialized by Python's networkx package, where seed 20 is used for graphs (a-c) and seed 30 for graphs (d-f). GW stands for Goemans-Williamson.
S2 - VISUALLY REPRESENTING THE 128 NODE GRAPH CUTS
In Fig. S4 I visually represent the results of Tab. II from the main body of the paper.
FIG. S3. Cut as a function of the number of iterations for Fig. 4 of the main body of the paper.

TABLE I. A table summarizing the differences between the approach presented here and QAOA. Complexities are given for one evaluation step and $n = \lceil \log_2 |V| \rceil$.
FIG. S2. A classical simulation: negative MaxCut as a function of the number of optimization variables for random seed 2 of a 128-vertex graph with varying density and 15% random uncertainty in the cost function calculation. The optimizer was set as in Fig. S1; every point is an average of 20 repetitions.