Improving Quantum Simulation Efficiency of Final State Radiation with Dynamic Quantum Circuits



I. INTRODUCTION
High energy physics (HEP) simulations are one of the most natural and exciting applications of quantum computers, given the complex many-body quantum nature of HEP processes. Foundational work establishing the existence of polynomial-scaling digital quantum algorithms for scattering calculations [2] has been followed by a variety of particle and nuclear physics investigations into simulations on quantum computers¹.
While most of these studies propose quantum algorithms for the full scattering process, a complementary approach exploits factorization [4]. In particular, scattering cross sections approximately factor into pieces governed by physical processes occurring at different energy scales. One way to factorize a full calculation at a collider like the Large Hadron Collider (LHC) involves parton shower (PS) Monte Carlo. Parton showers describe the collinear radiation from high-energy charged particles. Classical PSs approximate this radiation as a Markov Chain. This is an excellent approximation in some cases, but it ignores certain interference effects. Recently, Ref. [1] introduced a quantum algorithm for parton showers (QPS) that models interference effects from intermediate flavor states. This algorithm requires only polynomial resources, compared to existing exponentially scaling algorithms that model the same physics. While the QPS does not describe the full properties of PSs in the Standard Model (dominated mostly by the strong force), it represents an important benchmark for developing and testing HEP algorithms on quantum computers.
Even though the QPS requires only polynomial quantum resources, it is still challenging to run on existing devices. This is because we are in the Noisy Intermediate-Scale Quantum (NISQ) [5] computing era, where qubit counts, connectivities, and coherence times are limited, and quantum gate and readout operations have significant noise. Therefore, there is a strong motivation to improve the polynomial scaling of current quantum algorithms like the QPS.
In this paper, we improve the original QPS algorithm [1] by using dynamic quantum circuits: we incorporate mid-circuit qubit measurements² and quantum gates applied dynamically based on the results of classical processing of these measurements. The modified protocol produces the same distribution of measured final states as the original one. By resetting the measured qubits to the ground state and re-using them for subsequent iterations, both the qubit and the gate complexity can be reduced. Compared to the original algorithm, the new version uses a factor of O(N^2) fewer standard entangling gates, where N is the number of points used to discretize the PS.
This paper is organized as follows. Section II introduces dynamic quantum computing, in which a quantum processing unit (QPU) interacts with a classical processing unit (CPU) during computation. This is in contrast with nearly all current digital quantum algorithms, where a preset sequence of quantum gates is applied.

II. DYNAMIC QUANTUM COMPUTING
Dynamic quantum computing involves dividing a program into (1) steps that can only be implemented on a quantum computer (QPU) and (2) steps that can be implemented more efficiently on a classical computer (CPU), and interfacing between the QPU and CPU wherever necessary. This scheme manifests in two categorically distinct ways.
First, one could execute an algorithm consisting of a sequence of alternating quantum and classical steps, where the result of each step is fed serially to the next. Variational algorithms such as the variational quantum eigensolver (VQE) [7] are primary examples of this scheme. To compute the smallest eigenvalue of a Hermitian operator H, VQE alternates between a quantum step that computes the expectation value of H in a parameterized state |ψ(θ)⟩, and a classical step that minimizes ⟨ψ(θ)|H|ψ(θ)⟩ over the parameters θ. In this scheme, each quantum step is independent of the previous one, so the same QPU is reset and reused for all quantum steps. This means that the QPU's coherence time must only be long enough to accommodate a single quantum step.
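The serial alternation described above can be illustrated with a toy example. The sketch below is not VQE as used in practice: it assumes a single-qubit operator H = Z, an ansatz |ψ(θ)⟩ = R_y(θ)|0⟩, and an exact numpy calculation standing in for the quantum expectation-value step. Each "quantum" evaluation starts from a fresh state, and the classical step updates θ by gradient descent using the parameter-shift rule.

```python
import numpy as np

# Toy Hermitian operator H (assumption: a single-qubit Pauli Z).
H = np.diag([1.0, -1.0])


def quantum_step(theta):
    """Stand-in for the QPU: <psi(theta)|H|psi(theta)> with |psi> = Ry(theta)|0>.

    Each call prepares a fresh state, so no coherence is carried between calls.
    """
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return float(psi @ H @ psi)


def classical_step(theta, lr=0.5):
    """CPU update: parameter-shift gradient built from two independent quantum steps."""
    grad = (quantum_step(theta + np.pi / 2) - quantum_step(theta - np.pi / 2)) / 2
    return theta - lr * grad


def vqe(theta=0.1, iters=200):
    for _ in range(iters):
        theta = classical_step(theta)
    return theta, quantum_step(theta)


theta_opt, energy = vqe()
print(f"theta = {theta_opt:.4f}, <H> = {energy:.6f}")  # approaches theta = pi, <H> = -1
```

Because every quantum step is independent, the QPU only needs to stay coherent for one expectation-value estimate at a time.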
On the other hand, one could also construct an algorithm that requires rapid interfacing between QPU and CPU. For example, standard quantum error correction (QEC) procedures entail measuring a syndrome operator, followed by a recovery operation in which pre-defined quantum gates are applied conditional on the measurement results [8]. Therefore, incorporating QEC into a general quantum program requires frequent interfacing between QPUs and CPUs in the form of measurements (QPU to CPU) and feedback (CPU to QPU). In addition to direct feedback based on measurement results, one could also perform classical computations on them before applying quantum gates conditioned on the results of those computations. This procedure describes a fully dynamic quantum-classical computer and is the basis for the algorithm presented in this paper. Figure 1 illustrates the operation of a dynamic quantum computer.
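The four-step loop of Fig. 1 (measure, process on the CPU, reset, feed forward) can be mimicked with a small state-vector simulation. The sketch below is a toy written with numpy, not an interface to real hardware or to any Qiskit feature; the helper names (apply_1q, measure, dynamic_cycle) and the correction rule are hypothetical. Starting from a Bell pair, it measures qubit 0, lets the "CPU" choose a gate from the outcome, resets the measured qubit, and applies the chosen gate to qubit 1.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector (qubit 0 = leftmost)."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))  # new axis 0 is qubit q
    return np.moveaxis(psi, 0, q).reshape(-1)


def measure(state, q, n, rng):
    """Projective Z measurement of qubit q; returns (outcome, collapsed state)."""
    psi = state.reshape([2] * n).copy()
    p1 = float(np.sum(np.abs(np.take(psi, 1, axis=q)) ** 2))
    outcome = int(rng.random() < p1)
    index = [slice(None)] * n
    index[q] = 1 - outcome
    psi[tuple(index)] = 0  # discard the branch that was not observed
    psi /= np.linalg.norm(psi)
    return outcome, psi.reshape(-1)


def dynamic_cycle(state, n, rng):
    """One pass of the loop: measure, CPU processing, reset, feed forward."""
    outcome, state = measure(state, 0, n, rng)  # (1) QPU -> CPU
    correction = X if outcome == 1 else I2      # (2) CPU selects a gate
    if outcome == 1:                            # (3) reset qubit 0 to |0>
        state = apply_1q(state, X, 0, n)
    state = apply_1q(state, correction, 1, n)   # (4) CPU -> QPU
    return outcome, state


rng = np.random.default_rng(7)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)
outcome, final = dynamic_cycle(bell, 2, rng)
print(outcome, np.round(final.real, 6))  # final state is |00> for either outcome
```

Note that the simulated "QPU" must keep qubit 1 coherent while the CPU decides on the correction, which is exactly the coherence requirement discussed next.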
In contrast with VQE, where quantum and classical steps alternate serially, QEC and other algorithms that use a similar rapid interfacing scheme require that quantum resources (qubits) maintain coherence during quantum-classical interfacing. Limited qubit coherence times are a major bottleneck for implementing QEC and other dynamic algorithms. However, coherence times continue to improve as hardware is developed and refined, and because of the importance of QEC for developing fault-tolerant quantum computers, dynamic hardware will be a major focus in the long term. To date, demonstrations of dynamic computing with rapid digital interfacing include active qubit resets [9][10][11], quantum teleportation [12][13][14][15][16], and error correction [17][18][19][20][21][22][23][24][25][26]. A more complex example was recently demonstrated by IBM [27], employing a hybrid quantum-classical version of phase estimation on two qubits, where an m-bit representation of the phase is computed using several shots of a hybrid circuit that contains m measurement-feed-forward cycles. In each cycle, an R_z(θ) gate is selected and applied conditionally on previous measurement results, with different measurement results selecting different θ values.
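The measure-and-feed-forward structure of hybrid phase estimation can be sketched with a textbook iterative (semiclassical) phase-estimation loop. This is a minimal illustration under stated assumptions, not IBM's implementation from [27]: it assumes an ideal unitary with eigenphase φ that has an exact m-bit binary expansion, simulates only the ancilla (the eigenstate is unchanged by phase kickback), and uses a diagonal phase gate that equals R_z(ω) up to a global phase. The correction angle for each cycle is computed classically from the bits measured so far.

```python
import numpy as np

HADAMARD = np.array([[1, 1], [1, -1]]) / np.sqrt(2)


def iterative_phase_estimation(phase, m, rng):
    """Estimate bits b_1..b_m of phase = 0.b_1 b_2 ... b_m, one measurement per cycle."""
    bits = {}  # classical register filled by mid-circuit measurements
    for k in range(m, 0, -1):  # least significant bit first
        # Ancilla starts in |+>; controlled-U^(2^(k-1)) kicks the phase back onto |1>.
        kickback = np.exp(2j * np.pi * phase * 2 ** (k - 1))
        ancilla = np.array([1, kickback]) / np.sqrt(2)
        # Feed-forward: the CPU chooses the angle from the bits measured so far.
        omega = -2 * np.pi * sum(bits[j] / 2 ** (j - k + 1) for j in range(k + 1, m + 1))
        ancilla = np.array([1, np.exp(1j * omega)]) * ancilla  # Rz(omega) up to a global phase
        ancilla = HADAMARD @ ancilla
        p1 = abs(ancilla[1]) ** 2
        bits[k] = int(rng.random() < p1)  # measure; the ancilla is then reset and reused
    return [bits[k] for k in range(1, m + 1)]


rng = np.random.default_rng(0)
print(iterative_phase_estimation(5 / 8, 3, rng))  # [1, 0, 1], since 5/8 = 0.101 in binary
```

As in the description above, different measured bits select different rotation angles ω, so the circuit that runs on the QPU is only fixed once the earlier measurements are known.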
The hybrid phase estimation algorithm demonstrated by IBM [27] closely resembles the hybrid QPS we introduce below, in that it consists of single-qubit rotation gates with rotation angles that depend on classical information, i.e., mid-circuit measurement results. Therefore, IBM's demonstration of a hybrid quantum algorithm suggests that our QPS algorithm can in principle be implemented on real devices in the future. In Sec. III, we summarize the original QPS algorithm [1] before laying out the dynamic version.

III. QUANTUM PARTON SHOWER ALGORITHM
First, we summarize the quantum parton shower (QPS) algorithm originally presented in Ref. [1].

A. Physical Background
Parton shower algorithms are perturbative approaches to efficiently describe high-multiplicity final states by focusing on the soft and collinear regions of phase space. The QPS algorithm in Ref. [1] was developed for a simple quantum field theory involving two types of fermion fields, f_1 and f_2, interacting with one scalar boson φ, governed by the Lagrangian of Eq. (1). The first three terms in Eq. (1) describe the kinetic properties of the fermions and scalar, while the latter three terms govern their interactions. The goal of a PS algorithm is to describe the collinear dynamics of the theory, which in this case correspond to fermions radiating scalars (f_i → f_j φ) and scalars splitting into fermion pairs (φ → f_i f̄_j).

FIG. 1: Dynamic computing workflow. The essential procedure consists of four steps: (1) measure a subset of quantum resources in the QPU, represented here by the k-qubit register; (2) perform some processing on the measured data on the CPU; (3) reset the measured qubits to the |0⟩ state so they can be re-used; and (4) based on CPU outputs, apply additional quantum operations on the QPU. Note that the QPU must maintain coherence throughout this procedure.
In classical PSs, the rates of these processes are described by splitting functions, P_{i→jφ} and P_{φ→ij}, which depend on the couplings g_ij, where g_i ≡ g_ii. The splitting functions describe the probability for a particular particle at a given step (parameterized by the scale θ) in the parton shower evolution to undergo a transformation. There are many formally equivalent definitions of the scale; here we use a common choice, the opening angle of the emission with respect to the emitter (an angular-ordered shower).
In addition to the splitting functions, another important quantity is the no-branching probability (Sudakov factor), which describes the probability that no emission occurs between scales θ_1 and θ_2. With the splitting functions and Sudakov factors, we can sample from the cross section using a Markov Chain algorithm that generates one emission at a time, conditioned on the previous emissions. In particular, at a given step in the algorithm with a fixed number of particles, the probability that none of them radiate or split is simply a product of Sudakov factors.
If an emission does occur at a given step, the probability for each particle to be the emitter is proportional to the appropriate splitting function. In the limit g_12 → 0, the Markov Chain algorithm can be implemented in terms of emission probabilities, and is therefore classically efficient. However, if g_12 > 0, there are multiple histories with unmeasured intermediate fermion types that contribute to the same final state.
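For g_12 = 0, the procedure above is an ordinary Markov chain and can be sampled classically. The sketch below makes several simplifying assumptions that are not taken from the text: the scale is discretized into steps of fixed width d_theta, each particle type has a constant, hypothetical splitting rate (so the per-step Sudakov factor is exp(−rate · d_theta)), fermion flavors and charges are not tracked, and at most one emission occurs per step, mirroring the one-emitter-per-step history register. The function name classical_shower is hypothetical.

```python
import math
import random


def classical_shower(n_steps, rates, d_theta, initial=("f",), seed=None):
    """Sample one g_12 = 0 shower history with a Markov chain (at most one emission per step).

    rates: hypothetical constant splitting rates, e.g. {"f": 1.0, "phi": 0.5}.
    Returns the final particle list and, per step, the emitter index (None = no emission).
    """
    rng = random.Random(seed)
    particles = list(initial)
    history = []
    for _ in range(n_steps):
        # No-emission probability for the whole system: product of Sudakov factors.
        no_emission = math.prod(math.exp(-rates[p] * d_theta) for p in particles)
        if rng.random() < no_emission:
            history.append(None)
            continue
        # Given an emission, pick the emitter with weight proportional to its splitting rate.
        weights = [rates[p] for p in particles]
        emitter = rng.choices(range(len(particles)), weights=weights)[0]
        if particles[emitter] == "f":
            particles.append("phi")   # f -> f phi
        else:
            particles[emitter] = "f"  # phi -> f fbar: the scalar's slot becomes a fermion
            particles.append("f")     # and its partner is appended
        history.append(emitter)
    return particles, history


particles, history = classical_shower(5, {"f": 1.0, "phi": 0.5}, d_theta=0.4, seed=1)
print(history, particles)
```

Only probabilities are propagated here, which is why this approach cannot capture the interference between histories that arises when g_12 > 0.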
To account for these interferences, we must implement the Markov Chain at the quantum amplitude level, which classically necessitates keeping track of O(2^N) different histories, where the range of opening angles is discretized into N parts. This motivates the QPS algorithm, which computes the final state radiation with g_12 > 0 using only polynomially many qubits and gates on a digital quantum computer.

B. Basis for the Quantum Algorithm
The interaction terms of the Lagrangian (Eq. (1)) can be written as a matrix equation involving an "interaction matrix" with entries g_ij. This matrix is real and symmetric, and can thus be diagonalized by an orthogonal rotation. By defining the corresponding change of basis from (f_1, f_2) to (f_a, f_b), the interactions become diagonal, with couplings g_a and g_b. In this "diagonal basis", splittings do not create interference between fermion types. In other words, g_ij = g_i δ_ij for i, j ∈ {a, b}, where δ_ij is the Kronecker delta. This is also the case in the original basis if g_12 = 0. Therefore, to simulate interference between fermion types, we first rotate the particle registers |p⟩ encoding fermion/boson fields into the diagonal basis according to Eq. (7), proceed with a quantum analogue of the classical Markov Chain algorithm (generating a history of angles and particle types), and lastly rotate the final particle states back to the original basis. If g_12 > 0, then the initial rotation to the diagonal basis creates a superposition of f_a and f_b fermions. Subsequent operations act on this superposition, and all intermediate amplitudes/histories are preserved throughout the quantum Markov Chain. This contrasts with the classical MCMC parton shower, where superpositions of multiple fermion types are not included. Note that in this model, there is no interference between histories where emissions occurred at different angles.
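Because the interaction matrix is a real symmetric 2×2 matrix, the change of basis is a single rotation. The following sketch computes it numerically; it assumes the interaction matrix has diagonal entries g_1, g_2 and off-diagonal entry g_12, and the angle convention is our own choice, which may differ from the definition of u in Ref. [1]. The usage line evaluates the couplings used in Sec. IV (g_1 = 2, g_2 = 1, g_12 = 1).

```python
import numpy as np


def diagonalize_couplings(g1, g2, g12):
    """Rotate the real symmetric interaction matrix into its diagonal (a/b) basis.

    Returns (g_a, g_b, angle, U) with U @ G @ U.T = diag(g_a, g_b) and g_a >= g_b.
    The angle convention is an assumption and may differ from the paper's u.
    """
    G = np.array([[g1, g12], [g12, g2]], dtype=float)
    angle = 0.5 * np.arctan2(2 * g12, g1 - g2)  # standard 2x2 (Jacobi) rotation angle
    c, s = np.cos(angle), np.sin(angle)
    U = np.array([[c, s], [-s, c]])  # orthogonal change of basis (f_1, f_2) -> (f_a, f_b)
    D = U @ G @ U.T
    return D[0, 0], D[1, 1], angle, U


g_a, g_b, angle, U = diagonalize_couplings(2.0, 1.0, 1.0)
print(round(g_a, 4), round(g_b, 4), round(angle, 4))  # 2.618 0.382 0.5536
```

For these couplings the diagonal entries are (3 ± √5)/2, and g_12 = 0 returns the identity rotation, consistent with the statement that the original basis is already diagonal in that limit.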

C. Original Quantum Algorithm
The QPS circuit uses three qubit registers to encode the particle state and history, three registers to store derived quantities about the number of particles, and a number of ancillary qubits. For N steps of the algorithm (i.e., discretizing the range of angles into N parts) and n_I initial particles, the qubit counts are provided in Table I [1].
The QPS algorithm then proceeds by iteratively applying a series of steps. As described schematically above, each step has six components: (1) basis rotation R^(m), (2) count particles U_count, (3) determine emission U_e^(m), (4) update history U_h, (5) update particles U_p^(m), and (6) inverse basis rotation R^(m)†. Using the registers described in Table I, Alg. 1 gives a high-level description of the QPS algorithm, in the form of a computational protocol, in terms of these six meta gates. A detailed description of each register and each quantum gate is provided in App. A. Additionally, Fig. 2 illustrates two steps of the circuit diagram for Alg. 1. Note that the emission history sub-register |h_m⟩ is updated, and used as a control for U_p, only at step m; it is left untouched during all previous and subsequent steps. This property of the history register is a consequence of the lack of interference between different histories with the same emission angles. This motivates the idea of measuring |h_m⟩ after each step and then re-using those qubits during subsequent steps, which would reduce the qubit requirements of the QPS.

D. Modified Quantum Parton Shower
Using mid-circuit operations conditioned on measurement results, we can significantly simplify the quantum circuit of the previous section. A qubit that is untouched for the remainder of a circuit can be measured at any time without changing (in principle) the properties of the encoded quantum state. In fact, a qubit used only to control operations on other qubits can be measured, and all subsequent quantum controls replaced by classical controls, without affecting the distribution of measured states. By invoking this deferred measurement principle on Fig. 2, the history sub-register |h_m⟩ at step m can be measured at the end of step m, once U_p^(m) has been applied. Therefore, in principle, the qubits of |h_m⟩ can be reset to the ground state |0⟩ and re-used for |h_{m+1}⟩. Because |h_m⟩ encodes a superposition of which particle (|p_1⟩, ..., |p_{n_I+m−1}⟩) or None emitted at step m, measuring |h_m⟩ projects the wavefunction onto a definite emission history. In other words, the sequence of emission locations (particles) is stored classically during the circuit execution. This does not affect the dynamics of the simulation, as different emission histories do not interfere with each other: fermion superpositions within the particle register |p⟩ remain intact and only affect operations within the same history. Therefore, by running the simulation a polynomial number of times (given some constant statistical tolerance), we still construct an accurate probability distribution of final states. In this section, we describe at a high level how measuring and resetting |h⟩ after each step simplifies the QPS algorithm. A detailed treatment of the improvements is provided in App. A.

Algorithm 1: Original QPS algorithm [1].
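The deferred measurement principle invoked above can be checked on the smallest possible example. The toy numpy sketch below (function names are hypothetical) compares two circuits: in one, a control qubit prepared with R_y(α) coherently controls R_y(β) on a target; in the other, the control is measured first and R_y(β) is applied only if the outcome is 1. The joint outcome distributions coincide, which is the property that allows |h_m⟩ to be measured once it is used only as a control.

```python
import numpy as np


def ry(angle):
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


def probs_quantum_control(alpha, beta):
    """Joint (control, target) outcome probabilities with a coherent controlled-Ry(beta)."""
    control = ry(alpha) @ np.array([1.0, 0.0])
    state = np.kron(control, [1.0, 0.0])  # target starts in |0>
    c_ry = np.block([[np.eye(2), np.zeros((2, 2))], [np.zeros((2, 2)), ry(beta)]])
    return np.abs(c_ry @ state) ** 2  # ordering |00>, |01>, |10>, |11>


def probs_measure_then_classical_control(alpha, beta):
    """Same quantity when the control is measured first and Ry(beta) is applied only on outcome 1."""
    p_control = np.abs(ry(alpha) @ np.array([1.0, 0.0])) ** 2
    probs = []
    for outcome in (0, 1):
        target = ry(beta) @ np.array([1.0, 0.0]) if outcome == 1 else np.array([1.0, 0.0])
        probs.extend(p_control[outcome] * np.abs(target) ** 2)
    return np.array(probs)


print(np.allclose(probs_quantum_control(1.1, 2.3),
                  probs_measure_then_classical_control(1.1, 2.3)))  # True
```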
Measuring the history register at a given step tells us which particle emits at that step, so the entire emission history is available as classical information during the simulation. In fact, given that the simulation begins with initial particles of definite type (fermion or boson), we can inductively infer from the emission history which type of particle emitted at each step. Note that while individual fermions may be in a superposition of different "flavors" a/b, for the two types of emissions considered, knowledge of the emitting particle type (f or φ) always implies the emitted particle type, as we consider the splittings f → f φ and φ → f f̄. Therefore, the total number of particles (n_tot), the number of bosons (n_φ), and the number of fermions (n_f = n_{f_a} + n_{f_b}) are classical information. This lets us remove two of the three counting registers |n_φ⟩, |n_a⟩, |n_b⟩, as follows.
In the original QPS, the U_e and U_h gates apply rotations with different angles, where each rotation (angle) is controlled on the corresponding possible value stored in |n_a⟩|n_b⟩|n_φ⟩. With mid-circuit measurements of |h⟩, n_φ is classical information, so there is no need to store and control on |n_φ⟩ at all. Now suppose we count the number of a-fermions (without loss of generality, as we could instead count the number of b-fermions) and apply rotations controlled on each possible value stored in |n_a⟩. There is a one-to-one mapping from the possible values stored in |n_a⟩ to the possible values of |n_b⟩, given by n_tot = n_a + n_b + n_φ, since n_tot and n_φ are stored on the CPU. Therefore, there is no need to store and control on |n_b⟩, as the superposition of possible n_b values is already implicitly encoded in |n_a⟩. Thus, by measuring |h⟩ at each step, |n_b⟩ and |n_φ⟩ become redundant and can be removed.
Suppose we start with n_I initial particles of definite type, so that the initial numbers of fermions, bosons, and particles in total are known classically. Then, after measuring the history register, the CPU stores which of the initial particles emitted, so we update these counts on the CPU accordingly, and the emitted particle is encoded in |p_1⟩. This process is repeated for each subsequent simulation step, so inductively n_{f,i}, n_{φ,i}, n_{tot,i} are stored on the CPU throughout the simulation. Under the assumption that we can implement the workflow of Fig. 1, we can use this information to reduce both the computational complexity and the absolute qubit/gate counts of the QPS. Alg. 2 gives a high-level description of the improved QPS algorithm with mid-circuit measurements.
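The classical bookkeeping described above can be sketched as follows. This is a hypothetical CPU-side helper, not code from the paper: particle types f and φ are tracked classically from the measured emitter indices, the φ → f f̄ update overwrites the scalar's slot and appends its partner (an assumed convention), and steps without emission are not stored as None particles. The second function shows why |n_b⟩ is redundant: for each basis value n_a, the matching n_b follows from the classically known n_tot and n_φ.

```python
def track_counts(initial_types, measured_history):
    """Classically update n_tot, n_phi and n_f from measured emitters.

    initial_types: e.g. ["f"]; measured_history: emitter index per step (None = no emission).
    """
    types = list(initial_types)
    counts = []
    for emitter in measured_history:
        if emitter is not None:
            if types[emitter] == "f":
                types.append("phi")   # f -> f phi: emitter stays a fermion, one new scalar
            else:
                types[emitter] = "f"  # phi -> f fbar: the scalar's slot becomes a fermion
                types.append("f")     # and the partner fermion is a new particle
        n_phi = types.count("phi")
        counts.append({"n_tot": len(types), "n_phi": n_phi, "n_f": len(types) - n_phi})
    return types, counts


def implied_n_b(n_tot, n_phi, n_a):
    """n_b is fixed once n_a is known, because n_tot = n_a + n_b + n_phi."""
    return n_tot - n_phi - n_a


types, counts = track_counts(["f"], [0, None, 1])
print(counts[-1])                                    # {'n_tot': 3, 'n_phi': 0, 'n_f': 3}
print([implied_n_b(3, 0, n_a) for n_a in range(4)])  # [3, 2, 1, 0]
```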
The second simplification is as follows. Storing φ or None in the particle register is redundant because, given the emission history, the locations of all φ and None particles in the simulation are stored on the CPU. Therefore, we encode particle m into a qubit sub-register |p_m⟩ only if it is a fermion. To see that this can be done without significantly changing the QPS algorithm, we briefly comment on each gate that acts on |p⟩: R, U_count, U_h, and U_p (see Fig. 3). The basis rotation R (Eq. (A5)) acts as the identity on φ's, so simply not encoding φ's does not affect the action of R. For each particle |p_m⟩ in |p⟩, the counting gate U_count applies an incrementer on |n_a⟩ controlled on |p_m⟩ = f_a (see App. B 1). Therefore, not encoding φ's on qubits does not affect the action of U_count. The U_h gate applies a two-level rotation to |h⟩ for each particle in |p⟩, where the rotation angle is controlled by particle type (see App. B 3). Recall that by measuring the history register, the CPU stores whether each particle in the emission history is an f, φ, or None. Therefore, for fermions, the rotation is controlled on |p⟩ as before, but for φ's the corresponding rotation can simply be applied without controlling on |p⟩; the circuit diagram of U_h is shown in Fig. 13. Finally, the particle update U_p must be applied only if a φ → f f̄ emission occurs. If that is the case, then the circuit in Fig. 14 is applied.
Note that the particle state only calls for 2(N + n_I) qubits, compared to 3(N + n_I) originally (Table I): by only encoding fermions in quantum registers, two qubits are sufficient to encode f_a, f_b, f̄_a, f̄_b. Also note that the number of required qubits varies between different circuit executions, as a simulation where more fermions are produced requires more quantum resources. In the worst case, |p⟩ still consists of N + n_I sub-registers, while the actual number is n_f. Depending on the parameters g_1, g_2, g_12 and the splitting functions P_{i→jφ}, P_{φ→ij}, the expected value of n_f may be significantly smaller than N + n_I. Therefore, the maximum number of qubits required for an N-step simulation is set by N + n_I, while the actual number is set by n_f; the latter could be used if a variable number of qubits can be allocated for each circuit execution. The asymptotic qubit scaling is just O(N + n_I), compared to O(N log2(N + n_I)) for the original QPS algorithm (Table I). The original qubit scaling comes from storing the emission history at each step, which means using N sub-registers |h_m⟩, each with ⌈log2(n_I + m + 1)⌉ ∼ O(log2(N + n_I)) qubits. By measuring, resetting, and re-using |h⟩ after each step, |h⟩ consists of just ⌈log2(N + n_I + 1)⌉ qubits, and the dominant scaling becomes the O(N + n_I) qubits of |p⟩. The qubit scaling difference is illustrated in Fig. 4, which plots qubit count against N with one starting particle, n_I = 1.
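The origin of the two scalings can be made concrete by counting only the particle and history registers. The sketch below is a partial tally under stated assumptions: it ignores the emission, counting, and ancilla registers, rounds each logarithm up to a whole number of qubits, and uses the worst case of N + n_I particle sub-registers for the modified algorithm. It is therefore not the complete qubit count of either algorithm.

```python
import math


def history_qubits(N, n_I, reuse):
    """Qubits in |h>: one sub-register per step originally, a single re-used one otherwise."""
    if reuse:
        return math.ceil(math.log2(N + n_I + 1))
    return sum(math.ceil(math.log2(n_I + m + 1)) for m in range(1, N + 1))


def particle_qubits(N, n_I, reuse):
    """Worst-case qubits in |p>: 3 per particle originally, 2 per (fermion) particle with re-use."""
    return (2 if reuse else 3) * (N + n_I)


for N in (5, 50, 500):
    original = particle_qubits(N, 1, False) + history_qubits(N, 1, False)
    modified = particle_qubits(N, 1, True) + history_qubits(N, 1, True)
    print(N, original, modified)
```

The history term grows like N log N without re-use but only logarithmically with re-use, so the linear particle register dominates the modified count.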

Gate Costs
We measure gate costs by writing operations in terms of the universal standard gate set consisting of two-qubit controlled-NOT gates (CNOTs) and arbitrary single-qubit gates U(θ, φ, λ). Multi-controlled gates are decomposed into sequences of Toffoli (CCX) gates using a standard procedure that requires a number of ancillary qubits equal to the number of controls minus one [8]. Toffoli gates are then decomposed into six two-qubit entangling gates [8]. As two-qubit entangling gates are far costlier to implement on real devices than single-qubit gates, we quote gate counts in terms of CNOTs. Note that while we illustrate "classical controls" in our circuit diagrams (Figs. 13 and 14) for quantum gates selected dynamically by the CPU, only the attached quantum gates are included in the gate count.
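The counting rules above can be turned into a small cost function. This sketch assumes the ancilla-ladder decomposition of Ref. [8] (n − 1 ancillas and 2(n − 1) Toffoli gates to compute and uncompute the ladder) and a singly controlled target gate whose CNOT cost is passed in, for example 2 for a controlled R_y and 1 for a CNOT. It counts only CNOTs, following the convention above, and it is not optimized: for small n, dedicated decompositions can be cheaper.

```python
def cnot_count_multicontrolled(n_controls, target_cnots):
    """CNOTs for an n-controlled gate via the ancilla-ladder construction.

    Assumes n_controls - 1 ancillas, 2 * (n_controls - 1) Toffolis (compute + uncompute),
    6 CNOTs per Toffoli, plus a singly controlled target gate costing target_cnots.
    """
    if n_controls < 1:
        raise ValueError("need at least one control")
    toffolis = 2 * (n_controls - 1)
    return 6 * toffolis + target_cnots


# A controlled-Ry costs 2 CNOTs; a plain CNOT target costs 1.
for k in (1, 2, 3, 5):
    print(k, cnot_count_multicontrolled(k, target_cnots=2))
```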
FIG. 4: Qubit count against N for the original and improved QPS algorithms with n_I = 1. This plot is simply an illustration of Eq. (13) and the sum over Table I with n_I = 1. For n_I > 1, the qubit count curves simply shift to the left, as n_I and N only appear together in the qubit scaling, as N + n_I.

TABLE III: Gate costs of the different circuit elements using re-measurement, listing for each element its cost at step m and its total scaling for N steps; logarithmic factors are expressed in terms of log2(n_I + m + 1).

Table III summarizes the gate costs of each component of the improved QPS algorithm. The overall asymptotic gate scaling is a factor of (N + n_I)^2 more efficient than the original QPS gate scaling. This scaling improvement is due to the fact that at step m, |n_a⟩ is a superposition of n_I + m possible basis states, while |n_a⟩|n_b⟩|n_φ⟩ is a superposition of (n_I + m)^3 possible basis states. To implement U_h (see App. A), rotation gates controlled on |n_a⟩ are applied to |h⟩ for each possible value stored in |n_a⟩. Therefore, in the original algorithm, U_h consisted of O((m + n_I)^3) controlled rotations, while in the improved algorithm, only O(m + n_I) controlled rotations are applied. Fig. 5 compares the actual gate counts of our improved QPS circuits with those of the original QPS circuits. The dashed line is the contribution from the U_h gate alone, and Fig. 5 illustrates that the gate cost of U_h is dominant.
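The (N + n_I)^2 gap can be reproduced with a simple counting model. The sketch below assumes, as a simplification that is not spelled out in this form in the text, that at step m U_h applies one controlled rotation per particle (n_I + m of them) for each possible value of the count registers, with (n_I + m)^3 values originally and n_I + m values after re-measurement; it ignores the logarithmic factors from decomposing the multi-controlled rotations.

```python
def uh_rotation_count(N, n_I, reuse):
    """Toy count of controlled rotations in U_h, summed over N steps."""
    total = 0
    for m in range(1, N + 1):
        particles = n_I + m  # one rotation per particle for each count value
        count_values = n_I + m if reuse else (n_I + m) ** 3
        total += particles * count_values
    return total


for N in (10, 100, 1000):
    ratio = uh_rotation_count(N, 1, False) / uh_rotation_count(N, 1, True)
    print(N, round(ratio / (N + 1) ** 2, 3))  # roughly constant, i.e. ratio ~ (N + n_I)^2
```

In this model the ratio divided by (N + n_I)^2 settles near 3/5, the ratio of the leading terms of Σk^4 and Σk^2.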

IV. NUMERICAL RESULTS
Using Qiskit's matrix product state simulator [28,29], we are able to simulate the QPS with one initial particle (n_I = 1) for up to several steps. For each simulation, we use a step-dependent scale parameter θ = θ_m, fixed splitting functions and couplings, and one initial f_1 (see Eq. (A2)). The couplings g_1 = 2, g_2 = 1, g_12 = 1 are arbitrary, but chosen such that g_a, g_b ≠ 0 and g_a ≠ g_b (Eq. (6)), in order to capture the full problem complexity. For simplicity, the couplings are also kept independent of the step (in reality, they would run with the scale). This means that the rotations are the same at each step. The diagonalized couplings are the eigenvalues of the interaction matrix, (3 ± √5)/2 ≈ 2.618 and 0.382, with a correspondingly fixed rotation angle u. With these parameters, the Sudakov factors, which give the probabilities of no emission from a given particle at step m, take simple closed forms. Because the couplings are kept constant, these probabilities also remain constant at each step. We run simulations with g_12 = 0 in addition to g_12 = 1. As explained in Sec. III A, when g_12 = 0 the parton shower can be solved using a classical Markov Chain algorithm. Therefore, as a sanity check, we overlay classical Markov Chain (MCMC) calculations, each with 10^9 shots, on the simulation results with g_12 = 0 in our plots.
Figures 6 and 7 present simulation results for N = 2, 3, 4, 5 steps and compare the outputs of the original QPS and the QPS with mid-circuit measurements³. We have chosen two different observables for illustration.
First, Fig. 6 shows histograms of the total number of emissions (E). The main subplot shows the probability distributions of E for classical MCMC (black), original QPS (filled bars), and QPS with mid-circuit measurements (solid edges) simulations, with both g_12 = 0 (blue) and g_12 = 1 (red). The second subplot magnifies the differences between the MCMC and g_12 = 0 simulation distributions, which are consistent with statistical noise. The third subplot magnifies the differences between the distributions obtained from the original QPS and the QPS with mid-circuit measurements, which are also within the expected statistical variations. With 10^5 shots per simulation, typical statistical errors are of order σ ∼ √(Pr(E)/10^5) ≤ √(1/10^5) ≈ 0.0032. Error bars shown in the second and third subplots of Fig. 6 are 1σ ranges for the difference distributions, and the simulation results exhibit deviations on the expected scale. In other words, the second subplot shows that quantum simulations with the coupling turned off (g_12 = 0) agree with the classical MCMC algorithm, as expected. Additionally, the two different versions of the quantum algorithm, original and with mid-circuit measurements, agree with one another. However, the classical (g_12 = 0) and quantum (g_12 = 1) algorithms yield fundamentally different results.

³ We have stopped at 5 steps due to the simulation time. Our criterion for how many steps to use is that simulations with 10^5 shots must take fewer than 3 hours running naively, without any parallelization, on an 8 GB RAM Mac. It would be possible to go somewhat further with larger computing resources. For the re-measurement circuits, it took ∼2.5 hours to achieve 10^5 shots for g_12 = 0 and g_12 = 1. We note that classical conditioning is not fully implemented in Qiskit (it is not possible to do arbitrary classical calculations), so we have to apply an exponential number of classically-conditioned gates.

We briefly describe the qualitative features of Fig. 6. First, with the chosen parameters (Eqs. (17), (19), and (20)), it turns out that Pr(E = 0, g_12 = 0) > Pr(E = 0, g_12 = 1). Additionally, for g_12 = 1, the probability that a φ does not split at a given step is 1 − 7/(4πN), compared to 1 − 5/(4πN) for g_12 = 0. Therefore, conditional on a φ particle being present in the system, φ splittings occur more frequently when g_12 = 1. This explains why the probability of having just one emission for g_12 = 1 is so low compared to g_12 = 0. Finally, the exact shape of the E distributions depends on the numerical parameter values, but the general shape exhibited in Fig. 6 (density increasing with E up to a peak, which could be at E = N, followed by a tail where the density decreases as E → N) is expected to hold for all N and all parameter values. The probability of emission at a given step is higher when there are more particles in the system, which explains why having just one or two emissions is less probable than having several emissions. However, at the tail end (E → N) of this trend, the distribution decreases slightly, because combinatorially there are more histories with E = N − 1 than with E = N.

The second observable, Fig. 7, is the distribution of the "hardest" emission angle, which algorithmically corresponds to the first emission that occurred during the shower evolution. The emission probability decreases exponentially with log(θ), or linearly with the opening angle θ. Algorithmically, this is because the probability of the first f_i → f_i φ emission occurring at smaller angles (later steps) is given by a power of the Sudakov factor (Eq. (27)); Pr(first emission at step m) can be written in closed form for both g_12 = 0 and g_12 = 1. Fig. 7 shows the distribution of log_e(θ_max), with probabilities displayed in the main subplot and differences displayed in the secondary subplots. For this observable, there is again a clear difference in results between the classical (g_12 = 0) and quantum (g_12 = 1) algorithms. Nevertheless, the third subplot illustrates that the two versions of the QPS, original and with mid-circuit measurements, agree within the expected statistical variations.
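Two of the quantitative statements above can be checked with a few lines of arithmetic. The sketch assumes the simplest setting: before the first emission only the initial particle is present, and its no-emission probability per step is a constant Δ (here an arbitrary value, 0.8, not the paper's parameters), so the step of the first emission follows a truncated geometric distribution, i.e., a power of the Sudakov factor. The second helper evaluates the rough shot-noise estimate used for Fig. 6 alongside the slightly smaller binomial standard error.

```python
import math


def first_emission_pmf(delta, n_steps):
    """Pr(first emission at step m) for m = 1..n_steps, then Pr(no emission at all).

    Assumes a single initial particle with a step-independent no-emission probability delta.
    """
    pmf = [delta ** (m - 1) * (1 - delta) for m in range(1, n_steps + 1)]
    pmf.append(delta ** n_steps)
    return pmf


def shot_noise(p, shots):
    """Return (sqrt(p / shots), sqrt(p * (1 - p) / shots)): the text's estimate and the binomial error."""
    return math.sqrt(p / shots), math.sqrt(p * (1 - p) / shots)


pmf = first_emission_pmf(0.8, 5)
print([round(x, 4) for x in pmf])  # decays geometrically with the step m
print(shot_noise(1.0, 10**5)[0])   # upper bound of the simple estimate, about 0.0032
```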

V. CONCLUSIONS AND OUTLOOK
In this paper, we have simplified the digital Quantum Parton Shower (QPS) algorithm presented in [1] by considering mid-circuit measurements and quantum gates that are dynamically selected based on these measurement results. The QPS is an iterative "quantum Markov Chain" algorithm with N steps, and by making a mid-circuit measurement on a subset of qubits at each step, subsequent multi-qubit controlled R_y rotations in the original QPS can be replaced by dynamically selected single-qubit R_y rotations. In this case, the number of required R_y rotations at each step is reduced by a factor of N^2. Additionally, qubits measured mid-circuit can be reset to the initial |0⟩ state and re-used during subsequent steps, which reduces the qubit costs significantly. Compared to the original algorithm (Alg. 1), the resulting Alg. 2 improves the quantum gate complexity by a factor of (N + n_I)^2 (Fig. 5), and the qubit complexity from O((N + n_I) log(N + n_I)) to O(N + n_I) (Fig. 4).
We implement our quantum circuits using Qiskit (where dynamic quantum operations are a relatively new feature), and present results for N = 2, 3, 4, 5 steps. We demonstrate agreement between the original and improved versions of the QPS, as well as agreement with classical MCMC simulations in the limit g_12 = 0, where the QPS can be efficiently computed classically. Deviations are consistent with the expected statistical uncertainties in all cases.
More generally, we showed how adopting a hybrid quantum-classical computing platform can make an originally quantum algorithm more efficient. Recent studies [27,30] have demonstrated that dynamic/hybrid quantum computing is feasible, and have even implemented shallow algorithms on current hardware. As qubit design continues to improve, we expect to be able to execute more complicated hybrid algorithms such as the QPS on real devices, and eventually to compute classically inaccessible physical observables. Moreover, as dynamic quantum computing is an intrinsic component of quantum error correction, developing dynamic/hybrid computing platforms is likely necessary in order to realize fault-tolerant quantum computers. The improved QPS algorithm serves as an additional case for prioritizing the development of dynamic computing platforms, as reduced qubit and gate complexities/costs raise the potential for using the QPS to compute classically inaccessible physical quantities much sooner. It is likely that other digital quantum algorithms with similar features (Markov Chain or iterative algorithms where interferences exist within but not between different histories) can benefit from employing a dynamic/hybrid structure, and we encourage algorithm developers to consider this approach.

Appendix A: ORIGINAL QPS ALGORITHM
This appendix provides details on the original QPS algorithm presented in [1].

a. Particle state |p⟩
This register consists of N + n_I 3-qubit sub-registers, one for each initial particle and one for each emission step. We use three qubits to encode each particle, as there are six different types of particles (f_1, f_2, f̄_1, f̄_2, φ, and None) in our model. The exact encoding is relevant; we use the encoding of Eq. (A2). With this encoding, operations conditioned on whether a particle is a fermion are controlled by just the first qubit, and operations conditioned on whether a fermion is type-a or type-b are controlled by just the first and third qubits. Note that two of the eight computational basis states are extraneous.

b. Emission history |h⟩
In the original QPS, |h⟩ encodes the location of the emission at each step. In particular, at step m, |h_m⟩ stores a binary number between 0 and n_I + m that specifies which particle emitted at that step. The |0⟩ state encodes that no emission occurred. Therefore, each sub-register |h_m⟩ consists of ⌈log2(n_I + m + 1)⌉ qubits, and in total |h⟩ consists of Σ_{m=1}^{N} ⌈log2(n_I + m + 1)⌉ qubits.

c. Emission |e⟩
The emission register |e⟩ stores a boolean that specifies whether an emission occurred at a given step. It is straightforward to uncompute |e⟩ after each step, so just one qubit is sufficient to represent |e⟩.

d. Number registers |n_φ⟩, |n_a⟩, |n_b⟩
The number registers are used to count the number of each particle type at each step. In particular, at step m each of |n_φ⟩, |n_a⟩, |n_b⟩ stores a binary number between 0 and n_I + m, the maximum possible number of particles at step m. Each number register is uncomputed during each step and can be reused for subsequent steps. Therefore, for an N-step algorithm, each number register consists of log2(n_I + N) qubits, for a total of 3 log2(n_I + N) qubits between |n_φ⟩, |n_a⟩, and |n_b⟩.
Having set up the six quantum registers shown in Fig. 2, we now describe each gate in the circuit.
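e. $R^{(m)}$ basis rotation
As described in Sec. III B, we rotate fermion states from the 1/2 basis into the a/b basis by applying the unitary U through the gate of Eq. (A5). Given our particle state representation (Eq. (A2)), rotating a single particle represented by three qubits entails applying the unitary gate

$$R = \begin{pmatrix} I & 0 & 0 & 0 \\ 0 & I & 0 & 0 \\ 0 & 0 & U & 0 \\ 0 & 0 & 0 & U \end{pmatrix} = |0\rangle\langle 0| \otimes I \otimes I + |1\rangle\langle 1| \otimes I \otimes U, \tag{A5}$$

where I and U are 2 × 2 matrices. Therefore, to rotate the complete particle register |p⟩ at step m, we apply the product gate

$$R^{(m)} = \bigotimes_{i=1}^{n_I + m} R_{p_i}, \tag{A6}$$

where $R_{p_i}$ denotes the gate of Eq. (A5) acting on sub-register |p_i⟩. The gate in Eq. (A5) is just a controlled-U gate, in which U is applied to the rightmost qubit controlled on the leftmost qubit of the particle encoding (Eq. (A2)). Therefore, applying Eq. (A6) at the beginning and end of step m involves applying (n_I + m) controlled-U gates, each of which can be decomposed into two CNOTs plus single-qubit gates.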
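A minimal NumPy sketch of this structure, assuming big-endian qubit ordering (leftmost qubit most significant) and using a real 2 × 2 rotation as a hypothetical stand-in for the flavor-mixing U of Sec. III B. It builds the single-particle gate as |0⟩⟨0| ⊗ I ⊗ I + |1⟩⟨1| ⊗ I ⊗ U and the tensor product over all occupied slots; the function names are illustrative.

```python
import numpy as np


def flavor_rotation(theta: float) -> np.ndarray:
    """Hypothetical real 2x2 rotation standing in for U (mixes flavors 1 and 2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def single_particle_R(U: np.ndarray) -> np.ndarray:
    """Eq. (A5)-style gate on one 3-qubit slot: apply U to the rightmost qubit
    only when the leftmost (fermion) qubit is |1>; the middle qubit is untouched."""
    I2 = np.eye(2)
    P0 = np.diag([1.0, 0.0])  # |0><0| on the leftmost qubit
    P1 = np.diag([0.0, 1.0])  # |1><1| on the leftmost qubit
    return np.kron(P0, np.kron(I2, I2)) + np.kron(P1, np.kron(I2, U))


def register_R(U: np.ndarray, n_slots: int) -> np.ndarray:
    """Tensor product of single_particle_R over n_slots slots (dimension 8**n_slots)."""
    R1 = single_particle_R(U)
    R = np.eye(1)
    for _ in range(n_slots):
        R = np.kron(R, R1)
    return R


U = flavor_rotation(0.3)
R1 = single_particle_R(U)
# Block structure diag(I, I, U, U) in 2x2 blocks, as in Eq. (A5).
print(np.allclose(R1[4:6, 4:6], U), np.allclose(R1[:4, :4], np.eye(4)))  # True True
```

Because R is unitary, the inverse rotation applied at the end of each step is simply its conjugate transpose.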
f. U_count particle counting
The counting gate maps the particle state |p⟩ at step m to the number of particles of each type, which is stored in |n_φ⟩, |n_a⟩, |n_b⟩. Note that we count fermions and antifermions of the same type together, as this distinction does not affect emission probabilities. For each particle |p_i⟩ in the particle state |p⟩, we apply an increment gate controlled on the particle type to each of |n_φ⟩, |n_a⟩, |n_b⟩, as illustrated in Fig. 8. The increment gates controlled on φ, f_a, and f_b implement the respective transformations

$$|n_\phi\rangle \to |n_\phi + 1\rangle, \qquad |n_a\rangle \to |n_a + 1\rangle, \qquad |n_b\rangle \to |n_b + 1\rangle.$$

h. U_h history update
Given that an emission occurred, the conditional probability that particle p is the emitter is given by Eq. (A19). Therefore, U_h prepares the computational basis states of |h⟩ with the amplitude distribution of Eq. (A21). As the rotation angles depend on the counts |n_a⟩, |n_b⟩, |n_φ⟩, the rotations are controlled on the count registers, in addition to |e⟩ and |p⟩. Iterating through each particle $p_1, \ldots, p_{n_I+m}$, each of these rotations is followed by a decrement of |n_a⟩, |n_b⟩, or |n_φ⟩ controlled on that particle type. This means that the relative emission probabilities given by Eq. (A19) are updated after each rotation. (Note that the denominator in the entries of Eq. (A21) is different for each rotation.) Figure 10 illustrates a single rotation-decrement iteration, and Fig. 11 illustrates the entire U_h gate. The gate complexity of U_h is $\mathcal{O}(\{m, n_I\}^4 \log_2^2(m + n_I))$, and $3\lceil \log_2(n_I + m + 1) \rceil + 2$ reusable ancilla qubits are required.
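The rotation-by-rotation preparation of |h⟩ can be illustrated with a small state-vector sketch. It assumes hypothetical relative emission weights in place of Eq. (A19), models only the branch in which an emission occurred (|e⟩ = |1⟩), and computes the shrinking denominator classically, whereas the circuit tracks it through the controlled decrements of the count registers. Each two-level rotation moves a fraction of the amplitude remaining in |0⟩ into |j⟩.

```python
import numpy as np


def two_level_rotation(dim: int, j: int, theta: float) -> np.ndarray:
    """Real rotation that mixes only basis states |0> and |j> of a dim-level register."""
    G = np.eye(dim)
    c, s = np.cos(theta), np.sin(theta)
    G[0, 0], G[0, j] = c, -s
    G[j, 0], G[j, j] = s, c
    return G


def prepare_history(weights) -> np.ndarray:
    """Starting from |h> = |0>, apply one rotation |0> <-> |j> per candidate emitter j
    so that the final probability of |j> is weights[j-1] / sum(weights)."""
    w = np.asarray(weights, dtype=float)
    dim = len(w) + 1
    state = np.zeros(dim)
    state[0] = 1.0
    for j in range(1, dim):
        remaining = w[j - 1:].sum()  # the denominator shrinks after every rotation
        frac = 0.0 if remaining == 0 else min(1.0, w[j - 1] / remaining)
        theta = np.arcsin(np.sqrt(frac))
        state = two_level_rotation(dim, j, theta) @ state
    return state


amps = prepare_history([1.0, 2.0, 3.0, 4.0])  # hypothetical relative weights
print(np.round(amps**2, 3))
```

Choosing $\sin^2\theta_j = w_j / \sum_{k \ge j} w_k$ makes the product of the preceding cosines telescope, so |j⟩ ends with probability $w_j / \sum_k w_k$ and the last rotation empties |0⟩.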

i. U_p particle update
The U_p gate updates the particle state |p⟩ based on which particle emitted at a given step. At step m, |p_m⟩ stores the newly radiated particle, if any. For example, if particle i emits a φ at step m, then U_p takes …

U_count gate
We need only count the number of a single fermion type, so both gate and qubit counts are reduced by a factor of 3 compared to [1].
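In both versions, the counting gate is built from increments controlled on particle type; in the improved version only |n_a⟩ is incremented. The NumPy sketch below shows the increment as a permutation |n⟩ → |n + 1 mod 2^k⟩ and its controlled form. As a simplifying assumption, a single hypothetical flag qubit ("this slot holds an f_a") stands in for the multi-qubit particle-type condition, and the wrap-around is never reached because the register is sized to hold the largest possible count.

```python
import numpy as np


def increment_matrix(k: int) -> np.ndarray:
    """Permutation |n> -> |n + 1 mod 2**k> on a k-qubit counting register."""
    d = 2 ** k
    M = np.zeros((d, d))
    for n in range(d):
        M[(n + 1) % d, n] = 1.0
    return M


def controlled(gate: np.ndarray) -> np.ndarray:
    """Apply `gate` only if one control qubit (placed leftmost) is |1>."""
    d = gate.shape[0]
    out = np.eye(2 * d)
    out[d:, d:] = gate
    return out


def basis_state(n: int, k: int) -> np.ndarray:
    """Computational basis vector |n> of a k-qubit register."""
    v = np.zeros(2 ** k)
    v[n] = 1.0
    return v


# Hypothetical flag qubit controlling a 3-qubit |n_a> register.
inc_na = controlled(increment_matrix(3))
before = np.kron([0.0, 1.0], basis_state(2, 3))  # flag = 1, n_a = 2
after = inc_na @ before
print(int(np.argmax(after)) - 8)  # 3
```

The controlled decrement used inside U_h is the inverse of this permutation, i.e. the transpose of `increment_matrix(k)`.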
The total number of CNOTs required to implement U_count at step m is

$$13\, n_{f,m} \log_2(n_{f,m} + 1) \le 13\,(n_I + m) \log_2(n_I + m + 1). \tag{B1}$$
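The bound of Eq. (B1) and the factor-of-3 saving in counting qubits can be evaluated with a few lines of Python. The function names are hypothetical; the qubit comparison assumes that the single remaining counter |n_a⟩ keeps the size $\lceil \log_2(n_I + N) \rceil$ of each original number register.

```python
import math


def ceil_log2(x: int) -> int:
    """Exact ceil(log2(x)) for an integer x >= 1."""
    return (x - 1).bit_length()


def ucount_cnot_bound(n_I: int, m: int) -> float:
    """Right-hand side of Eq. (B1): 13 (n_I + m) log2(n_I + m + 1)."""
    return 13 * (n_I + m) * math.log2(n_I + m + 1)


def number_register_qubits(n_I: int, N: int) -> tuple:
    """(original, improved): three counters |n_phi>, |n_a>, |n_b> versus |n_a> alone."""
    per_register = ceil_log2(n_I + N)
    return 3 * per_register, per_register


for m in range(4):
    print(m, round(ucount_cnot_bound(n_I=1, m=m), 1))
print(number_register_qubits(n_I=1, N=4))  # (9, 3)
```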

U_e gate
Instead of conditioning on |n_a⟩, |n_b⟩, and |n_φ⟩, we condition only on |n_a⟩. We compute Sudakov factors [1] for each possible value of n_a and apply the appropriate rotation matrices [1] conditioned on the value stored in |n_a⟩. At the mth simulation step, 0 ≤ n_a ≤ n_I + m, so there are at most n_I + m + 1 conditional rotations, each conditioned on $\lceil \log_2(n_I + m + 1) \rceil$ qubits. In the original algorithm, there are $\mathcal{O}(m^3)$ rotations, one for each combination of values of |n_a⟩, |n_b⟩, |n_φ⟩. Here n_φ and n_b are conditioned on classically, so the number of conditional rotations is reduced by a factor of $m^2$. Thus, the computational complexity of the U_e gate is reduced to $\mathcal{O}(m \log_2(n_I + m + 1))$, compared to $\mathcal{O}(m^3 \log_2(n_I + m + 1))$ for the original [1].
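The origin of the $m^2$ saving can be checked by counting distinct control values. The sketch below assumes that the original three counters satisfy n_a + n_b + n_φ ≤ M with M = n_I + m (at most M particles at step m), while the improved gate conditions on n_a alone; the function names are hypothetical.

```python
from math import comb


def original_control_values(M: int) -> int:
    """Number of (n_a, n_b, n_phi) with all entries >= 0 and total <= M
    (assumed constraint: at most M = n_I + m particles at step m)."""
    return sum(
        1
        for n_a in range(M + 1)
        for n_b in range(M + 1 - n_a)
        for _n_phi in range(M + 1 - n_a - n_b)
    )


def improved_control_values(M: int) -> int:
    """Only n_a is held on qubits: 0 <= n_a <= M."""
    return M + 1


for M in (2, 4, 8, 16):
    orig, impr = original_control_values(M), improved_control_values(M)
    print(M, orig, impr, orig / impr)
```

The original count equals $\binom{M+3}{3}$, so the ratio is $(M+2)(M+3)/6$, which grows quadratically in m, consistent with the $m^2$ reduction stated above.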

U_h gate
Like the U_e gate, the rotations in this gate need only be conditioned on the value stored in |n_a⟩. Thus, the computational complexity of the U_h gate is reduced to $\mathcal{O}(m^2 \log_2^2(n_I + m + 1))$, compared to $\mathcal{O}(m^4 \log_2^2(n_I + m + 1))$ for the original [1]. The improved U_h gate is illustrated in Fig. 13. These controls are denoted by circular "gates" in the diagram. The relative probabilities for an emission from |p_j⟩ depend on whether |p_j⟩ is an f_a, f_b, or φ, as well as on the number of each particle type in the system (which is fully encoded by |n_a⟩ and previous measurements of |h⟩). Therefore, we must apply $3\log_2(n_I + m + 1)$ different two-level rotations between |h⟩ = |0⟩ and |j⟩, each conditioned on the values stored in |n_a⟩ and |p_j⟩. Applying a controlled decrement after each $U_h^{(m,j)}$ ensures that the correct relative emission probabilities are encoded into |h⟩ (see Appendix A.5 in [1]) and also resets |n_a⟩ to the |0⟩ state. Finally, the last gate puts |e⟩ back into the |0⟩ state: after the history register is updated, |h⟩ = |0⟩ ⟺ |e⟩ = |1⟩, so we apply a NOT gate conditioned on |h⟩ = |0⟩.

U_p gate
Measuring the history register at step m collapses the wavefunction of the system onto a state in which either a particular particle emitted or no emission occurred. Quantum interference between the different fermion types (flavors) remains, as measuring |h⟩ does not affect any superpositions within the particle states |p⟩.
The choice of which U_p gate to apply is determined dynamically by the measurement result. If the emitting particle is a φ, we apply the particle update shown in Fig. 14. If the emitting particle is a fermion, the particle update consists entirely of classical operations, as φ and None are not encoded on qubits: in our particle history table (Table IV), we record that emitting particle j remains a fermion and that particle m is a φ. The computational cost of the U_p gate is therefore reduced to a constant of at most 2 CNOTs, compared to $(n_I + m)\log_2(n_I + m + 1)$ for the original [1].

FIG. 14: U_p gate to be applied if p_j = φ. Note that prior to the emission, p_j is not encoded on qubits, so the particle update can be thought of as "initializing" new registers |p_j⟩ and |p_m⟩.

FIG. 4: Qubit cost comparison between the original QPS and the improved version with mid-circuit measurements. This plot illustrates Eq. (13) and the sum over Table I with n_I = 1. For n_I > 1, the qubit count curves simply shift to the left, since n_I and N enter the qubit scaling only through the combination N + n_I.

FIG. 5: Gate cost comparison. The dashed line represents the dominant contribution from U_h to the total gate count for the QPS with mid-circuit measurements.

FIG. 6: Probability vs. number of emissions (E) for 2-, 3-, 4-, and 5-step simulations. Error bars represent 1σ ranges; e.g., in each of the third subplots, the red error bars correspond to the standard deviation of the difference distribution between simulation results obtained from the original QPS and the QPS with remeasurement. Classical MCMC data were obtained using $10^7$ shots, so their statistical errors are suppressed by a factor of 1/10 and are thus negligible. Error bars in each subplot represent statistical errors from $g_{12} = 0$ simulations.

TABLE I: Registers in the QPS quantum circuit.

Data: Splitting functions $P_{i \to j\phi}$, $P_{\phi \to ij}$, couplings $g_1$, $g_2$, $g_{12}$, step parameter, number of steps N, and n_I initial particles.
Result: Full amplitude description of final state radiation.
begin
  Initialize all qubit registers in the |0⟩ state.
  Encode initial particles |p_1⟩ … |p_{n_I}⟩.
  for j ← 0 to N − 1 do
    (1) Basis rotation: Rotate all particles in |p⟩ to the diagonal basis (Eq. (6)) using Eq. (A5).
    (2) Count particles: Using U_count (App. A, f), count the number of each particle type, storing the results in |n_a⟩, |n_b⟩, |n_φ⟩.
    (3) Determine emission: Using U_e (App. A, g), encode whether an emission occurred this step on |e⟩, where the probability of emission is controlled on |n_a⟩, |n_b⟩, |n_φ⟩.
    (4) Update history: Using U_h (App. A, h), update the history register |h_m⟩, which encodes which particle (if any) emitted this step. The relative amplitudes for particular emissions are controlled by |n_a⟩, |n_b⟩, |n_φ⟩, |p⟩, |e⟩. (Note that |e⟩ is put back into the |0⟩ state implicitly in U_h.)
    (5) Update particles: Using U_p (App. A, i), update the particle state, controlled on which particle emitted (encoded in |h_m⟩).
    (6) Inverse basis rotation: Rotate all particles in |p⟩ back to the original basis, using the inverse of Eq. (A5).
  Measure all qubits.

… measured directly after applying U … the first two steps of the QPS algorithm. Round gates indicate control qubits.
High-level circuit diagram of the first two steps of the improved QPS algorithm. Round gates indicate control qubits. Double wires indicate classical information stored on a CPU and measured from |h⟩. Double wires attached to U_p indicate that sub-gates in U_p are selectively applied based on the measured values of h_0, h_1, …. The |0⟩ attached to the same gate indicates a reset to the |0⟩ state.
… this can be visualized by replacing quantum register |p_i⟩ with classical wires if |p_i⟩ = |φ⟩. The gate sequence of Fig. 13 is unchanged, except that some rotations $U_h^{(m,i)}$ may no longer be controlled on |p_i⟩ but are instead applied directly to |h⟩.

Data: Splitting functions $P_{i \to j\phi}$, $P_{\phi \to ij}$, couplings $g_1$, $g_2$, $g_{12}$, step parameter, number of steps N, and n_I initial particles.
Result: Full amplitude description of final state radiation.
begin
  Initialize all qubit registers in the |0⟩ state.
  Encode initial particles |p_1⟩ … |p_{n_I}⟩.
  for j ← 0 to N − 1 do
    (1) Rotate all particles in |p⟩ to the diagonal basis (Eq. (6)) using Eq. (A5).
    (2) Using U_count (App. A, f), count the number of a-fermions, storing the result in |n_a⟩.
    (3) Using U_e (App. A, g), encode whether an emission occurred this step on |e⟩, where the probability of emission is controlled on |n_a⟩ and on the values n_φ and n_tot retrieved from the CPU.
    (4) Using U_h (App. A, h), update the history register |h_m⟩, which encodes which particle (if any) emitted this step. The relative amplitudes for particular emissions are controlled by |n_a⟩, |e⟩, and the values n_φ and n_tot retrieved from the CPU. (Note that |e⟩ is put back into the |0⟩ state implicitly in U_h.)
    (5) Measure the history register |h⟩, storing the result on the CPU.
    (6) If the measurement result indicates that a φ → f f̄ emission occurred, apply U_p (Fig. 14) to update the particle state.
    (7) Reset |h⟩ to the |0⟩ state.
    (8) Rotate all particles in |p⟩ back to the original basis, using the inverse of Eq. (A5).
  Measure |p⟩.

Figure 3 illustrates the high-level circuit diagram for two steps of the improved QPS algorithm. Note the reduction in registers compared to the original QPS. The improved QPS circuit calls for four qubit registers, detailed in Table II below.

TABLE II: Registers in the improved QPS algorithm.

Starting with |h⟩ in the |0⟩ state, this distribution is prepared by applying a series of two-level rotations from |0⟩ to the other computational basis states of |h⟩. Each rotation is controlled on |e⟩ and applies the conditional probability of emission for each particle |p_1⟩ … |p_{n_I+m}⟩. The rotation controlled on |p_j⟩ is $U_h^{(m,j)}$.

FIG. 12: U_p consists of applying this gate, controlled on |h_m⟩ = |j⟩, for each j from 1 to n_I + m.
In total, the particle update step consists of (n_I + m) applications of U_p, each controlled on $\lceil \log_2(n_I + m + 1) \rceil$ qubits. The U_p gate has a constant number of operations, so the overall gate complexity of the particle update is $\mathcal{O}(\{m, n_I\} \log_2(n_I + m + 1))$.
FIG. 13: The U_h gate at step m. Each $U_h^{(m,j)}$ gate denotes a sequence of two-level rotations (see App. A, h) controlled on the different states stored in |n_a⟩ and |p_j⟩.

TABLE IV: Information stored on the CPU. At each step m, the history measurement determines whether particle m + 1 will be a fermion, boson, or None. This "particle history table" is filled out row by row as the QPS iterates, and quantum gates are dynamically selected based on the values stored here.
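The bookkeeping behind this table can be sketched in a few lines of Python. The class and method names are hypothetical, and the sketch only covers the classical side: it appends one slot per step, records f → f φ emissions without any gates, and flags the steps at which a φ → f f̄ emission requires the quantum U_p of Fig. 14. The measured value h of |h_m⟩ is taken as 0 for no emission and j ≥ 1 for an emission from particle j.

```python
from dataclasses import dataclass, field


@dataclass
class ParticleHistoryTable:
    """Hypothetical CPU-side record: kinds[i] is "f" (fermion, held on qubits),
    "phi" or "None" (both purely classical)."""
    kinds: list = field(default_factory=list)
    quantum_updates: list = field(default_factory=list)  # steps that needed U_p

    def record(self, m: int, h: int) -> None:
        """Process the measured value h of |h_m> at step m. A new slot is
        appended every step, so slot n_I + m belongs to step m."""
        if h == 0:
            self.kinds.append("None")
        elif self.kinds[h - 1] == "f":
            # f -> f phi: the emitter stays a fermion and the new particle is a
            # classical phi, so no quantum gates are needed.
            self.kinds.append("phi")
        elif self.kinds[h - 1] == "phi":
            # phi -> f fbar: both slots become fermions, so the quantum U_p must
            # initialize the qubit registers of the emitter and the new slot.
            self.kinds[h - 1] = "f"
            self.kinds.append("f")
            self.quantum_updates.append(m)
        else:
            raise ValueError("an empty (None) slot cannot emit")


table = ParticleHistoryTable(kinds=["f"])  # n_I = 1 initial fermion
for m, h in enumerate([1, 2, 0]):  # hypothetical measurement outcomes
    table.record(m, h)
print(table.kinds)  # ['f', 'f', 'f', 'None']
print(table.quantum_updates)  # [1]
```

Keeping φ and None slots purely classical is what lets the quantum U_p shrink to a constant-size gate that is applied only when a φ splits.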