Thermodynamic optimization of quantum algorithms: On-the-go erasure of qubit registers

We consider two bottlenecks in quantum computing: limited memory size and noise caused by heat dissipation. Trying to optimize both, we investigate "on-the-go erasure" of quantum registers that are no longer needed for a given algorithm: freeing up auxiliary qubits as they stop being useful would facilitate the parallelization of computations. We study the minimal thermodynamic cost of erasure in these scenarios, applying results on the Landauer erasure of entangled quantum registers. For the class of algorithms solving the Abelian hidden subgroup problem, we find optimal on-the-go erasure protocols. We conclude that there is a trade-off: if we have enough partial information about a problem to build efficient on-the-go erasure, we can use it to instead simplify the algorithm, so that fewer qubits are needed to run the computation in the first place. We provide explicit protocols for these two approaches.

I. INTRODUCTION
When is the best time to reset qubit registers? A default option is to run a whole algorithm and reset all registers to $|0\rangle$ at the end, after the final measurements. However, if the total number of qubits is a limitation and we need to run several algorithms concurrently, we may want to free up some registers as they stop being useful: for example, in the period-finding algorithm, the auxiliary register can be discarded after applying the oracle (Figure 1a). Another critical factor may be heat dissipation: Landauer's principle tells us that the erasure of every single qubit from a fully mixed state to $|0\rangle$ has a fundamental work cost of $k_B T \ln 2$ if performed at temperature $T$, releasing the same amount of heat to the environment [1]. As heat dissipation in a quantum computer threatens coherence, reducing the work cost of erasure may be of critical importance.
Previous erasure schemes. We consider algorithms that use a main register of $n$ qubits and an auxiliary register of $m$ qubits (Figure 1a); the latter can be discarded at some halfway point in the algorithm. For example, the period-finding algorithm is of this form. To optimize memory space, we may want to erase the auxiliary register as soon as possible: a brute-force erasure procedure of those $m$ qubits (Figure 1b) would dissipate heat $m k_B T \ln 2$ with a simple fixed map, independent of the algorithm. On the other extreme, if we only want to optimize the heat cost, we can apply Bennett's reversible erasure procedure [2][3][4], which coherently copies the output register to an external system, and then uncomputes the algorithm's circuit on the original qubits reversibly (Figure 1c). The main drawback of this procedure emerges when it is applied to probabilistic quantum algorithms like period-finding: Bennett's uncomputing only works when the output register is decoupled from the rest of the quantum computer, in other words, when the algorithm outputs a deterministic result [5]. Probabilistic quantum algorithms are made (approximately) deterministic by repeating them many times and applying classical post-processing to the probabilistic outputs, including for example a majority vote. To apply Bennett's uncomputing to a probabilistic algorithm like period-finding, we would have to implement all these runs of the algorithm and the (usually classical) post-processing as a large reversible quantum circuit, so that the final post-processed quantum output is approximately decoupled from the rest of the memory.¹ This process has a large complexity cost both in terms of memory size and circuit length; while it may be worth pursuing in a distant future when our computers work flawlessly and reversibly at the quantum level, in this work we focus on a NISQ regime, and try to optimize both thermodynamic and computational complexity costs of algorithms.
Thermodynamic considerations. We will work in the quantum resource theory of thermal operations [7][8][9]. In this framework, unitary operations on degenerate systems are given for free, and irreversible operations like erasure have associated work costs. Contemporary quantum computers are of course still far from this ideal scenario; nonetheless, the fundamental limits for the energy cost of implementing single-qubit unitaries are comparable to that of erasure [10]. Moreover, note that the energy requirements to implement common unitary operations (which depend on the quantum control mechanisms) scale sublinearly with the number of qubits, while erasure scales linearly [10]. This, together with recent erasure experiments that approach Landauer's limit [11][12][13], leads us to speculate that energies of the order $k_B T$ may eventually become relevant to quantum computing. Overall, thermodynamic optimization of quantum computation entails at least three independent components: (1) cost of unitary gates, (2) cost of erasure of fully mixed qubits, (3) optimizing the number of fully mixed qubits that must be erased (see [14][15][16][17] for reviews on the thermodynamics of quantum computation). Our work addresses the third component, and can be applied in conjunction with restrictions or improvements on the former two. This is further discussed in Section IV.
FIG. 1. (1a) We consider quantum algorithms where a main register is used until the final measurement, but there can be auxiliary registers, which are only needed for part of the algorithm; for example, the period-finding algorithm and more generally algorithms for hidden subgroup problems are of this form. (1b) In order to free up memory space, the auxiliary register can be erased on-the-go. A brute-force erasure at temperature $T$ will have work cost $k_B T \ln 2$ per qubit due to Landauer's principle. (1c) For Bennett's uncomputing, the original probabilistic algorithm is made essentially deterministic by running many copies of $(V \otimes \mathbb{1}) \circ U$ in parallel together with a quantum implementation of the classical post-processing, summarized as $\mathcal{U}$. The result of this calculation is coherently copied to an output register and $\mathcal{U}$ is uncomputed. (1d) We propose an optimized on-the-go erasure scheme which takes advantage of the entanglement between main and auxiliary register to reduce the work cost of erasure of the latter.
* florian.meier@tuwien.ac.at † lidia@phys.ethz.ch

A. Contribution of this paper
Optimized on-the-go erasure. Making use of entanglement between the main and auxiliary register as a thermodynamic resource [7-9, 14, 18, 19], we introduce a new erasure scheme (Figure 1d). It entails a strictly lower heat dissipation than brute-force erasure; in contrast to Bennett's uncomputing, the auxiliary register is reset on-the-go without needing additional qubits. However, these improvements do not come for free: the main cost of our scheme arises from the information needed to access the entanglement.
List of results. In the setting of the Abelian hidden subgroup problem, we use partial information about entanglement to optimize the erasure of auxiliary registers. In particular:
1. We find that optimal erasure (where all the entanglement between registers is exploited) is only possible if we already know the solution to the problem, i.e. the hidden subgroup (Theorem 5).
2. Given partial information about the problem, we provide an optimal on-the-go erasure protocol of auxiliary registers and compute its work cost (Theorem 8).
3. As an alternative to erasure, we can use that same partial information to simplify the algorithm, so that it uses fewer qubits (Theorem 9). We provide explicit protocols for the cases of black-box oracles and open circuit access to oracles (Figures 4 and 5).
4. There is a precise trade-off between the thermodynamic cost of erasure and algorithm simplification. The optimal choice of implementation (in terms of computational complexity) depends on the oracle: if we have open circuit access to the oracle, it is more efficient to simplify the circuit; if the oracle is given as a black box, it is roughly equivalent to perform on-the-go erasure or to simplify the circuit.
Structure. In Section II we review the mathematical tools and notions of quantum thermodynamics, along with the algorithm solving the Abelian hidden subgroup problem. These are the main ingredients on which our results, presented in Section III, are based. In Section III C 1 we illustrate the key concepts of our optimized on-the-go erasure scheme with the example of the period finding algorithm, and in Section III A we treat the general Abelian hidden subgroup problem. There, we state the main Theorems 3-9 together with a qualitative sketch of the proofs. Discussions and open questions can be found in Section IV. The full proofs of the main theorems and further generalizations are explored in the appendix: in Appendix A an explicit erasure protocol [20] is reviewed, and Appendices B and C contain the proofs of our results.

II. SETTING & BUILDING BLOCKS
In this section we briefly review the results obtained in [18] regarding optimal bounds for the thermodynamic costs of erasing a memory with quantum side information -this will be useful as a building block for our erasure schemes. Then we recall the algorithm solving the Abelian hidden subgroup problem, and lastly, we devise a strategy for how to optimize the erasure of the auxiliary register of said algorithm.

A. Erasure with quantum side information
Work cost of erasure. Landauer's principle [1,21] demonstrates the intricate relation between information theory and thermodynamics. It states that logically irreversible operations come with an intrinsic work cost, related to the temperature of the environment where the computation is carried out. If we are looking at a system $S$ initially in a state $\rho_S$, the average work cost of erasing this system at a temperature $T$ (that is, setting $\rho_S \to |0\rangle\langle 0|_S$ using a thermal bath at temperature $T$) scales with the entropy of the initial state, $W(S) = H(\rho_S)\, k_B T \ln 2$, where $k_B$ is the Boltzmann constant and $H(\rho) = -\mathrm{Tr}(\rho \log_2 \rho)$ is the von Neumann entropy [14]. In the setting of Figure 1b, being ignorant about the state of the $m$-qubit auxiliary system, one has to apply a fixed erasure map and not the optimal map designed for the actual state $\rho_S$. The average work cost of this map corresponds to the worst-case scenario of erasing a fully mixed state $\rho_S = (\mathbb{1}/2)^{\otimes m}$, that is $m k_B T \ln 2$ for erasure at temperature $T$. This energy is then dissipated into the rest of the quantum computer, causing it to heat up, which may increase noise and decoherence. Using side information, available as entanglement between the main and auxiliary registers, we attempt to improve this work cost by using the following result.
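As a rough numerical illustration of the Landauer cost described above, the following sketch evaluates $H(\rho)\, k_B T \ln 2$ for a given density matrix. It is a minimal sketch rather than part of the paper's protocol: the helper names `von_neumann_entropy` and `erasure_work` and the example temperature of 20 mK are assumptions made here for illustration.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def von_neumann_entropy(rho):
    """Von Neumann entropy H(rho) = -Tr(rho log2 rho), in bits."""
    eigenvalues = np.linalg.eigvalsh(rho)
    nonzero = eigenvalues[eigenvalues > 1e-12]  # convention: 0 log 0 = 0
    return float(-np.sum(nonzero * np.log2(nonzero)))


def erasure_work(rho, temperature):
    """Average work H(rho) k_B T ln 2 (in joules) to reset rho to |0><0|."""
    return von_neumann_entropy(rho) * K_B * temperature * np.log(2)


m = 3                                    # number of auxiliary qubits
fully_mixed = np.eye(2**m) / 2**m        # worst case for a fixed erasure map
pure = np.zeros((2**m, 2**m))
pure[0, 0] = 1.0                         # register already in |0...0>

temperature = 0.02                       # hypothetical 20 mK environment
print(erasure_work(fully_mixed, temperature))  # equals m * k_B * T * ln 2
print(erasure_work(pure, temperature))         # zero entropy, zero cost
```

The fully mixed register reproduces the brute-force cost $m k_B T \ln 2$, while a register whose state is known and pure costs nothing to reset.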
Lemma 1 (Erasure with quantum side information [18]). Given two degenerate quantum registers $G$ and $S$ and any reference system $R$, there exists a process $E$ acting on $G$, $S$ and an environment at temperature $T$ that erases $S$ while preserving $G$ and $R$, and which does not exceed an average work cost (and heat dissipation) of
$$W(S|G)_\rho = H(S|G)_\rho \, k_B T \ln 2, \qquad (3)$$
with $H(S|G)_\rho = H(GS)_\rho - H(G)_\rho$ the conditional von Neumann entropy of $S$ conditioned on $G$. This procedure is reversible on $GS$: there exists a process that achieves the transformation $\rho_G \otimes |0\rangle\langle 0|_S \to \rho_{GS}$ for the symmetric work cost $-W(S|G)_\rho$.
The key insight of Lemma 1 is that quantum correlations (and in particular entanglement) can be used as additional resources to reduce the work cost of erasure of the auxiliary system. The average work cost is meant with respect to the thermodynamic limit of many independent copies of the systems $GS$ (for a brief discussion of single-shot and finite-size effects, see Section IV). In the following, $G$ will be the main register and $S$ will be the auxiliary register.² The reference system $R$ includes the non-accessible degrees of freedom that may be correlated with our quantum registers, e.g. the rest of the quantum computer.
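To make the conditional entropy entering Lemma 1 concrete, the sketch below evaluates $H(S|G) = H(GS) - H(G)$ numerically for two small states. It is an illustration under assumptions made here: the helper names are hypothetical, $G$ consists of two qubits $G_1 G_2$, and $G_1$ is taken to be in $|0\rangle$.

```python
import numpy as np


def entropy_bits(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(rho)
    nonzero = eigenvalues[eigenvalues > 1e-12]
    return float(-np.sum(nonzero * np.log2(nonzero)))


def conditional_entropy(rho_gs, dim_g, dim_s):
    """H(S|G) = H(GS) - H(G) for a state on G (first factor) and S (second)."""
    reshaped = rho_gs.reshape(dim_g, dim_s, dim_g, dim_s)
    rho_g = np.einsum("isjs->ij", reshaped)  # partial trace over S
    return entropy_bits(rho_gs) - entropy_bits(rho_g)


zero = np.array([1.0, 0.0])
chi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # Bell state on G2 and S

# Case 1: G1 in |0>, G2 and S share a Bell pair (qubit ordering G1, G2, S).
psi = np.kron(zero, chi)
rho_entangled = np.outer(psi, psi)

# Case 2: G in |00>, S fully mixed and uncorrelated with G.
g_state = np.kron(zero, zero)
rho_product = np.kron(np.outer(g_state, g_state), np.eye(2) / 2)

h_entangled = conditional_entropy(rho_entangled, 4, 2)  # close to -1
h_product = conditional_entropy(rho_product, 4, 2)      # close to +1
print(h_entangled, h_product)
```

A conditional entropy of $-1$ corresponds, via Lemma 1, to a work cost of $-k_B T \ln 2$ for the entangled qubit, compared with $+k_B T \ln 2$ for an uncorrelated fully mixed qubit.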
Example: erasure of half of a Bell pair [18]. This is the simplest application of Lemma 1, which will be useful to understand the general procedure later. Take system $S$ to be a single qubit with Hilbert space $\mathcal{H}_S = \mathbb{C}^2$ and $G$ to be two qubits with $\mathcal{H}_G = \mathcal{H}_{G_1} \otimes \mathcal{H}_{G_2}$. Suppose that initially $G_2$ and $S$ are entangled in the state $|\chi\rangle_{G_2 S}$, where $|\chi\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$ is a fully entangled Bell state. The goal is to erase $S$ while preserving $G$, that is, the final state should be $\rho_G \otimes |0\rangle\langle 0|_S$. At the end, the total average work cost of erasure for this toy example is $W(S|G)_\rho = -k_B T \ln 2$, in accordance with Eq. (3).
FIG. 2. Abelian hidden subgroup algorithm [22,23]. The quantum circuit above solves the Abelian hidden subgroup problem. Since it is of the same form as Figure 1d, we can use it as a candidate for optimizing the on-the-go erasure. The main register $\mathcal{H}_G$ encodes the group $G$ and the auxiliary register $\mathcal{H}_S$ encodes $S$. The function oracle acts on states of the joint register via $O_f |g, s\rangle = |g, s \oplus f(g)\rangle$, with $\oplus$ denoting the bitwise XOR operation.

B. Hidden subgroup problem
Several computational problems can be phrased in terms of the hidden subgroup problem (HSP) [24], most famously period finding, which finds its application in Shor's integer factorization algorithm, and the discrete logarithm problem [25]. We will first state the general problem and how our erasure algorithm applies, before looking at those particular instances.
Problem 2 (Hidden Subgroup Problem [22]). Let $G$ be a finite group, $S$ some finite set and $f: G \to S$ a function. Given the existence of a subgroup $H \subseteq G$ such that for all $g, g' \in G$, $f(g) = f(g') \iff gH = g'H$, the goal is to determine $H$.
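The promise in Problem 2 says that $f$ is constant on the cosets of $H$ and takes different values on different cosets. For a small cyclic group this can be checked by brute force, as in the following sketch. The function name `hides_subgroup` and the concrete instance ($\mathbb{Z}_{12}$ with $H = \{0, 4, 8\}$) are illustrative choices made here and do not come from the paper.

```python
def hides_subgroup(f, n, subgroup):
    """Return True if f(g) == f(h) exactly when g - h lies in `subgroup` of Z_n."""
    members = set(subgroup)
    return all(
        (f(g) == f(h)) == ((g - h) % n in members)
        for g in range(n)
        for h in range(n)
    )


def f_good(g):
    """Constant on cosets g + {0, 4, 8}, distinct on different cosets."""
    return g % 4


def f_bad(g):
    """Constant on cosets of {0, 6} instead, so it does not hide {0, 4, 8}."""
    return g % 6


n = 12
H = [0, 4, 8]  # subgroup of Z_12 generated by 4

print(hides_subgroup(f_good, n, H))  # True
print(hides_subgroup(f_bad, n, H))   # False
```

The brute-force check takes time quadratic in $|G|$, which is exactly the cost the quantum algorithm avoids; it is only meant to make the promise explicit.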
The HSP can be solved by an efficient³ quantum algorithm originally found by [26], under the assumption that the group $G$ is Abelian (the group operation is commutative). We will from now on use the addition symbol $+$ for group operations in $G$ to highlight its Abelian property, that is, $g + h$ instead of $gh$. Unless stated otherwise, whenever we refer to the HSP, the Abelian HSP is meant. For the general non-Abelian HSP, there are algorithms efficient in terms of oracle complexity [23,27], but to the authors' knowledge, no general algorithm exists that is efficient in gate complexity. Here we follow [22,23] for the quantum algorithm solving the HSP (Figure 2). In Appendix B 1 the computational steps are derived and explained in detail. At this point, the key observation we make is that the circuit solving the HSP is precisely of the required form, i.e. that of the circuit in Figure 1a. After a unitary operation $U = O_f \circ (Q_G \otimes \mathbb{1})$ on the main and auxiliary register, the latter is no longer needed and can be erased by using a Landauer erasure $\tilde{E}$. The computation on the main register can be continued independently.
FIG. 3. The erasure process acting on an initial state $\rho_{GS}$ must leave the reduced state of the main register invariant, i.e. we require $\mathrm{Tr}_S(\rho_{GS}) = \rho_G$ (the violet box in the above circuit must act locally as the identity on $G$). This requirement is necessary to ensure that our on-the-go erasure procedure does not affect the outcome of the algorithm to which it is applied.
C. Strategy towards on-the-go erasure
So far we have identified the point at which we optimize the erasure of the auxiliary register: right after these qubits are no longer needed, but before the computation on the main register is finished. The global unitary $U$ from Figure 1d corresponds to the composition $O_f \circ (Q_G \otimes \mathbb{1}) = U$ from Figure 2. By $\rho_{GS}$ we denote the state of $GS$ right after $U$. To apply the result from Lemma 1, we have to determine where in $\rho_{GS}$ the entanglement between $G$ and $S$ is. Operationally, this means we need to find local operations $U_G$ and $U_S$ on the main and auxiliary register, respectively, such that the entanglement between these registers is compressed into well-defined qubits, for example $\ell$ Bell pairs $|\chi\rangle$. In our erasure algorithm, the entanglement is always compressed into fully entangled pairs of qubits.⁴ In Section III A we establish bounds on the number $\ell$ of Bell pairs for which the transformation in Eq. (11) can be achieved. After an optimized erasure of $S$ according to Lemma 1, the reduced state of the main register is $U_G \rho_G U_G^\dagger$. Before one can continue the computation with $V = Q_G$, the local transformation $U_G$ has to be undone, $U_G^\dagger (U_G \rho_G U_G^\dagger) U_G = \rho_G$, where the reduced state on $G$ is unaffected by the erasure, $\rho_G = \mathrm{Tr}_S(\rho_{GS})$ (Figure 3). This ensures that the algorithm still produces the same outcome, regardless of the manipulations due to the on-the-go erasure. An important question we will answer in the next section concerns the costs of the unitaries $U_G$ and $U_S$. While they are free in the thermal operations resource theory, we have to quantify their cost from a computational standpoint.

III. RESULTS
Here, we introduce the optimized on-the-go erasure protocols, starting with general bounds for the HSP. Then, we will define a class of modifications realizing these optimizations whose costs we will quantify in terms of the algorithm's width. This section is concluded by a toy example for the period finding algorithm which is a special case of the HSP.
⁴ Alternatively, one could weaken this assumption and consider partially mixed qubits, for which the relative entropy is greater, and therefore (by Lemma 1), the work cost of erasure is lower. To be applied optimally, this may require more fine-tuned control of the physical interface in the "information battery" part of the quantum computer (see Appendix A 2).
where $\rho_{GS}$ is the state of the computational registers after the oracle operation $O_f$.
The formal proof of this statement is deferred to Appendix B 3. Here we sketch it. The key insight is to quantify the entanglement between $G$ and $S$ using the conditional von Neumann entropy [28][29][30]. As the function $f$ from the HSP is constant on cosets $g + H \in G/H$, the only entanglement that is generated by the function oracle $O_f$ comes from a sum over the different cosets in $G/H$. Each coset $[g] \in G/H$ contributes one term to the entanglement; these terms originate from the state right after the function oracle. The sum over the cosets can be factored out via a local transformation, given by a choice of representative for each coset. This results in a contribution of $\ell_{\max} = \log_2 |G/H|$ Bell pairs. The remaining terms in the sum are not entangled.
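The coset argument sketched above can be written out in one line. The following is a reconstruction from the standard form of the state after the oracle, not a quotation of the paper's equations; it assumes that $f$ takes distinct values on distinct cosets (the HSP promise) and that $|G/H|$ is a power of two, so that the entanglement can be counted in whole Bell pairs.

```latex
\begin{align}
|\psi\rangle_{GS}
  &= \frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle_G \, |f(g)\rangle_S \\
  &= \frac{1}{\sqrt{|G/H|}} \sum_{[g] \in G/H}
     \left( \frac{1}{\sqrt{|H|}} \sum_{h \in H} |g + h\rangle_G \right)
     \otimes |f(g)\rangle_S .
\end{align}
% The bracketed coset states are orthonormal, and so are the |f(g)> for
% different cosets, so this is a Schmidt decomposition with |G/H| equal
% coefficients. Since the global state is pure, H(GS) = 0 and
\begin{equation}
H(S|G)_\rho = H(GS)_\rho - H(G)_\rho = -\log_2 |G/H| = -\ell_{\max} .
\end{equation}
```

In this form the state is locally equivalent to $\ell_{\max}$ Bell pairs shared between $G$ and $S$, tensored with a product state on the remaining qubits, which is the content of the bound.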
Along the same lines we show that such a factorization can indeed be realized by unitary operations.
Lemma 4 (Existence of transformations saturating the bound). There exist local unitaries $U_G$ and $U_S$ which saturate the upper bound $\ell_{\max}$ of Bell pairs which can be factored from the state after the function oracle $O_f$.
There is a caveat to the transformations $U_G$ and $U_S$ saturating this upper bound as in Theorem 3. In fact, finding the transformations must be at least as difficult as solving the problem for which we run the algorithm in the first place.
Theorem 5 (No-go for saturating the bound). Any on-the-go erasure protocol applying local unitaries $U_G$ and $U_S$ to factorize the maximum amount $\ell_{\max}$ of Bell pairs from Theorem 3 can be used to solve the HSP.
The underlying reason is that the transformation $U_G$ required for this factorizes the main register $\mathcal{H}_G$ into parts belonging to $H$ and $G/H$, $\mathcal{H}_G \to \mathcal{H}_H \otimes \mathcal{H}_{G/H}$. Essentially, this means we have operational access to the elements of $H \subseteq G$ via the inverse operation $U_G^\dagger$. This hints at a relation between the number of Bell pairs we can factorize and the amount of information we have about the solution of our problem. As a next step, we explore how Theorem 5 generalizes to instances where $\ell_{\max}$ is not reached. What type of partial information is required to factor $\ell \le \ell_{\max}$ Bell pairs, and how do we quantify it?
B. On-the-go erasure and limits with partial information
Partial information. As a first step, we characterize the partial information we need to know about the indicator function $f: G \to S$ such that we are able to factor $\ell \le \ell_{\max}$ Bell pairs after the function oracle $O_f$. We start with the promise of knowing where $\ell$ Bell pairs are, that is, we have access to transformations $U_G$ and $U_S$ on $\mathcal{H}_G$ and $\mathcal{H}_S$ which factor out $\ell$ Bell pairs after the function oracle. The Bell pairs we consider are fully correlated qubits, which tells us that the oracle $O_f$ maps some part of $\mathcal{H}_G$ one-to-one on $\mathcal{H}_S$. Formally, this corresponds to a factorization of both registers, with $O_f$ fully correlating the factors labeled $G^{(2)}$ and $S^{(2)}$.
Promise 6 (General partial information characterization, informal version). We need to know a factorization of the main and auxiliary registers into parts labeled $(1)$ and $(2)$. Moreover, $f$ must map $G^{(2)}$ one-to-one on $S^{(2)}$.
A formalized version of this promise is given in Appendix B 3, Definition 18 and Theorem 19, together with a proof that Promise 6 is sufficient and necessary for factoring Bell pairs. For the ease of presentation, we will present here a subclass of partial information which respects the group structure of G. Partial information of this type can be understood as narrowing down the search for the subgroup H ⊆ G to a search for H ⊆ K with partial information about the function oracle (see Appendix B 3 for detailed prescriptions of the transformations). In particular, the specific form in Eq. (19) allows factoring out Bell pairs as outlined in our strategy (Section II C) in Eq. (11).
Promise 7 (Partial subgroup information). We assume access to partial information about the indicator function $f: G \to S$. That is, we know
1. an intermediate subgroup $K$ between $H$ and $G$ ($H \subseteq K \subseteq G$), which operationally means having access to a unitary operation $U_G$ which factors the main register according to $\mathcal{H}_G \to \mathcal{H}_K \otimes \mathcal{H}_{G/K}$;
2. where $f$ maps $G/K$ in $S$, which operationally means having access to a unitary $U_S$ that factors out the image of $G/K$ in the auxiliary register.
Optimized on-the-go erasure with partial information. The circuit in Figure 4 implements the modifications due to the transformations $U_G$ and $U_S$ from Promise 7. This brings a reduction of the work cost of erasure which we quantify in Theorem 8.
Theorem 8 (Work cost of erasure with partial information). Given the transformations $U_G$ and $U_S$ from Promise 7, there exists an on-the-go erasure protocol acting on $G$, $S$ and an environment at temperature $T$, resetting the auxiliary register $S$ after $O_f$ while preserving $G$, which does not exceed an average work cost of erasure of $(m - 2\ell)\, k_B T \ln 2$, where $\ell = \log_2 (|G|/|K|)$ and $m = \log_2 |S|$ is the number of qubits of the auxiliary register.
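The bookkeeping behind the bound in Theorem 8 can be sketched numerically: $\ell$ auxiliary qubits are erased with side information at $-k_B T \ln 2$ each, and the remaining $m - \ell$ qubits are erased by the standard procedure at $+k_B T \ln 2$ each. The function name `erasure_costs`, the dictionary keys, and the example group orders below are hypothetical choices made for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def erasure_costs(order_g, order_k, order_s, temperature):
    """Work costs in joules for an m-qubit auxiliary register.

    order_g, order_k, order_s are |G|, |K| and |S|; they are assumed to be
    powers of two so that m and ell count whole qubits.
    """
    m = math.log2(order_s)               # auxiliary qubits
    ell = math.log2(order_g / order_k)   # Bell pairs accessible via Promise 7
    unit = K_B * temperature * math.log(2)
    return {
        "brute_force": m * unit,
        "side_information_part": -ell * unit,
        "standard_part": (m - ell) * unit,
        "on_the_go_total": (m - 2 * ell) * unit,
    }


# Hypothetical instance: |G| = 16, |K| = 8, |S| = 16 at 20 mK.
costs = erasure_costs(order_g=16, order_k=8, order_s=16, temperature=0.02)
for name, value in costs.items():
    print(f"{name}: {value:.3e} J")
```

In this instance $m = 4$ and $\ell = 1$, so the on-the-go total is half of the brute-force cost.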
This result also generalizes to partial information from Promise 6. The only change in the circuit of Figure 4 is that the transformations U G and U S have to be replaced by their generalized versions. In Appendix B 3, Theorem 20 generalizes Theorem 8. For a proof, the reader is referred there.
Oracle simplification with partial information. In Section III A we derived a no-go result (Theorem 5) for the factorization $\mathcal{H}_G \to \mathcal{H}_H \otimes \mathcal{H}_{G/H}$ by observing that finding such a factorization is as difficult as finding the hidden subgroup $H \subseteq G$ itself. With the newly introduced partial information erasure (Promise 7), how do we now quantify the difficulty of finding the transformations $U_G$ and $U_S$? Put differently: what is the operational significance of the partial information required for the transformations $U_G$ and $U_S$? The following result answers this question.
Theorem 9 (Partial information correspondence). The unitaries $U_G$ and $U_S$ from Promise 7 can be used to formally construct a new function oracle $\tilde{O}_f$ which requires $2\ell$ fewer qubits than $O_f$ ($\ell = \log_2 (|G|/|K|)$). Moreover, this modified oracle $\tilde{O}_f$ can still be used to solve the HSP.
With the modified oracle $\tilde{O}_f$ the HSP algorithm can be run on $2\ell$ fewer qubits than with $O_f$. Both the main and auxiliary qubits can be reduced by $\ell$ and, in comparison to the circuit in Figure 4, the quantum Fourier transform on the main register is now implemented for the group $K$ instead of $G$. Details for the proof of Theorem 9 are in Appendix C. The two constructions we made are thermodynamically equivalent: for the circuit in Figure 4 we have an average work cost of erasure equal to $(m - 2\ell)\, k_B T \ln 2$, due to an erasure of $\ell$ auxiliary qubits which were fully entangled with the main register. In the simplified algorithm using the modified oracle from Figure 5, $2\ell$ fewer qubits are required to run; hence, the average work cost of erasure is also $(m - 2\ell)\, k_B T \ln 2$. The two constructions also produce the same computational output; in Appendix C it is shown that the construction for Theorem 9, given in the circuit of Figure 5, is sufficient for finding the hidden subgroup $H$. The simplification due to the modified oracle $\tilde{O}_f$ (see Figure 5) can be categorized in two ways:
1. $O_f$ is given as a black box: the simplification $\tilde{O}_f$ is a formal construction of an existence result.
2. $O_f$ is given with open circuit access: the transformations $U_G$ and $U_S$ can be incorporated into the oracle $O_f$, and the new oracle requires $2\ell$ fewer physical qubits.
FIG. 4. Optimized on-the-go erasure. The above circuit is a modified version of Figure 2 implementing the optimized on-the-go erasure of $\ell$ qubits (see shaded part of the diagram). Here we know unitaries $U_G$ and $U_S$ which factor out part of the entanglement between the main and auxiliary register in the form of $\ell$ Bell pairs. The $\ell$ qubits belonging to the auxiliary register are then erased at temperature $T$ with $\tilde{E}$ at a total average work cost of $-\ell\, k_B T \ln 2$. The steps of the modified algorithm displayed above that are unchanged from the original are grayed out. Step 2b: side information erasure $E$ of $\ell$ qubits, standard erasure $\tilde{E}$ of the remaining $m - \ell$ qubits.
C. Special cases of the hidden subgroup problem

1. Toy example with period finding
In this simple example, the on-the-go erasure protocol is straightforward. The period finding algorithm (PFA) is concerned with the following problem, which is a special case of the HSP:
Problem 10 (Period finding problem). Given a function $f: \mathbb{Z}_N \to \mathbb{Z}_M$ which is $r$-periodic and injective on each period of length $r$, the goal is to find $r$.
The quantum algorithm solving this problem is of the same form as the HSP algorithm, with $G = \mathbb{Z}_N$, $S = \mathbb{Z}_M$ and $Q_G$ replaced by the standard quantum Fourier transform $Q_N$. The main register uses $n = \log_2 N$ and the auxiliary register uses $m = \log_2 M$ qubits. After the first two steps the quantum state of main and auxiliary register equals
$$\frac{1}{\sqrt{N}} \sum_{x \in \mathbb{Z}_N} |x\rangle\, |f(x)\rangle. \qquad (21)$$
Suppose we were in possession of partial information about the function $f$, in the form of a promise.
Promise 11. The function $f$ can be written in the form $f(2x(+1)) = 2\tilde{f}(x)(+1)$, for some other function $\tilde{f}: \mathbb{Z}_{N/2} \to \mathbb{Z}_{M/2}$. In particular, it maps even numbers to even numbers and odd to odd.
This example was first proposed in [18] and it is a special case of Promise 7; here, we go through the calculations of the optimized on-the-go erasure (Figure 6a) and provide an explicit simplification of the function oracle (Figure 6b). First of all, if $f$ maps even numbers to even numbers and odd to odd, the least significant qubits of main and auxiliary register in Eq. (21) are always fully entangled. By reordering them, we can write the state as $\frac{1}{\sqrt{N/2}} \sum_{x \in \mathbb{Z}_{N/2}} |x\rangle\, |\tilde{f}(x)\rangle \otimes |\chi\rangle$ and apply the result from Lemma 1 to the Bell state $|\chi\rangle$. Since the reduced main register is unaffected by this erasure, the algorithm still works to determine the period $r$. In this case, the local unitary operations $U_G$ and $U_S$, whose purpose is to compress the entanglement between main and auxiliary register into well-defined qubits, can be chosen to be trivial, $U_G = \mathbb{1}_G$ and $U_S = \mathbb{1}_S$ (Figure 6a). Ultimately, the reason for this is that (part of) the entanglement was between well-known qubits: the least significant ones. In general, however, this cannot be assumed to be the case.
FIG. 5. Relative to the original algorithm (Figure 2), the group $G$ has been replaced by $K$; hence, the generalized quantum Fourier transforms $Q_G$ had to be replaced by $Q_K$. The remaining steps are as in Figure 2 and are grayed out, while steps 2a'-2c' are encapsulated by $\tilde{O}_f$ in the above circuit.
How much is the partial information in Promise 11 worth in terms of computational complexity? An alternative use of the partial information is to run the PFA not for the function $f: \mathbb{Z}_N \to \mathbb{Z}_M$ but rather for $\tilde{f}: \mathbb{Z}_{N/2} \to \mathbb{Z}_{M/2}$ with $\tilde{f}(x) = f(2x)/2$. This algorithm requires two fewer qubits to run. Operationally, we simply do not let the PFA act on the least significant qubits: we replace the quantum Fourier transform $Q_N$ by $Q_{N/2}$ and we let the function oracle act on all but the least significant qubits (Figure 6b).
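The claim that Promise 11 leaves the two least significant qubits in a Bell state can be checked numerically on a small instance. The sketch below is an illustration under assumptions made here: $N = M = 8$, a hypothetical $\tilde{f}$ on $\mathbb{Z}_4$ with period 2, and a binary encoding with the most significant bit first in each register.

```python
import numpy as np

N = M = 8                              # three qubits per register
f_tilde = [2, 1, 2, 1]                 # hypothetical f~ on Z_4 with period 2
# Promise 11: f(2x + b) = 2 * f~(x) + b for b in {0, 1}
f = [2 * f_tilde[x // 2] + (x % 2) for x in range(N)]

# State after the oracle: (1/sqrt(N)) * sum_x |x>|f(x)>
psi = np.zeros(N * M)
for x in range(N):
    psi[x * M + f[x]] = 1 / np.sqrt(N)

# View the state as six qubits: axes 0-2 are the main register and
# axes 3-5 the auxiliary register, most significant bit first.
tensor = psi.reshape((2,) * 6)

# Move the two least significant qubits (axes 2 and 5) to the end and
# trace out the other four qubits to get the reduced state of the pair.
pair_last = np.moveaxis(tensor, (2, 5), (4, 5)).reshape(16, 4)
rho_pair = pair_last.T @ pair_last

chi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
is_bell_pair = np.allclose(rho_pair, np.outer(chi, chi))
print(is_bell_pair)

# The simplified oracle only needs f~(x) = f(2x) / 2 on the remaining qubits.
recovered_f_tilde = [f[2 * x] // 2 for x in range(N // 2)]
print(recovered_f_tilde == f_tilde)
```

Because the reduced state of the pair is the pure Bell state, it is uncorrelated with the remaining four qubits, which is why $U_G$ and $U_S$ can be trivial in this example.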

2. Discrete logarithm problems
Another special case of the HSP is the discrete logarithm problem, which has applications in classical public-key cryptography.
Problem 12 (Discrete logarithm problem). Given the cyclic group $S = \{1, \gamma, \ldots, \gamma^{N-1}\}$ of order $N$ with generator $\gamma$ and some element $A \in S$, the question is which $a \in \mathbb{Z}/N\mathbb{Z}$ satisfies $\gamma^a = A$.
This problem can be rephrased as an HSP (see [23] for a pedagogical derivation) by introducing the group $G = \mathbb{Z}/N\mathbb{Z} \times \mathbb{Z}/N\mathbb{Z}$ and a function $f: G \to S$, $f(i, j) = \gamma^{i} A^{-j}$. The function $f$ is a homomorphism of groups: let $(i, j), (k, \ell) \in G$, then $f((i, j) + (k, \ell)) = f(i + k, j + \ell) = \gamma^{i+k} A^{-(j+\ell)} = f(i, j)\, f(k, \ell)$. The discrete logarithm is now solved by finding the hidden subgroup $H = \langle (a, 1) \rangle \subseteq G$. In this formulation, the on-the-go erasure protocol is again applicable, given that partial information in the form of Promise 7 is available. This could again be the case in the form of an intermediate subgroup $H \subseteq K \subseteq G$.
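The reduction of the discrete logarithm to an HSP can be checked by brute force on a small group. The sketch below assumes the standard hiding function $f(i, j) = \gamma^{i} A^{-j}$, which is consistent with the hidden subgroup generated by $(a, 1)$; the prime $p = 11$, the generator $\gamma = 2$ and the exponent $a = 7$ are hypothetical example values.

```python
p = 11          # hypothetical small prime; S is the multiplicative group mod p
N = p - 1       # order of S
gamma = 2       # generator of S (2 has multiplicative order 10 modulo 11)
a = 7           # the discrete logarithm we pretend not to know
A = pow(gamma, a, p)


def f(i, j):
    """Assumed hiding function f(i, j) = gamma^i * A^(-j) in S."""
    # A has order dividing N, so A^(-j) equals A^((-j) mod N).
    return (pow(gamma, i, p) * pow(A, (-j) % N, p)) % p


# f is a group homomorphism from Z_N x Z_N to S.
is_homomorphism = all(
    f((i + k) % N, (j + l) % N) == (f(i, j) * f(k, l)) % p
    for i in range(N) for j in range(N)
    for k in range(N) for l in range(N)
)

# Its kernel is the hidden subgroup generated by (a, 1).
kernel = {(i, j) for i in range(N) for j in range(N) if f(i, j) == 1}
generated_by_a_1 = {((a * t) % N, t) for t in range(N)}

print(is_homomorphism, kernel == generated_by_a_1)
```

Reading off any kernel element of the form $(i, 1)$ returns $i = a$, which is how finding $H$ solves the discrete logarithm.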

IV. DISCUSSION
Summary. In the resource theory of thermodynamics, we optimized the cost of erasing auxiliary qubits in the algorithms solving the HSP. To achieve this, we applied the result from [18], which states that quantum side-information in the form of entanglement can be used as a resource to reduce the cost of erasing quantum systems. Lastly, we quantified the cost of using said side-information in terms of a trade-off: the side-information could instead be used to reduce the algorithm width, at equal thermodynamic costs.
FIG. 6. Two-qubit optimizations to the period finding algorithm. (6a) On-the-go erasure for the period finding algorithm. Here, the circuit describing the optimized on-the-go erasure of the least significant qubits in the period finding algorithm is shown. Given the promise that the function $f$ maps even numbers to even numbers, the oracle $O_f$ fully entangles the least significant qubits of main and auxiliary register. The auxiliary qubit of this pair can then be erased at a negative average work cost of $-k_B T \ln 2$ if the erasure is performed at temperature $T$. (6b) Oracle simplification for the period finding algorithm. At equivalent thermodynamic costs, the period finding algorithm can alternatively be simplified to an algorithm which uses two fewer qubits, by keeping the least significant qubits of main and auxiliary register constantly in the ground state $|0\rangle$.
Applicability. Our work has treated three possibilities to erase auxiliary qubits in a quantum algorithm. When considering our proposal for an optimized on-the-go erasure of qubits for application, the following costs have to be weighed against each other. On-the-go erasure versus:
1. Straightforward erasure: Given the architecture of the quantum computer, does the work cost reduction by $2\ell\, k_B T \ln 2$ outweigh the gate costs of the local unitaries $U_G$ and $U_S$?
2. Bennett's uncomputing: What restrictions does the quantum computer put on the algorithm's width, and what is the gate cost of implementing many parallel copies of the original circuit together with a quantum version of the classical post-processing and a reversible majority vote, compared to the gain of $(m - 2\ell)\, k_B T \ln 2$? Considering current gate costs or even fundamental limits [10], this method is unlikely to yield a thermodynamic advantage in any practical scenario.
The toy example for the PFA demonstrates that there are cases where the local transformations U G and U S are trivial, hence they do not add any complexity to the algorithm, giving the optimized on-the-go erasure a strict advantage over approaches (1) and (2). Last but not least, the optimized on-the-go erasure has to be compared to another option:

3. Oracle simplification: Is $O_f$ given with open circuit access or as a black box?
If the oracle is available with open circuit access, the simplification decreases the complexity, making the algorithm use $2\ell$ fewer qubits. For a black box, the complexity is roughly the same, with the difference coming from the quantum Fourier transform, which has to be performed on fewer qubits. At the level of thermodynamic costs, both options are equivalent.
Complexity implications. Depending on the type of partial information available to perform the on-the-go erasure, the complexity of the transformations $U_G$ and $U_S$ can range from exponential in the input size to almost trivial. The reason is that Theorem 8 (together with Promise 7) is an existence result, and the complexity of the transformations depends on the particular choice of computational basis representation of the states $|g\rangle$ and $|s\rangle$, for $g \in G$ and $s \in S$ respectively. In the scenario where we have access to side information in the form of an intermediate subgroup $K$ such that $H \subseteq K \subseteq G$, there is no a priori reason for the Hilbert space $\mathcal{H}_K = \mathrm{span}_{\mathbb{C}}\{|k\rangle : k \in K\}$ to be represented by a subregister of qubits of the main register $\mathcal{H}_G$. In the most general case, a unitary transformation is required to permute basis elements and ensure that $\mathcal{H}_K$ is encoded on a subset of qubits of the main register. This transformation (which in matrix form only has entries 0 and 1) has exponential gate complexity $O(n 2^n)$ [31]. This is not to say that the transformations $U_G$ and $U_S$ can never be implemented efficiently. There are cases where the computational basis representation for $G$ already ensures that $\mathcal{H}_K$ is implemented on a subset of the main register's qubits. In these cases, $U_G$ only has to permute qubits and thus has gate complexity bounded by $O(\log |K|)$.
For example, in the PFA, this is the case for all subgroups of $G = \mathbb{Z}/N\mathbb{Z}$ generated by powers of 2 (Section III C 1), and for the discrete logarithm for all subgroups of $G = \mathbb{Z}/N\mathbb{Z} \times \mathbb{Z}/N\mathbb{Z}$ of the form $2^k \times \mathbb{Z}/N\mathbb{Z}$ (Section III C 2). For the transformation $U_S$ to satisfy similar complexity bounds, the target space's computational representation has to be decomposed analogously to the main register; this is discussed in more detail in Appendix B 4.
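The efficient case can be illustrated with a minimal Python sketch. It assumes $G = \mathbb{Z}/2^n\mathbb{Z}$ in the standard binary encoding (an assumption made here for concreteness) and checks that the subgroup generated by $2^j$ consists exactly of the bit strings whose $j$ least significant bits are zero, so that $\mathcal{H}_K$ is carried by the remaining $n - j$ qubits and no basis permutation is needed. The function names are illustrative and do not come from the paper.

```python
def subgroup_generated_by(generator, modulus):
    """Return the cyclic subgroup <generator> of Z/modulus Z as a set."""
    elements, x = set(), 0
    while True:
        elements.add(x)
        x = (x + generator) % modulus
        if x == 0:
            return elements


def lives_on_high_qubits(n_qubits, j):
    """Check that <2^j> in Z/2^n Z equals the bit strings with j low bits zero."""
    subgroup = subgroup_generated_by(2**j, 2**n_qubits)
    low_mask = (1 << j) - 1  # selects the j least significant bits
    expected = {x for x in range(2**n_qubits) if x & low_mask == 0}
    return subgroup == expected
```

For a general modulus or a different encoding of $G$ this check can fail, which is the situation where a costly basis permutation would be required.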
Outsourcing thermodynamic processes to an information battery. In this presentation, all qubit erasure processes take place in the computational registers. It is possible to outsource this thermodynamic task to an external battery register [7, 32-34]. The battery consists of fueled qubits in the state $|0\rangle$ and depleted qubits in the fully mixed state $\mathbb{1}/2$; the erasure of depleted qubits takes place there at temperature $T$, with an average work cost of $k_B T \ln 2$ per qubit. The idea is that when we identify pure or fully mixed qubits that need erasure, we exchange them with those in the battery. That way, all thermodynamic processes that require interaction with an environment take place in the battery, protecting the main registers from dissipation. The price of using a battery is the need for additional SWAP gates between the computational registers and the battery, which may be costly depending on the hardware architecture. Since the information battery does not affect the number of qubits that need to be erased in a quantum computation, further discussion is deferred to Appendix A 2.
Relation to algorithmic cooling. Algorithmic cooling is the process of producing cold (that is, approximately pure) qubits [35]. Some approaches extract entropy from a target system by coupling it to thermal baths, a technique called heat-bath algorithmic cooling [36-38]. Our optimizations of erasure differ from algorithmic cooling in that they are not primarily about the production of pure qubits but rather about reducing the thermodynamic cost of said erasure, using entanglement as a further resource. When outsourcing the erasure process to an external information battery (see Appendix A 2), one could apply algorithmic cooling there to produce pure battery states.
Single-shot and finite-size effects. In our analysis, we have simplified the work cost of erasing a single qubit. Using the von Neumann entropy to quantify the work cost of erasure is an approximation valid in the asymptotic i.i.d. limit; for any finite number of rounds, smooth entropies are more precise measures of work and heat in erasure [18,39]. If the rest Hamiltonian of the qubits is not fully degenerate, one needs to employ single-shot versions of the free energy [40]; if we want to account for finite-size effects (either on the environment, on thermalizing operations, or on energy gaps allowed in intermediate stages of erasure), further corrections are necessary to find the exact work cost as a random variable [41,42]. All these corrections can be applied on top of our results: as mentioned in the introduction, our focus is minimizing the number of qubits that need to be erased through interaction with a thermal environment. The exact cost of that erasure can then be computed in the appropriate regime using some of the corrections above; which ones are relevant depends on the hardware architecture. Similarly, the hardware will determine the actual thermodynamic cost of individual unitary gates, which affects the calculation of whether it is better to perform erasure on-the-go or to simplify the circuit.
Open questions. A natural follow-up project is to study on-the-go erasure for arbitrary quantum algorithms. Within this setting, one could attempt to generalize the no-go result and the trade-off found in this paper for the HSP algorithm. In that general setting it would also be interesting to explore automating the search for optimized erasure (or algorithm simplification) points, for example using entanglement detection [43,44], without affecting the state of the main register.
This first appendix is dedicated to providing an explicit protocol for erasing a fully mixed qubit at the Landauer limit (Appendix A 1) and to reviewing the basics of an information battery in quantum computing (Appendix A 2).

Explicit thermodynamic protocol for erasure of a fully mixed qubit
Erasing a fully mixed qubit, that is, mapping $\mathbb{1}/2 \to |0\rangle\langle 0|$, comes with diverging resource costs by the third law of thermodynamics [45-47], which has been established in quantum thermodynamics as well, the diverging resource being time, energy or control complexity [16,48,49]. Here we showcase a protocol [20] that asymptotically implements the erasure of a qubit. The setup for the erasure consists of three quantum systems.
Qubit. The system of the qubit is described by the two-dimensional Hilbert space $\mathcal{H}_S = \mathbb{C}^2$ with basis $\{|0\rangle_S, |1\rangle_S\}$. Furthermore, it is assumed that the energy levels of this system are degenerate; this is achieved with the Hamiltonian $H_S = 0$.
Work storage. In an idealized scenario the work storage consists of an infinite number of evenly spaced, nondegenerate energy eigenstates. The Hamiltonian is given by $H_W = \sum_{k\in\mathbb{Z}} k\Delta\, |E_k\rangle\langle E_k|$, with $\Delta$ the energy spacing between two neighbouring levels $|E_k\rangle$, $|E_{k+1}\rangle$. An experimental realization will only be able to approximate this system, with energy levels bounded from below. Because the explicit implementation of the qubit erasure is not relevant for the remaining treatment of on-the-go erasure, we will not investigate this any further.
Heat bath. The heat bath is an ensemble of $N$ qubits thermalized at inverse temperature $\beta = 1/k_B T$, where each qubit has a different energy spacing. The Hilbert space is $\mathcal{H}_{\mathrm{bath}} = (\mathbb{C}^2)^{\otimes N}$, with basis $\{|0_\ell\rangle, |1_\ell\rangle\}$ for the $\ell$-th factor. The energy $\Delta$ entering the bath Hamiltonian is the same as for the work storage, and the thermal state of $\mathcal{H}_{\mathrm{bath}}$ is fixed by requiring that the bath qubits are at inverse temperature $\beta$.
Erasure of a qubit. In a first example we consider the erasure of one fully mixed qubit $\rho = \mathbb{1}/2$ in $\mathcal{H}_S$. For a heat bath consisting of $N$ qubits, the erasure is performed in $N$ steps. In step $\ell$ ($1 \le \ell \le N$) the qubit from $\mathcal{H}_S$ is swapped with the $\ell$-th qubit from the heat bath $\mathcal{H}_{\mathrm{bath}}$, and simultaneously the energy level of the work storage is lowered by $\ell$ steps to preserve energy. The unitary operation $U^{(\ell)}$ implementing this step commutes with the Hamiltonian of the joint system of work storage, qubit and heat bath. The energy level diagram in Figure 7 (adapted from [20]) visualizes this unitary operation for $\ell = 3$. The erasure process is the composition $U = U^{(N)} \cdots U^{(2)} U^{(1)}$, applied to the initial state $\rho_i$. For a large number $N$ of heat bath qubits, this process corresponds to an erasure of the qubit in system $S$: the fully mixed state $\mathbb{1}_S/2$ is mapped to $|0\rangle\langle 0|_S$ asymptotically as $N \to \infty$. In this process, the work storage performs a work of $W = k_B T \ln 2$ for the erasure, which is dissipated as heat into the bath. In the more general case, where the system qubit is not necessarily in a fully mixed state but rather in $\rho_S = (1 - p)|0\rangle\langle 0|_S + p|1\rangle\langle 1|_S$, the erasure unitary $U$ can be truncated, which leads to a lower erasure cost of $W(S) = H(S)\, k_B T \ln 2$, with $H(S)$ the von Neumann entropy of the system $S$. This is an explicit realization of the result from [18] for a single qubit. For many qubits this process can be performed on each qubit individually.
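The approach to the Landauer limit can be checked numerically with a short Python sketch. It rests on an assumption that is inferred from the step description rather than stated explicitly in the text: the $\ell$-th bath qubit has energy gap $\ell\Delta$, so that swapping it with the degenerate system qubit shifts the work storage by $\ell$ levels. Units are chosen such that $k_B = 1$, and the function name and parameters are illustrative.

```python
import math


def swap_protocol_work(beta, delta, n_bath):
    """Average work and final excited population of the swap protocol.

    Assumption (not from the source): bath qubit l has energy gap l * delta.
    Work is returned in units where k_B = 1.
    """

    def excited_population(level):
        # Thermal population of |1> for a qubit with gap level * delta.
        x = math.exp(-beta * level * delta)
        return x / (1.0 + x)

    work = 0.0
    p_system = 0.5  # the fully mixed system qubit
    for level in range(1, n_bath + 1):
        p_bath = excited_population(level)
        # After the swap the bath qubit carries the old system population,
        # so its average energy changes by gap * (p_system - p_bath);
        # this energy is supplied by the work storage.
        work += level * delta * (p_system - p_bath)
        p_system = p_bath  # the system now holds the bath qubit's state
    return work, p_system
```

With `beta = 1.0`, `delta = 0.001` and `n_bath = 30000`, the returned work lies slightly above $\ln 2$ and the final excited population is negligible; the excess shrinks as `delta` decreases, in line with the asymptotic statement above.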

Using an information battery inside the quantum computer
Qubit battery registers. For the purpose of this work it suffices to consider two types of battery registers. The first contains only fully mixed qubits, which are completely passive [50,51]; that is, there exists no unitary operation extracting energy from such a state. The second register contains only fueled qubits in the pure state $|0\rangle$. Using a thermal reservoir at temperature $T$, it is at best possible to extract $k_B T \ln 2$ of work from a fueled qubit. In our modifications to the HSP algorithm, instead of partially erasing Bell pairs in the computational registers, we swap them with a fully mixed and a pure qubit from the battery, which amounts to a gain of one pure (fueled) qubit in the information battery.
Extraction by swapping. If the entanglement between the main register $G$ and the auxiliary register $S$ is given by fully entangled qubits, for example in the Bell state $|\chi\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$, then this state can be replaced by a fully mixed qubit for $G$ and a pure qubit for $S$ via a swapping operation. On the reduced $\mathcal{H}_G \otimes \mathcal{H}_S$ register, this is equivalent to a partial erasure of the qubit from $S$, while preserving $G$.
General Bell pair extraction. In general one is not lucky enough for the entanglement between main and auxiliary register to be given in the form of well-defined Bell pairs. Since local unitary transformations of $G$ and $S$ preserve the conditional entropy between these two registers, the entanglement can be spread across many qubits. For the class of algorithms solving the HSP, we have shown that there always exist local unitaries $U_G$ and $U_S$ such that the entanglement between the registers can be compressed into Bell pairs (see Appendix B 3). Instead of using these unitaries to prepare the registers for being erased as in Figure 4, they can be used to swap the entangled states with states from the battery. In Figure 8, the situation is presented for a single Bell pair swap. It generalizes to many Bell pairs without any complications.
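The single Bell pair swap can be sketched numerically as follows. This is a minimal NumPy illustration, not the circuit of Figure 8: the qubit ordering (G, S, depleted battery qubit, fueled battery qubit) and the helper names are assumptions made here. The swap is implemented as a relabelling of subsystems, and the reduced states are obtained by partial traces.

```python
import numpy as np


def permute_qubits(rho, perm):
    """Relabel the qubits of a density matrix: new qubit i is old qubit perm[i]."""
    n = len(perm)
    tensor = rho.reshape([2] * (2 * n))
    axes = list(perm) + [n + q for q in perm]
    return tensor.transpose(axes).reshape(2**n, 2**n)


def reduced_state(rho, keep, n):
    """Partial trace of an n-qubit state, keeping the qubits listed in keep."""
    rest = [q for q in range(n) if q not in keep]
    ordered = permute_qubits(rho, list(keep) + rest)
    dk, dr = 2 ** len(keep), 2 ** len(rest)
    return np.trace(ordered.reshape(dk, dr, dk, dr), axis1=1, axis2=3)


bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_bell = np.outer(bell, bell)   # Bell pair shared by G and S
mixed = np.eye(2) / 2             # depleted battery qubit
pure = np.diag([1.0, 0.0])        # fueled battery qubit |0><0|

# Qubit order (an assumption of this sketch): 0 = G, 1 = S, 2 = depleted, 3 = fueled.
rho_before = np.kron(np.kron(rho_bell, mixed), pure)
# Swap G with the depleted qubit and S with the fueled qubit.
rho_after = permute_qubits(rho_before, [2, 3, 0, 1])

rho_gs_after = reduced_state(rho_after, [0, 1], 4)
rho_battery_after = reduced_state(rho_after, [2, 3], 4)
```

After the swap, the computational registers hold a fully mixed qubit in G and a pure qubit in S, the reduced state of G is unchanged, and the battery holds the Bell pair, which is a pure two-qubit state.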
Problem 2 (Hidden Subgroup Problem [22]). Let $G$ be a finite group, $S$ some finite set and $f : G \to S$ a function. Given the existence of a subgroup $H \subseteq G$ such that for all $g, g' \in G$, $f(g) = f(g') \iff gH = g'H$, the goal is to determine $H$.
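As a concrete instance of the promise in Problem 2, the following Python sketch checks it by brute force for a cyclic group $\mathbb{Z}/N\mathbb{Z}$ written additively, where the coset condition $g + H = g' + H$ is equivalent to $g - g' \in H$. The function name and the example function are illustrative choices, not taken from the paper.

```python
def satisfies_hsp_promise(group_order, f, subgroup):
    """Check f(g) == f(g') <=> g - g' in subgroup, for G = Z/group_order Z."""
    return all(
        (f(g) == f(g2)) == ((g - g2) % group_order in subgroup)
        for g in range(group_order)
        for g2 in range(group_order)
    )


def period_three(x):
    """Example function on Z/12Z that is constant exactly on cosets of <3>."""
    return x % 3
```

Here `satisfies_hsp_promise(12, period_three, {0, 3, 6, 9})` holds, whereas the smaller subgroup `{0, 6}` violates the promise because `period_three` also identifies elements that differ by 3.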
From now on, $G$ shall be an Abelian group. The HSP for general non-Abelian groups does not yet have an efficient quantum algorithm [23]. We diverge from the notation in Eq. (10) and denote by $g + h \in G$ the element obtained by the additive group operation on $g$ and $h$ in $G$. The unit element is 0. In particular, the cosets with respect to some subgroup $H \subseteq G$ are from now on denoted by $\bar{g} = g + H \in G/H$. We present important definitions and results from group theory [52] and representation theory [53] which will be used in the following discussion (the formulation and notation of the results from [52,53] have been adapted to the specific setting of the HSP at hand).
Theorem 13 (Classification of finite Abelian groups [52]). For any finite Abelian group $G$ there exist positive integers $a_1, \dots, a_m \in \mathbb{N}$ such that $G \cong \mathbb{Z}/a_1\mathbb{Z} \times \cdots \times \mathbb{Z}/a_m\mathbb{Z}$ and $a_1 | a_2 | \cdots | a_m$, where $m$ and the $a_i$ for all $1 \le i \le m$ are uniquely determined.

Proposition 14 ([53]). For each $0 \le k \le a - 1$, the function $\chi_k : \mathbb{Z}/a\mathbb{Z} \to S^1$ defined by $\chi_k(\ell) = \omega_a^{k\ell} = e^{2\pi i k\ell/a}$ is a character of an irreducible representation of the cyclic group $\mathbb{Z}/a\mathbb{Z}$. In fact, these are all characters of irreducible representations of $\mathbb{Z}/a\mathbb{Z}$.

Proposition 15 ([53]). The characters of $G \cong \mathbb{Z}/a_1\mathbb{Z} \times \cdots \times \mathbb{Z}/a_m\mathbb{Z}$ (cf. Theorem 13) are given by $\chi_g(h) = \prod_{i=1}^{m} \chi_{g_i}(h_i)$, where the factors are as in Proposition 14 and the elements $g = (g_1, \dots, g_m)$, $h = (h_1, \dots, h_m) \in G$ are understood as in the decomposition of $G$ into cyclic factors. The irreducible representation $\rho_g$ having $\chi_g$ as character is given by the product $\rho_g = \rho_{g_1} \cdots \rho_{g_m}$ with the factors as in Eq. (B2).
Remark. In this special Abelian case the following basic properties are satisfied by the character $\chi_g$ as introduced in Proposition 15: for any $g, h \in G$, $\chi_g(h) = \chi_h(g)$, and if we take another $g' \in G$ the characters act as group homomorphisms, $\chi_h(g + g') = \chi_h(g)\chi_h(g')$. However, this is only true for characters of one-dimensional irreducible representations and does not hold in general.
Theorem 16 (First orthogonality relation of characters (Abelian version) [53]). Let $g, g' \in G$ be elements of the Abelian group $G$. In the space of functions $G \to \mathbb{C}$ the two characters $\chi_g, \chi_{g'}$ of irreducible representations of $G$ are orthonormal, $\frac{1}{|G|}\sum_{h \in G} \chi_g(h)\,\overline{\chi_{g'}(h)} = \delta_{g,g'}$. The characters are defined as in Proposition 15.
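The character formula and the orthogonality relation can be verified by brute force for a small group. The following Python sketch does this for a product of cyclic groups given by a tuple `orders` $= (a_1, \dots, a_m)$; the function names are illustrative, and the normalization by $|G|$ follows the standard form of the first orthogonality relation.

```python
import cmath
import math
from itertools import product


def character(g, h, orders):
    """chi_g(h) = prod_i exp(2 pi i g_i h_i / a_i) for G = Z/a_1 x ... x Z/a_m."""
    phase = sum(gi * hi / ai for gi, hi, ai in zip(g, h, orders))
    return cmath.exp(2j * cmath.pi * phase)


def character_inner_product(g, g2, orders):
    """(1/|G|) * sum over h in G of chi_g(h) * conj(chi_g2(h))."""
    elements = product(*(range(a) for a in orders))
    total = sum(
        character(g, h, orders) * character(g2, h, orders).conjugate()
        for h in elements
    )
    return total / math.prod(orders)
```

For `orders = (2, 4)`, the inner product is 1 (up to rounding) when the two labels coincide and 0 otherwise, and `character(g, h, orders)` is symmetric under exchanging `g` and `h`, as stated in the Remark.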
A reformulation of the quantum Fourier transform for states representing elements of $G$ can be given in terms of characters. Consider again the cyclic group $\mathbb{Z}/a\mathbb{Z}$, pick some $k \in \mathbb{Z}/a\mathbb{Z}$ and define $|\chi_k\rangle = \frac{1}{\sqrt{a}} \sum_{\ell \in \mathbb{Z}/a\mathbb{Z}} \chi_k(\ell)\,|\ell\rangle = \frac{1}{\sqrt{a}} \sum_{\ell=0}^{a-1} \omega_a^{k\ell}\,|\ell\rangle$. Indeed this is the quantum Fourier transform of $|k\rangle$, where we used the abbreviation $\omega_a = e^{2\pi i/a}$ for the $a$-th root of unity. A generalization is given by
Definition 17 (Quantum Fourier transform of a group register [23]). Let $G$ be a finite Abelian group with decomposition as in Theorem 13. For any $g \in G$ the character state $|\chi_g\rangle$ is defined by $|\chi_g\rangle = \frac{1}{\sqrt{|G|}} \sum_{h \in G} \chi_g(h)\,|h\rangle$, with the functions $\chi_g$, $g \in G$, as in Proposition 15.
In the algorithm solving the HSP from Figure 2, states will be transformed according to the rule in Eq. (B6) and certain summands will cancel out according to Theorem 16. The following subgroup will be of particular interest to us: $H^\perp = \{g \in G : \forall h \in H : \chi_g(h) = 1\}$. Elements of $H^\perp$ define functions, their characters, which allow us to probe the subgroup $H$.
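Step 1. Denote by $\rho_\ell$ the density matrix of the joint G and S register after iteration step $\ell$ in the HSP algorithm; in particular $\rho_0 = |0\rangle\langle 0| \otimes |e\rangle\langle e|$. The first step of the algorithm applies the quantum Fourier transform to the G register, $\rho_1 = \frac{1}{|G|} \sum_{g, g' \in G} |g\rangle\langle g'| \otimes |e\rangle\langle e|$, where we used $\chi_0(g) = 1$ for all $g \in G$.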
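Step 2. Oracle operation: $\rho_2 = \frac{1}{|G|} \sum_{\bar{g}, \bar{g}' \in G/H} \sum_{h, h' \in H} |g + h\rangle\langle g' + h'| \otimes |f(g)\rangle\langle f(g')|$ (B12). In Eq. (B12) we used that the choice of representative $g$ in $\bar{g} = g + H \in G/H$ does not affect the inner summation and that $f(g) = f(g')$ if and only if $\bar{g} = \bar{g}'$. At this stage the S register is traced out, $\rho_2' = \mathrm{Tr}_S\, \rho_2 = \frac{1}{|G|} \sum_{\bar{g} \in G/H} \sum_{h, h' \in H} |g + h\rangle\langle g + h'|$ (B13).
Steps 3-4. The remaining two steps are performed on the reduced G register. The states will be denoted by a dash, $\rho'$.
The measurement result of the G register gives an element $\tilde{g} \in H^\perp$. This element defines the function $\chi_{\tilde{g}} : G \to S^1$ whose restriction to $H$ is the unit function: for all $h \in H$, $\chi_{\tilde{g}}(h) = 1$. Multiple iterations of the HSP algorithm give a set of such functions $\{\chi_{\tilde{g}}\}_{\tilde{g}}$ constraining $H \subseteq G$ and thus solving the problem.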
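The claim that the measurement outcome lies in $H^\perp$ can be illustrated with a small NumPy simulation for a cyclic group $G = \mathbb{Z}/n\mathbb{Z}$. The sketch starts directly from the state after the oracle, $\frac{1}{\sqrt{n}}\sum_g |g\rangle|f(g)\rangle$, applies the quantum Fourier transform to the G register, and sums over the S register. The function name is illustrative, and the code is a brute-force state-vector calculation rather than an efficient implementation.

```python
import numpy as np


def hsp_outcome_distribution(n, f):
    """Probabilities of measuring each g in the G register, for G = Z/n Z."""
    values = sorted({f(g) for g in range(n)})
    # State after the oracle, stored as a matrix psi[g, s] of amplitudes.
    psi = np.zeros((n, len(values)), dtype=complex)
    for g in range(n):
        psi[g, values.index(f(g))] = 1 / np.sqrt(n)
    # Quantum Fourier transform on the G register, built from the characters.
    k = np.arange(n)
    qft = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)
    psi = qft @ psi
    # Tracing out S: sum the outcome probabilities over all values of s.
    return (np.abs(psi) ** 2).sum(axis=1)


probabilities = hsp_outcome_distribution(8, lambda x: x % 4)
```

For $n = 8$ and $f(x) = x \bmod 4$, the hidden subgroup is $H = \{0, 4\}$ and $H^\perp$ consists of the even elements; the simulated distribution is uniform on $\{0, 2, 4, 6\}$ and vanishes on the odd elements.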
where $\rho_{GS}$ is the state of the computational registers after the oracle operation $O_f$.
Proof. The measure we use to quantify the degree of entanglement between the G and S register is the conditional von Neumann entropy $H(S|G) = H(GS) - H(G)$, where $H$ is the standard von Neumann entropy [28-30].
The number of Bell pairs formed by qubits from G and S is upper bounded by $-H(S|G)$, as a single Bell pair contributes a negative conditional entropy of $-1$. The joint state of $\mathcal{H}_G \otimes \mathcal{H}_S$ is $\rho_{GS}$, and $\rho_G = \mathrm{Tr}_S\, \rho_{GS}$ is the reduced state of the G register. Observe the following two facts. Firstly, up until the erasure of the S register, which takes place after step 2 in the HSP algorithm, the joint state $\rho_{GS}$ is a pure state. Secondly, we have $\rho_{GS} = \rho_G \otimes \rho_S$ before step 2, where the function oracle is applied, with $\rho_G$ and $\rho_S = |0\rangle\langle 0|$ pure states. Assuming that the function oracle $O_f$ is a black box, the only stage of the HSP algorithm where $H(S|G)$ is non-zero is after $O_f$ but before $\mathcal{H}_S$ is traced out. Since the corresponding state from Eq. (B12) is pure, its conditional entropy is given by $H(S|G) = H(GS) - H(G) = -H(\rho_G)$, with $\rho_G = \rho_2'$ the reduced G register state, cf. Eq. (B13). The entropy measure is invariant under unitary transformations of $\rho_G$. By reordering the computational basis of the G register we can split $\rho_G = \rho_H \otimes \rho_{G/H}$. The left factor $\rho_H$ is a pure state of $\mathcal{H}_H$; the right one, $\rho_{G/H}$, is a mixed state in $\mathcal{H}_{G/H}$, already represented in its diagonal basis with eigenvalues $|H|/|G|$. Its entropy gives the entropy of the S register conditioned on G, $H(S|G) = -\log_2(|G|/|H|)$. Up to the negative sign this is equal to the maximal number $k := \log_2(|G|/|H|)$ of Bell pairs formed by qubits from the G and S register which can possibly be extracted from the HSP algorithm.
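The value $H(S|G) = -\log_2(|G|/|H|)$ can be checked numerically for small cyclic groups. The NumPy sketch below builds the post-oracle state $\frac{1}{\sqrt{n}}\sum_g |g\rangle|f(g)\rangle$ for $G = \mathbb{Z}/n\mathbb{Z}$ and evaluates $H(GS) - H(G)$ from the eigenvalues of the density matrices. The function names and the eigenvalue cutoff are choices of this illustration, not part of the paper.

```python
import numpy as np


def von_neumann_entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]
    return float(-(eigenvalues * np.log2(eigenvalues)).sum())


def conditional_entropy_after_oracle(n, f):
    """H(S|G) = H(GS) - H(G) for the state (1/sqrt n) sum_g |g>|f(g)>."""
    values = sorted({f(g) for g in range(n)})
    psi = np.zeros((n, len(values)))
    for g in range(n):
        psi[g, values.index(f(g))] = 1 / np.sqrt(n)
    rho_g = psi @ psi.T              # reduced state of the G register
    vec = psi.reshape(-1)
    rho_gs = np.outer(vec, vec)      # pure joint state of G and S
    return von_neumann_entropy(rho_gs) - von_neumann_entropy(rho_g)
```

For $n = 8$ and $f(x) = x \bmod 4$ (so $|G|/|H| = 4$) the function returns $-2$ up to rounding, and for $n = 2$ with $f$ the identity it returns $-1$, the value of a single Bell pair.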
The relevant states in the ancillary register $\mathcal{H}_S$ are of the form $|f(g)\rangle$ for $g \in G$. In fact, by the very defining assumption for the HSP in Eq. (10), it suffices to restrict to representatives $g$ of cosets $\bar{g} \in G/H$. Generally, the ancillary register $\mathcal{H}_S$ may have more qubits than are actually needed to represent $\mathrm{im}\, f \subseteq S$. This overhead of qubits can be factored out by reordering the computational basis of $\mathcal{H}_S$ such that $|f(g)\rangle \to |0\rangle \otimes |f(g)\rangle \in \mathcal{H}_S^{(1)} \otimes \mathcal{H}_{B2}$. This operation is unitary and can be chosen such that for all representatives $g$, the states $|f(g)\rangle \in \mathcal{H}_{B2}$ and $|g\rangle \in \mathcal{H}_{G/H}$ have the same computational representation. This transformation will be denoted by $U_S$.
Theorem 5 (No-go for saturating the bound). Any on-the-go erasure protocol applying local unitaries $U_G$ and $U_S$ to factorize the maximum number $\ell_{\max}$ of Bell pairs from Theorem 3 can be used to solve the HSP.
In the second step, the registers $H$ and $G$ are indicated explicitly on the states $|h\rangle_H$ and $|h\rangle_G$. This is because for $\mathcal{H}_G$ we have access to an encoding $g \in G \mapsto |g\rangle \in \mathcal{H}_G$, while for $\mathcal{H}_H$ we do not. That is also the reason why one has to use the inverse operation $U_G^\dagger$ to obtain $H$. As with the functions $\chi_{\tilde{g}}$ from the standard algorithm solving the HSP in Appendix B 2, this procedure can be used to determine a small number of elements $h \in H$ which then generate the whole subgroup $H$.
In fact one can go even further: finding an on-the-go erasure procedure in the setting of Theorem 5 is more difficult than solving the HSP, because it also requires the transformation $U_S$. For partial information erasure procedures we give a quantitative description of how much information is required to compress the entanglement for an on-the-go erasure.
Definition 18 (General local transformations of G and S). Define local transformations which factor quantum states encoding elements $g \in G$ and $s \in S$ according to $U_G|g\rangle = |g^{(1)}, g^{(2)}\rangle$ and $U_S|s\rangle = |s^{(1)}, s^{(2)}\rangle$ (B33). Similarly to the notation introduced in Eq. (B33), let us write for some state $|f(g)\rangle \in \mathcal{H}_S$, $U_S|f(g)\rangle = |f^{(1)}(g), f^{(2)}(g)\rangle$. Using this notation we can formulate two general conditions on the transformations $U_G$ and $U_S$:
Theorem 19 (General characterization of partial erasure transformations). If and only if the transformations $U_G$ and $U_S$ satisfy the two requirements
1. For all $g, \tilde{g} \in G$: $f(g) = f(\tilde{g}) \Rightarrow g^{(2)} = \tilde{g}^{(2)}$,
2. The function $f^{(1)}(g)$ only depends on $g^{(1)}$, and $f^{(2)}(g) = g^{(2)}$ in the binary computational representation in $\mathcal{H}_G$,
they can factor out the entanglement in the form of $\ell$ Bell pairs after $O_f$ in the HSP algorithm, where ℓ = dim H
Proof. We obtain conditions on transformations $U_G$ and $U_S$ which allow bringing the joint state of the G and S register into a form in which the Bell pairs are factored out. We allow that $U_G$ may only factor part of the register $\mathcal{H}_{G/H}$, say $k \le |G|/|H|$ qubits. The transformations need not necessarily respect the group structure of $G$; hence, we refrain from using an intermediate subgroup $K$ as in the main part of the paper and rather work with the factorization from Definition 18. Starting with the state $\rho_2$ after step 2 of the standard HSP algorithm (see Figure 2 and Appendix B 2), we find a new state $\rho_{2a}$. The necessary and sufficient condition for the transformations $U_G$ and $U_S$ factoring Bell pairs is the equality ♥ above.
as this decomposition satisfies both assumptions from Theorem 19.
Steps 1-2. These steps are the same as for the standard HSP algorithm. The resulting state is $\rho_2$ (Eq. (B42)).
Steps 2a-2c. Applying the operation $U_G \otimes U_S$ to the state $\rho_2$ gives (see the calculation in the proof of Theorem 19) $\rho_{2a} \propto \sum_{g^{(1)}, \tilde{g}^{(1)}} |g^{(1)}\rangle\langle \tilde{g}^{(1)}| \otimes |f^{(1)}(g^{(1)})\rangle\langle f^{(1)}(\tilde{g}^{(1)})| \otimes (|\chi\rangle\langle\chi|)^{\otimes \ell}$. The Bell pairs $|\chi\rangle$ are formed between qubits of the G and S registers. The erasure $E$ is done with quantum side information according to Theorem 1, which determines the average work cost of erasure at temperature $T$ and hence the total average work cost of erasure. The erasure leaves the reduced state of the G register invariant. After uncomputing $U_G$, we recover the reduced state of the G register as in Eq. (B13) from the standard HSP algorithm.
Steps 3 -6. Based on the last observation, these steps go through as for the standard case.

Gate complexity of UG and US
The transformations U G and U S from Definition 18 which are used in Theorem 19 are permutations of the basis states |g and |s for g ∈ G and s ∈ S. These permutations ensure that after the application of the function oracle O f , the entanglement is compressed into a well-defined subregister of the main and auxiliary register.
In general, a permutation unitary on the computational basis states of $n$ qubits requires $O(n 2^n)$ CNOT gates [31] and is therefore not efficiently implementable. Nevertheless, depending on the type of partial information available, the complexity of the transformations $U_G$ and $U_S$ can be drastically reduced (see for example the PFA, Section III C 1). To this end, let us work in the special setting where the partial information is available in the form of an intermediate subgroup $K$ such that $H \subseteq K \subseteq G$ (as in Promise 7). There, the transformations $U_G$ and $U_S$ act on a state $|g, f(g)\rangle$ through a decomposition of $g \in G$ into an element $k_g \in K$ and an element $[k]$ of the quotient group $G/K$. Consider the special case where $K$ is already implemented on a subset of qubits of the main register, that is, $\mathcal{H}_K = \mathrm{span}_{\mathbb{C}}\{|k\rangle : k \in K\}$ is the Hilbert space generated by some but not necessarily all qubits that span $\mathcal{H}_G$. Here the transformation $U_G$ is only a composition of qubit swaps, which can be implemented efficiently with complexity $O(\log |K|)$. For the target space an analogous rule holds. If the map $f : G \to S$ implemented at the level of the function oracle $O_f$ respects the qubit decomposition of $\mathcal{H}_G$ into $\mathcal{H}_K \otimes \mathcal{H}_{G/K}$, that is, these subregisters are mapped to subregisters of the auxiliary space $\mathcal{H}_S$, then $U_S$ also has complexity $O(\log |K|)$. One particular case where this happens is the toy example for the PFA shown in Section III C 1.
$\mathcal{H}_{G/K}$. Then,
Steps 3-4. Making use of the notation introduced in Eq. (C2), where $g(k, t) \in G$ is the unique element such that $U_G|g(k, t)\rangle = |k, t\rangle$, the next states can be written down. For the last equality we used property 2 imposed in Theorem 19. At this stage we see that $\tilde{O}_f$ acts trivially on the registers $\mathcal{H}_S$, saving $2\ell$ erasures like the on-the-go erasure protocols do.
Steps 5-6: Recovering the hidden subgroup. It remains to be shown that the modified algorithm can still be used to determine the hidden subgroup $H$. Let $\rho_4' = \mathrm{Tr}_{\mathcal{H}_{G/K} \otimes \mathcal{H}_S}\, \rho_4$ (C5) be the reduced state of $\rho_4$ where all registers but $\mathcal{H}_K$ have been traced out. Observing $H \subseteq K$, the quantum Fourier transform $Q_K$ acts on $\rho_4'$ as in the standard algorithm, with $H^\perp$ replaced by $H_K^\perp = \{g \in K : \forall h \in H : \chi_g(h) = 1\}$, the analogue of $H^\perp$ from the standard HSP algorithm in which $G$ has been replaced by $K$. This calculation demonstrates that the modified algorithm still recovers the hidden subgroup $H$.