Reassessing the computational advantage of quantum-controlled ordering of gates

Research on indefinite causal structures is a rapidly evolving field that has the potential not only to radically revise the classical understanding of space-time but also to enhance the functionality of quantum information processing. For example, it is known that indefinite causal structures provide an exponential advantage in communication complexity when compared to causal protocols. In quantum computation, such structures can decide whether two unitary gates commute or anticommute with a single call to each gate, which is impossible with conventional (causal) quantum algorithms. A generalization of this effect to $n$ unitary gates, originally introduced in M. Ara\'ujo et al., Phys. Rev. Lett. 113, 250402 (2014) and often called the Fourier promise problem (FPP), can be solved with the quantum-$n$-switch and a single call to each gate, while the best causal algorithm known so far calls $O(n^2)$ gates. In this work, we show that this advantage is smaller than expected. In fact, we present a causal algorithm that solves the only known specific FPP with $O(n \log(n))$ queries and a causal algorithm that solves every FPP with $O(n\sqrt{n})$ queries. Besides the interest in such algorithms on their own, our results limit the expected advantage of indefinite causal structures for these problems.


I. INTRODUCTION
One of the most fundamental concepts in science is that of causality: the idea that events occur in a fixed order. It is embedded in the very structure of computation in which operations are performed one after the other. In particular, a quantum circuit is built out of wires, representing the quantum states, and boxes, representing the gates acting on these states in fixed order. However, it was suggested that the interplay between general relativity and quantum theory might require superseding such a paradigm [1,2]. Within the last decade, quantum frameworks have been developed that enable the description of indefinite causal structures in which no well-defined global order of events exists [1, 3,4].
It was observed that the use of indefinite causal structures in information processing can solve certain tasks which cannot be completed by causally ordered quantum circuits [5] and exponentially reduce the communication cost in communication complexity problems [6]. Furthermore, they can boost the rate of communication through noisy channels [7][8][9][10][11], although causal circuits can achieve the same or even better noise reduction [12][13][14]. The computational complexity of indefinite causal structures has been studied [15,16] and their experimental accessibility was demonstrated in enhanced quantum photonics experiments [17][18][19][20][21][22].
The simplest example of indefinite causality is based on the quantum switch [3]. In the quantum switch, two gates act on a target system and the order in which the two gates are applied is controlled by a qubit: if the state of the control qubit is $|0\rangle$, the gate $U_0$ is applied before $U_1$, whereas if the control qubit is in the state $|1\rangle$, the order is reversed. With this quantum-controlled ordering of gates, one can solve certain tasks more efficiently than with any conventional (causal) quantum algorithm. Specifically, one can determine whether two unitary gates commute or anticommute with a single call to each gate, while with any causal quantum algorithm, at least one gate has to be called twice [5].
* martin.renner@univie.ac.at
A generalization of the quantum switch to an arbitrary number of gates is the quantum-$n$-switch. Here, depending on the state of the control system, any permutation of the $n$ gates can be applied on the target system. In order to study the computational power of this quantum-controlled ordering of gates, a promise problem was introduced in Ref. [23]. This task, which we will call the Fourier promise problem (FPP) here, can be solved with the quantum-$n$-switch and a single call to each gate ($n$ queries). At the same time, it was expected that solving the same task with a causal quantum algorithm requires $O(n^2)$ queries. In a recent study, this idea is extended to other promise problems that are easier to realize experimentally [22].
In this work, we consider the solutions to the specific and general Fourier promise problems using both the quantum-$n$-switch and causal quantum algorithms. We find that the reduction in the query complexity using the quantum-$n$-switch is smaller than what was assumed so far. More precisely, we present a causal algorithm that solves the only known specific FPP with $O(n \log n)$ queries and further, a causal algorithm that solves every FPP with $O(n\sqrt{n})$ queries. This reduces the expected advantage of indefinite causal structures in solving this computational task as compared to causal circuits.
arXiv:2102.11293v1 [quant-ph] 22 Feb 2021
The article is structured as follows: In Section II, we give an overview of the Fourier promise problem, the solution with the quantum-$n$-switch and the best causal algorithm that uses $O(n^2)$ queries. In Section III, we derive the property that allows us to find more efficient causal algorithms and give a first example of such an algorithm in Section IV. The two main results of this article can be found thereafter. In Section V, we present a causal algorithm that solves a specific FPP with $O(n \log n)$ queries. In Section VI, we give a causal algorithm that solves every FPP with $O(n\sqrt{n})$ queries.
FIG. 1: Depending on the state of the control system, the gates act on the target system in a different order. For the case of $n = 3$, each basis state of the six-dimensional control system realizes a different permutation of the gates. If the control system is initialized in a superposition, the $n$-switch can be used to solve Fourier promise problems. In this way, each unitary $U_i$ is called only once.
For example, in the case of two unitaries $U_0$ and $U_1$, the two permutations $U_1 U_0$ and $U_0 U_1$ can be labeled either by $\Pi_0 = U_1 U_0$ and $\Pi_1 = U_0 U_1$ or the other way around ($\Pi_1 = U_1 U_0$ and $\Pi_0 = U_0 U_1$). While the promise for $x = 0$ is trivially satisfied, the promise for $x = 1$, namely $\Pi_1 = \omega^{1 \cdot y} \cdot \Pi_0$ with $\omega = e^{2\pi i/2!} = -1$, translates for both labelings into the fact that $U_0$ and $U_1$ either commute ($y = 0$) or anticommute ($y = 1$):
$$U_1 U_0 = (-1)^{y} \, U_0 U_1 . \qquad \text{(II.2)}$$
The task is to find out which property is the correct one.
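To make the two-gate promise concrete, the following sketch checks for explicitly given matrices which of the two cases holds. It only illustrates the promise and is not a query algorithm, since it multiplies the matrices classically. The helper names `matmul` and `two_gate_label`, the labeling $\Pi_0 = U_1 U_0$, $\Pi_1 = U_0 U_1$ and the choice of Pauli and Hadamard matrices are assumptions made for this example.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def two_gate_label(u0, u1, tol=1e-9):
    """Return y with Pi_1 = (-1)**y * Pi_0, or None if the promise is violated.

    Assumed labeling: Pi_0 = U1 U0 and Pi_1 = U0 U1.
    """
    pi0 = matmul(u1, u0)
    pi1 = matmul(u0, u1)
    size = len(pi0)
    for y in (0, 1):
        sign = (-1) ** y
        if all(abs(pi1[i][j] - sign * pi0[i][j]) < tol
               for i in range(size) for j in range(size)):
            return y
    return None


# Illustrative gates (not taken from the article).
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
s = 2 ** -0.5
H = [[s, s], [s, -s]]

print(two_gate_label(X, X))  # a commuting pair
print(two_gate_label(X, Z))  # an anticommuting pair
print(two_gate_label(X, H))  # neither case: the promise does not hold
```

The third call shows why the task is a promise problem: for generic gates neither relation holds, and no value of $y$ exists.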
For $n \geq 3$, there are different ways to label the permutations that lead in general to inequivalent tasks (examples are given in Subsection IV A). In this sense, Fourier promise problems form an entire class of problems and we use the term "specific Fourier promise problem" whenever we refer to a precise labeling of the permutations. To show that this class of problems is non-trivial, one has to prove that for every $n$ there is at least one specific FPP for which there indeed exist unitaries that satisfy the promise. This is shown in Appendix A of the original work of M. Araújo et al. [23], where they construct for every $n \geq 2$ and every $y \in \{0, 1, ..., n! - 1\}$ a set of unitaries $\{U_i\}_{i=0}^{n-1}$ that satisfy the promise $\Pi_x = \omega^{x \cdot y} \cdot \Pi_0$ for a given labeling of the permutations.$^1$ We want to point out that, for a given $n$, this is the only specific FPP for which the existence of these unitaries is explicitly shown (and hence the only task that is proven to be non-trivial). For this specific task, we will present in Section V a causal algorithm that is very efficient in the number of called black-box unitaries (queries), but has the disadvantage that it cannot be adapted directly to (other possibly existing non-trivial) FPPs where the permutations are labeled differently. Note, however, that in Ref. [23] the distinction between specific FPPs is not made explicitly, since only algorithms are considered that can be adapted to every FPP (independently of the precise labeling of the permutations).

A. Solution with the quantum-n-switch
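The quantum-$n$-switch (denoted as $S_n$ and called $n$-switch for short) is the quantum gate that applies, depending on the state of the control system $|x\rangle_c$, the permutation $\Pi_x$ on the target system $|\Psi\rangle_t$:
$$S_n \, |x\rangle_c \otimes |\Psi\rangle_t = |x\rangle_c \otimes \Pi_x |\Psi\rangle_t .$$
Moreover, since the $n!$-dimensional Fourier transform is frequently used in this article, we formally introduce it here. In symbols,
$$F_{n!} \, |y\rangle = \frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} \omega^{x \cdot y} \, |x\rangle , \qquad \omega = e^{2\pi i/n!} .$$
With the use of the $n$-switch, one can solve every FPP, as described in Ref. [23]: the control system is initialized in the $n!$-dimensional state $|0\rangle_c$ and the target system $|\Psi\rangle_t$ in an arbitrary $d$-dimensional state. The Fourier transform $F_{n!}$ transforms the control system into an equal superposition of all states $x \in \{0, 1, ..., n! - 1\}$:
$$F_{n!} |0\rangle_c \otimes |\Psi\rangle_t = \frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} |x\rangle_c \otimes |\Psi\rangle_t .$$
Afterwards, the $n$-switch applies, depending on the state $|x\rangle_c$ of the control system, the permutation $\Pi_x$ on the target system $|\Psi\rangle_t$ (see Fig. 1 for an illustration of the map for the case of $n = 3$):
$$\frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} |x\rangle_c \otimes \Pi_x |\Psi\rangle_t . \qquad \text{(II.7)}$$
With the promise $\Pi_x = \omega^{x \cdot y} \cdot \Pi_0$, this state can be rewritten into:
$$\frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} \omega^{x \cdot y} \, |x\rangle_c \otimes \Pi_0 |\Psi\rangle_t . \qquad \text{(II.8)}$$
In this way, the target system becomes independent of $x$ and factorizes out in the state $\Pi_0 |\Psi\rangle_t$. After applying the inverse Fourier transform on the control system, the desired value of $y$ can be read out with a measurement of the control system in the computational basis:
$$F_{n!}^{-1} \left( \frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} \omega^{x \cdot y} \, |x\rangle_c \right) \otimes \Pi_0 |\Psi\rangle_t = |y\rangle_c \otimes \Pi_0 |\Psi\rangle_t . \qquad \text{(II.9)}$$
Since the $n$-switch can apply every permutation of the unitaries with a single call to each gate, the total query complexity of this algorithm is $n$.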
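The three steps above (Fourier transform, $n$-switch, inverse Fourier transform) can be traced with a small classical state-vector simulation. The sketch below is only meant to illustrate the algebra, not the query model: it evaluates every branch $\Pi_x |\Psi\rangle_t$ explicitly. The function names, the way a labeling is passed as a list of index tuples, and the two-gate test case with Pauli matrices are assumptions of this example.

```python
import cmath
import math


def apply(matrix, vector):
    """Apply a square matrix (nested lists) to a state vector (list)."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]


def apply_product(order, gates, psi):
    """Apply U_{order[0]} ... U_{order[-1]} to psi; the rightmost gate acts first."""
    for index in reversed(order):
        psi = apply(gates[index], psi)
    return psi


def switch_outcome_probabilities(gates, labeling, psi):
    """Distribution of the control measurement after F, the n-switch and F^-1.

    labeling[x] is the tuple of gate indices of Pi_x, written left to right.
    """
    dim = len(labeling)  # n! control states
    omega = cmath.exp(2j * math.pi / dim)
    # Branch x of the superposition carries Pi_x |psi>.
    branches = [apply_product(order, gates, psi) for order in labeling]
    probabilities = []
    for y in range(dim):
        # Inverse Fourier transform on the control, projected onto |y>.
        amplitude = [sum(omega ** (-x * y) * branches[x][t] for x in range(dim)) / dim
                     for t in range(len(psi))]
        probabilities.append(sum(abs(a) ** 2 for a in amplitude))
    return probabilities


# Two-gate example with Pi_0 = U1 U0 and Pi_1 = U0 U1 (illustrative choice).
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
labeling = [(1, 0), (0, 1)]
psi = [0.6, 0.8j]

print(switch_outcome_probabilities([X, Z], labeling, psi))  # anticommuting gates
print(switch_outcome_probabilities([X, X], labeling, psi))  # commuting gates
```

If the promise holds, all probability is concentrated on the outcome $y$, independently of the target state chosen.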

B. Solution with causal quantum algorithms
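In this section, we give an overview of the best causal algorithms for FPPs that are known. All of them are based on the simulation of the $n$-switch and call $O(n^2)$ black-box unitaries. A causal quantum algorithm simulates the action of the $n$-switch (denoted as $S_n^{\mathrm{sim}}$) if it implements the transformation
$$S_n^{\mathrm{sim}} \, |x\rangle_c \otimes |\Psi\rangle_t \otimes \bigotimes_i |a_i\rangle = |x\rangle_c \otimes \Pi_x |\Psi\rangle_t \otimes \bigotimes_i (U_i)^{k_i} |a_i\rangle \qquad \text{(II.10)}$$
for every $x \in \{0, 1, ..., n! - 1\}$, arbitrary states $|\Psi\rangle_t$ and $|a_i\rangle$ as well as constants $k_i$ that do not depend on $x$. Every simulation of the $n$-switch can be used in combination with the algorithm in Fig. 2 to solve every FPP: analogously to the $n$-switch in the last subsection, the control system is prepared with a quantum Fourier transform in an equal superposition over all states $x \in \{0, 1, ..., n! - 1\}$. By linearity, an algorithm that simulates the $n$-switch implements the transformation:
$$\frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} |x\rangle_c \otimes |\Psi\rangle_t \otimes \bigotimes_i |a_i\rangle \;\mapsto\; \frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} |x\rangle_c \otimes \Pi_x |\Psi\rangle_t \otimes \bigotimes_i (U_i)^{k_i} |a_i\rangle = \frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} \omega^{x \cdot y} \, |x\rangle_c \otimes \Pi_0 |\Psi\rangle_t \otimes \bigotimes_i (U_i)^{k_i} |a_i\rangle .$$
Again, the promise $\Pi_x = \omega^{x \cdot y} \cdot \Pi_0$ is used to obtain the last equality. After applying the inverse Fourier transform to the control system, the solution $y$ can be read out in the control system.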
One algorithm that can implement the transformation $S_n^{\mathrm{sim}}$ is given in Fig. 3. This one was originally introduced in Ref. [24] and is also presented in Appendix C of Ref. [23]. In each step $i = 0, 1, ..., n - 1$, the gate $S$ swaps, controlled on $|x\rangle_c$, the target system $|\Psi\rangle_t$ with the auxiliary system $|a_{\sigma_x(i)}\rangle$. After the gate $U_{\sigma_x(i)}$ acts on $|\Psi\rangle_t$, another gate $S$ swaps the two systems $|\Psi\rangle_t$ and $|a_{\sigma_x(i)}\rangle$ back. In this way, the permutation $\Pi_x = U_{\sigma_x(n-1)} \cdots U_{\sigma_x(1)} U_{\sigma_x(0)}$ is applied to the target system. In this algorithm, each gate $U_i$ is used exactly $n$ times, so the query complexity of this algorithm is $n^2$. Furthermore, for every permutation $\Pi_x$, each auxiliary system $|a_i\rangle$ is swapped back and forth with $|\Psi\rangle_t$ exactly once. Hence, independently of the state of the control system $|x\rangle_c$, the gate $U_i$ is applied once on $|\Psi\rangle_t$ and the remaining $(n - 1)$ times on $|a_i\rangle$. In this way, each auxiliary system $|a_i\rangle$ ends up in the state $(U_i)^{n-1} |a_i\rangle$ and the algorithm implements the transformation given in Eq. (II.10) for $k_i = n - 1$.
FIG. 2: Solution of every FPP with the simulation of the quantum-$n$-switch: With a Fourier transform the control system is prepared in an equal superposition of all states $x$. The simulation of the $n$-switch $S_n^{\mathrm{sim}}$ applies, depending on the state of the control system $|x\rangle_c$, the permutation $\Pi_x$ on the target system. The solution $y$ can be read out after applying the inverse Fourier transform to the control system.

FIG. 3: A simulation of the $n$-switch with a causal algorithm: The permutation $\Pi_x$ is applied on $|\Psi\rangle_t$ by swapping the target system in each step $i = 0, 1, ..., n - 1$ with the auxiliary system $|a_{\sigma_x(i)}\rangle$.
There are simulations of the quantum-$n$-switch with causal quantum circuits that are slightly more efficient. All of them call $O(n^2)$ black-box gates in total. This is studied in Ref. [25] (and also in Ref. [23]).

A. Pairwise commutation relations
In this article, we show that there are algorithms that solve Fourier promise problems with significantly fewer queries than the simulation of the $n$-switch requires. The main ingredient is a property of the unitaries that can be directly inferred from the promise.
$^2$ Further details about the representation of the control system $|x\rangle_c$ and the implementation of the $S$-gate can be found in Appendix C of Ref. [23] and in Ref. [24].
Definition 1 (Pairwise commutation relations). A set of unitaries $\{U_i\}_{i=0}^{n-1}$ satisfies "pairwise commutation relations" if for every pair of unitaries $U_j$ and $U_k$ ($j, k \in \{0, 1, ..., n - 1\}$) there exists $\alpha_{jk} \in \mathbb{C}$ such that:
$$U_j U_k = \alpha_{jk} \, U_k U_j . \qquad \text{(III.1)}$$
Proposition 1. Every set of unitaries $\{U_i\}_{i=0}^{n-1}$ that satisfies the promise of a Fourier promise problem satisfies pairwise commutation relations. Furthermore, if for a specific FPP the labeling of the permutations is given, the pairwise commutation relations read:
$$U_j U_k = \omega^{(x^1_{jk} - x^2_{jk}) \cdot y} \, U_k U_j ,$$
where $x^1_{jk}$ and $x^2_{jk}$ are the labels of the permutations $\Pi_{x^1_{jk}} = U_{n-1} \cdots U_1 U_0 \, U_j U_k$ and $\Pi_{x^2_{jk}} = U_{n-1} \cdots U_1 U_0 \, U_k U_j$ (with $U_{n-1} \cdots U_1 U_0$, we denote all unitaries without $U_j$ and $U_k$ in descending order).$^3$
Proof. For every pair of black-box unitaries $U_j$ and $U_k$, we focus on the two permutations introduced in the statement above. Due to the promise $\Pi_x = \omega^{x \cdot y} \Pi_0$, both permutations are equal to $\Pi_0$ up to the phases $\omega^{-x^1_{jk} \cdot y}$ and $\omega^{-x^2_{jk} \cdot y}$, respectively:
$$\Pi_0 = \omega^{-x^1_{jk} \cdot y} \, \Pi_{x^1_{jk}} = \omega^{-x^2_{jk} \cdot y} \, \Pi_{x^2_{jk}} .$$
Using the above expression for the two permutations $\Pi_{x^1_{jk}}$ and $\Pi_{x^2_{jk}}$, we obtain
$$\omega^{-x^1_{jk} \cdot y} \, U_{n-1} \cdots U_1 U_0 \, U_j U_k = \omega^{-x^2_{jk} \cdot y} \, U_{n-1} \cdots U_1 U_0 \, U_k U_j .$$
Multiplying from the left step by step with the inverses of $U_{n-1}$, $U_{n-2}$, ..., $U_1$ and $U_0$ (for $U_j$ and $U_k$ this is left out), this expression is equivalent to:
$$U_j U_k = \omega^{(x^1_{jk} - x^2_{jk}) \cdot y} \, U_k U_j .$$
Since $\omega^{(x^1_{jk} - x^2_{jk}) \cdot y} \in \mathbb{C}$, we conclude that every set of unitaries that satisfies the promise also satisfies pairwise commutation relations.
$^3$ Note that with some caution, one could turn this statement into an equivalence: the condition of pairwise commutation relations is not only a necessary condition for the promise to hold. One can also check by direct calculation that whenever all commutation relations are pairwise ($\forall j \neq k \; \exists \, x_{jk} \in \mathbb{Z}$ s.t. $U_j U_k = \omega^{x_{jk} \cdot y} \cdot U_k U_j$), all permutations of the $n$ unitaries are related by a phase ($\Pi_x = \omega^{x \cdot y} \Pi_0$). In order for the promise to hold, one has to choose the pairwise phases $x_{jk}$ such that every $x \in \{0, 1, ..., n! - 1\}$ appears exactly once. Nevertheless, it is enough for our purpose that the promise induces pairwise commutation relations.

B. Structure of the new algorithms
All causal algorithms that we present in this article are of the form given in Fig. 4. The target systems $|\Psi_j\rangle$ and auxiliary systems $|a_i\rangle$ are initialized in an arbitrary $d$-dimensional state. The important part of the algorithm is the one that realizes the transformation $T_n^{\mathrm{FPP}}$:
$$T_n^{\mathrm{FPP}}: \; |x\rangle_c \otimes \bigotimes_{j=1}^{m} |\Psi_j\rangle \otimes \bigotimes_i |a_i\rangle \;\mapsto\; |x\rangle_c \otimes \bigotimes_{j=1}^{m} f_j^x(U_0, ..., U_{n-1}) |\Psi_j\rangle \otimes \bigotimes_i (U_i)^{k_i} |a_i\rangle . \qquad \text{(III.9)}$$
On each target system $|\Psi_j\rangle$, some of the $n$ black-box unitaries from the set are applied, such that each of them ends up in a state $f_j^x(U_0, ..., U_{n-1}) |\Psi_j\rangle$. The unitaries contained within $f_j^x(U_0, ..., U_{n-1})$ are the same for every $x$, but the order in which these unitaries are applied on the target system will explicitly depend on $x$. Since all commutation relations are pairwise, one can always rewrite this expression into $f_j^0(U_0, ..., U_{n-1}) |\Psi_j\rangle$ and a phase is obtained whenever two unitaries are commuted (with $e^{i\varphi_j(x)}$ we denote the product of these phases):
$$f_j^x(U_0, ..., U_{n-1}) |\Psi_j\rangle = e^{i\varphi_j(x)} \, f_j^0(U_0, ..., U_{n-1}) |\Psi_j\rangle . \qquad \text{(III.10)}$$
In this way, the final state of each target system becomes independent of $x$ and if the algorithm is designed carefully, all these phases multiply together to $\omega^{x \cdot y}$.
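Whenever we find an implementation of this transformation $T_n^{\mathrm{FPP}}$ for every $x \in \{0, 1, ..., n! - 1\}$, we can solve the corresponding FPP. The control system is initialized in an equal superposition of all $x$ and due to linearity, the transformation $T_n^{\mathrm{FPP}}$ realizes (suppressing the auxiliary systems):
$$\frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} |x\rangle_c \otimes \bigotimes_j |\Psi_j\rangle \;\mapsto\; \frac{1}{\sqrt{n!}} \sum_{x=0}^{n!-1} \omega^{x \cdot y} \, |x\rangle_c \otimes \bigotimes_j f_j^0(U_0, ..., U_{n-1}) |\Psi_j\rangle .$$
At the end, the solution $y$ can be read out after applying the inverse Fourier transform to the control system (see Fig. 4).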
Intuitively speaking, the pairwise commutation relations allow us to simulate different parts of the total phase $\omega^{x \cdot y}$ on different target systems. Note that the best causal algorithms so far that are based on the simulation of the $n$-switch (Subsection II B) can be seen as a special case of this new method with only one target system ($m = 1$) and $f_1^x(U_0, ..., U_{n-1}) = \Pi_x$. Hence, this new procedure is more general and usually more efficient than simulating every permutation on its own. We want to point out that these ideas can in principle be applied to every set of unitaries satisfying pairwise commutation relations. In this sense, this method may have some applications beyond Fourier promise problems.
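The key consequence of pairwise commutation relations, namely that any two orderings of the same gates differ only by a phase, can be checked numerically on a small example. The sketch below uses the three Pauli matrices, for which every pair anticommutes ($\alpha_{jk} = -1$ for $j \neq k$). This choice of gates and all helper names are assumptions of the example; whether these gates satisfy the promise of a particular FPP depends on the labeling and is not checked here.

```python
from itertools import permutations


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def ordered_product(order, gates):
    """Return the matrix product U_{order[0]} U_{order[1]} ... ."""
    result = gates[order[0]]
    for index in order[1:]:
        result = matmul(result, gates[index])
    return result


def relative_phase(a, b, tol=1e-9):
    """Return c with a = c * b, or None if the matrices are not proportional."""
    size = len(b)
    i, j = max(((r, c) for r in range(size) for c in range(size)),
               key=lambda rc: abs(b[rc[0]][rc[1]]))
    phase = a[i][j] / b[i][j]
    proportional = all(abs(a[r][c] - phase * b[r][c]) < tol
                       for r in range(size) for c in range(size))
    return phase if proportional else None


# Pauli matrices as an illustrative set with pairwise commutation relations.
gates = [
    [[0, 1], [1, 0]],      # U_0 = X
    [[0, -1j], [1j, 0]],   # U_1 = Y
    [[1, 0], [0, -1]],     # U_2 = Z
]
reference = ordered_product((2, 1, 0), gates)  # U_2 U_1 U_0
phases = {order: relative_phase(ordered_product(order, gates), reference)
          for order in permutations(range(3))}
for order, phase in phases.items():
    print(order, phase)
```

Every ordering is proportional to the reference ordering, which is what allows the final state of a target system to be rewritten into an $x$-independent form at the cost of a phase.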

IV. MORE EFFICIENT SOLUTIONS FOR FPPS WITH THREE UNITARIES
In this section, we show in a first example how pairwise commutation relations are useful to find more efficient causal algorithms. More precisely, we present an algorithm that solves every FPP for n = 3 with six queries, while the best causal algorithm that was known so far used seven queries. While this difference might not seem significant at first, we will show in the next sections that similar ideas can be used for a significant reduction in the number of used black-box gates in the asymptotic limit.
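A. Two possible ways to label the permutations
Before we present the algorithms, we give two specific examples of labelings for $n = 3$. We denote with
$$\Pi_0 = U_2 U_1 U_0 , \qquad \text{(IV.1)}$$
$$\Pi_1 = U_2 U_0 U_1 = \omega^{1 \cdot y} \cdot \Pi_0 , \qquad \text{(IV.2)}$$
$$\Pi_2 = U_1 U_2 U_0 = \omega^{2 \cdot y} \cdot \Pi_0 , \qquad \text{(IV.3)}$$
$$\Pi_3 = U_0 U_2 U_1 = \omega^{3 \cdot y} \cdot \Pi_0 , \qquad \text{(IV.4)}$$
$$\Pi_4 = U_1 U_0 U_2 = \omega^{4 \cdot y} \cdot \Pi_0 , \qquad \text{(IV.5)}$$
$$\Pi_5 = U_0 U_1 U_2 = \omega^{5 \cdot y} \cdot \Pi_0 . \qquad \text{(IV.6)}$$
If it is promised that the three unitaries satisfy these conditions, it is straightforward to read off the pairwise commutation relations: from the second line, $U_0 U_1 = \omega^{y} \, U_1 U_0$, and from the third line, $U_1 U_2 = \omega^{2y} \, U_2 U_1$. For the last pair $U_0$ and $U_2$, we have to put in some more work: Due to Proposition 1, we can compare the fifth ($U_1 U_0 U_2$) and the third line ($U_1 U_2 U_0$) to obtain: $U_0 U_2 = \omega^{2y} \, U_2 U_0$.
There exist other FPPs that correspond to other labelings. Another example is the following: Here, the pairwise commutation relations can be read off as follows: On the other hand, knowing all pairwise phases uniquely determines the labeling (up to the freedom of choosing $\Pi_0$).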
Note that not every labeling is meaningful. Some of them lead to trivial statements: This is a contradiction whenever $y \neq 0$ (note that $\omega = e^{2\pi i/n!} \neq 1$ for $n \geq 2$). More precisely, only for $y = 0$, there exist unitaries $U_0$, $U_1$ and $U_2$ that satisfy this promise, and the task becomes trivial (since one can conclude directly that the solution must be $y = 0$). By counting, we found that there are 24 different possible labelings of the six permutations that lead to non-trivial solutions for $n = 3$ if we restrict ourselves to

B. Standard causal algorithm with seven queries
The best causal algorithms known so far that solve these problems are based on the simulation of the 3-switch and call seven black-box unitaries. One possible algorithm that can achieve this is given in Fig. 5. The gates $R$ denote rewirings of the target and auxiliary systems (a combination of controlled swaps). Depending on the state of the control system $|x_{ijk}\rangle_c$, they interchange the wires in a way that the gates act on the systems according to Table I. All underlined gates $U_i$ act on $|\Psi\rangle_t$ and simulate the permutation $U_i U_j U_k$, while the remaining (unused) gates $U_0$ and $U_1$ act on the corresponding auxiliary systems $|a_0\rangle$ and $|a_1\rangle$, respectively.

FIG. 5: [...] ($S_3^{\mathrm{sim}}$) with the smallest possible number of used black-box gates.
In total, the combined control and target system simulate the action of the 3-switch, since every permutation of the three unitaries can be applied on the target system, while the two auxiliary systems always end up in the same state:
$$|x_{ijk}\rangle_c \otimes |\Psi\rangle_t \otimes |a_0\rangle \otimes |a_1\rangle \;\mapsto\; |x_{ijk}\rangle_c \otimes U_i U_j U_k |\Psi\rangle_t \otimes U_0 |a_0\rangle \otimes (U_1)^3 |a_1\rangle . \qquad \text{(IV.20)}$$
As explained in Subsection II B, this solves every FPP just as the 3-switch itself does. It is essential for an algorithm that simulates the $n$-switch that every permutation of the unitaries can be applied on the target system by rewiring the systems in some way. Here, for $n = 3$, every permutation has to appear as a substring in $U_1 U_0 U_1 U_2 U_1 U_0 U_1$, where a substring is understood as a subsequence whose elements need not be adjacent (unused gates are routed to the auxiliary systems). The minimal length of a string of elements $U_0, U_1, ..., U_{n-1}$ such that all possible permutations of the $n$ elements are contained in the string as a substring is a well-studied problem in combinatorics: it is known that the minimal number of elements in such a string is of the order of $O(n^2)$ [26]. For $3 \leq n \leq 7$, the shortest string containing all permutations as a substring has length $n^2 - 2n + 4$ [27], which equals seven for $n = 3$. For this reason, no string of length smaller than seven can contain all six permutations of three unitaries. For higher $n$, more efficient constructions are known [28,29].
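For $n = 3$ the combinatorial statement is small enough to be verified by exhaustive search. The following sketch checks that the seven-gate string contains all six orderings as (not necessarily contiguous) subsequences and that no shorter string does. The function names are assumptions of this example, and the brute-force search is only practical for very small $n$.

```python
from itertools import permutations, product


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, not necessarily contiguously."""
    remaining = iter(sequence)
    return all(symbol in remaining for symbol in pattern)


def contains_all_orderings(sequence, n):
    """True if every permutation of range(n) is a subsequence of sequence."""
    return all(is_subsequence(p, sequence) for p in permutations(range(n)))


def shortest_universal_length(n, max_length):
    """Smallest length (up to max_length) of a gate string containing all orderings."""
    for length in range(n, max_length + 1):
        if any(contains_all_orderings(candidate, n)
               for candidate in product(range(n), repeat=length)):
            return length
    return None


seven_gate_string = (1, 0, 1, 2, 1, 0, 1)  # U1 U0 U1 U2 U1 U0 U1
print(contains_all_orderings(seven_gate_string, 3))
print(shortest_universal_length(3, 7))
```

The search confirms the value $n^2 - 2n + 4 = 7$ for $n = 3$, which is why a simulation of the full 3-switch needs at least seven queries.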

C. More efficient causal algorithm with six queries
Here, we will show that we can solve every FPP for $n = 3$ with only six queries using the algorithm given in Fig. 6. As before, the gates $R$ are rewirings of the target and auxiliary systems. They interchange the wires in a way that the gates act on the systems according to Tab. II. All underlined gates $U_i$ act on $|\Psi_1\rangle$, all overlined gates $U_i$ on $|\Psi_2\rangle$ and the remaining gate $U_1$ acts on the auxiliary system $|a_1\rangle$.
FIG. 6: [...] that solves every FPP for three unitaries.
TABLE II: [...] Fig. 6. Using pairwise commutation relations, one can rewrite the second line (see main text).
The crucial difference here is the permutation $U_2 U_0 U_1$. It is not possible that the first target system $|\Psi_1\rangle$ ends up in the state $U_2 U_0 U_1 |\Psi_1\rangle$ directly, since the permutation $U_1 U_0 U_2$ is not contained as a substring of $U_0 U_1 U_2 U_1 U_0 U_1$ (remember that the order is reversed since $U_1$ has to act first, then $U_0$ and $U_2$). In this way, this algorithm is not able to simulate the (complete) 3-switch but nevertheless, it is able to solve FPPs. Since every set of unitaries that satisfies the promise satisfies pairwise commutation relations, we can use $U_1 U_0 = \alpha_{10} U_0 U_1$ to rewrite the second line of Table II into: Due to this, the algorithm implements a transformation that is very similar to the one that is implemented by the algorithm in the last subsection (Eq. (IV.20)). For every $x_{ijk}$ the permutation $U_i U_j U_k$ is applied on the first target system, while the second target system and the auxiliary system always end up in the state $U_1 U_0 |\Psi_2\rangle \otimes U_1 |a_1\rangle$:
$$|x_{ijk}\rangle_c \otimes |\Psi_1\rangle \otimes |\Psi_2\rangle \otimes |a_1\rangle \;\mapsto\; |x_{ijk}\rangle_c \otimes U_i U_j U_k |\Psi_1\rangle \otimes U_1 U_0 |\Psi_2\rangle \otimes U_1 |a_1\rangle .$$
In this way, the combined control and first target system simulate the action of the 3-switch for unitaries satisfying pairwise commutation relations. Therefore, this algorithm solves every FPP for $n = 3$ in the same way as the algorithm in the last subsection.$^4$ In Section VI, we use similar ideas and present an algorithm that simulates the action of the $n$-switch for unitaries that satisfy pairwise commutation relations (and is therefore able to solve every FPP) with $O(n\sqrt{n})$ queries.
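The statement that exactly one ordering is missing from the six-gate string can be checked in a few lines. In this sketch the tuple lists the gate indices in the order in which the string $U_0 U_1 U_2 U_1 U_0 U_1$ is written; the helper names are assumptions of the example.

```python
from itertools import permutations


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, not necessarily contiguously."""
    remaining = iter(sequence)
    return all(symbol in remaining for symbol in pattern)


def missing_orderings(sequence, n):
    """Orderings of range(n) that are not contained in sequence as a subsequence."""
    return [p for p in permutations(range(n)) if not is_subsequence(p, sequence)]


six_gate_string = (0, 1, 2, 1, 0, 1)  # U0 U1 U2 U1 U0 U1
print(missing_orderings(six_gate_string, 3))
```

The only missing ordering is $(1, 0, 2)$, that is, $U_1$ first, then $U_0$, then $U_2$. This is the single case in which the pairwise commutation relation has to be invoked.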

V. CAUSAL ALGORITHMS WITH QUERY COMPLEXITY O(n log n)
In this section, we present an algorithm that solves the FPP with the labeling used in Appendix A of Ref. [23] with O(n log n) queries. First, we recall how the permutations are labeled and derive the pairwise commutation relations for this labeling.
The identity permutation is defined as:
$$\Pi_0 = U_{n-1} \cdots U_1 U_0 . \qquad \text{(V.1)}$$
The labeling of all other permutations $\Pi_x$ is based on the factorial number system; every $x$ is represented with $n - 1$ integers $(a_{n-1}, ..., a_1)$ where $a_k \in \{0, 1, ..., k\}$:$^5$
$$x = \sum_{k=1}^{n-1} a_k \cdot k! . \qquad \text{(V.2)}$$
Starting with the identity permutation $\Pi_0 = U_{n-1} \cdots U_1 U_0$, we obtain the permutation $\Pi_x$ by shifting first $U_1$ by $a_1 \in \{0, 1\}$ steps to the right, then $U_2$ by $a_2 \in \{0, 1, 2\}$ steps to the right and so on. The labeling for $n = 3$ is given as the first example in Subsection IV A (Eq. (IV.1)-(IV.6)). We call this labeling of the permutations the "factoradic" labeling.
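The shifting rule can be turned into a short routine that lists the permutation belonging to each label. The sketch below is a minimal illustration under the conventions stated above; the function names and the representation of a permutation as a tuple of gate indices (written left to right as in the operator product) are assumptions of this example. The helper `label_difference` additionally evaluates the difference between the labels of the two permutations that end in $U_j U_k$ and in $U_k U_j$, with all other gates in descending order to their left, which is the quantity that enters the pairwise commutation relations.

```python
from math import factorial


def factoradic_digits(x, n):
    """Return [a_1, ..., a_{n-1}] with x = sum_k a_k * k! and 0 <= a_k <= k."""
    digits = []
    for k in range(1, n):
        x, a_k = divmod(x, k + 1)
        digits.append(a_k)
    return digits


def factoradic_permutation(x, n):
    """Return Pi_x as a tuple of gate indices, written left to right."""
    order = list(range(n - 1, -1, -1))  # Pi_0 = U_{n-1} ... U_1 U_0
    for k, a_k in enumerate(factoradic_digits(x, n), start=1):
        position = order.index(k)
        # Shift U_k by a_k steps to the right.
        order.insert(position + a_k, order.pop(position))
    return tuple(order)


def label_difference(j, k, n):
    """Label of (rest) U_j U_k minus label of (rest) U_k U_j."""
    labels = {factoradic_permutation(x, n): x for x in range(factorial(n))}
    rest = tuple(i for i in range(n - 1, -1, -1) if i not in (j, k))
    return labels[rest + (j, k)] - labels[rest + (k, j)]


for x in range(6):
    print(x, factoradic_permutation(x, 3))
print(label_difference(1, 3, 4), factorial(3))
```

For $j < k$ the label difference comes out as $k!$ in the cases tested here, in line with the phase $\omega^{k! \cdot y}$ that appears when $U_k$ is commuted with a unitary of smaller index.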
Due to Proposition 1, we can read off the commutation relations for every pair of unitaries $U_j$ and $U_k$ (w.l.o.g. we assume here $j < k$). The two permutations we have to focus on are:
$$\Pi_{x^1_{jk}} = U_{n-1} \cdots U_1 U_0 \, U_j U_k , \qquad \Pi_{x^2_{jk}} = U_{n-1} \cdots U_1 U_0 \, U_k U_j ,$$
where $U_{n-1} \cdots U_1 U_0$ denotes all unitaries without $U_j$ and $U_k$ in descending order. To construct the first permutation $\Pi_{x^1_{jk}}$ from the identity permutation $\Pi_0$, the unitary $U_j$ is first shifted $j$ steps to the right. In a second step, $U_k$ is shifted $k$ steps to the right, while all other unitaries are not shifted. Hence, the label of $\Pi_{x^1_{jk}}$ is $x^1_{jk} = k \cdot k! + j \cdot j!$. To obtain the permutation $\Pi_{x^2_{jk}}$ from the identity permutation, the unitary $U_j$ is first shifted $j$ steps to the right and afterwards, $U_k$ is shifted $(k - 1)$ steps to the right. The remaining unitaries are not shifted. Therefore, the label of the second permutation is $x^2_{jk} = (k - 1) \cdot k! + j \cdot j!$. If we combine this with Eq. (III.3), we obtain:
$$U_j U_k = \omega^{k! \cdot y} \, U_k U_j \qquad (j < k). \qquad \text{(V.5)}$$
Hence, whenever a unitary $U_k$ of the set for which the promise holds is commuted with a unitary of a smaller index, the result remains unchanged up to a phase $\omega^{k! \cdot y}$.$^6$ To introduce the idea, we give the algorithm for $n = 4$ in the next subsection and generalize the procedure thereafter. Note that the query complexity of this example is actually worse than with the conventional method; it requires 18 queries, while the most efficient simulation of the quantum-4-switch calls twelve black-box unitaries.$^7$ Nonetheless, it is an instructive example whose generalization results in a significant reduction of the query complexity. As a further remark, we want to point out that we use the notation of controlled unknown unitaries merely for convenience. Note, however, that controlling unknown unitaries is impossible within the standard quantum circuit model [30] but can be realized in interferometric setups [31][32][33]. At the end of this section (Subsection V B 3), we show that it is possible to rewrite the algorithm in a form that does not control unknown unitaries.
$^5$ This is Eq. (A4) in Ref. [23].
Hence, $|x = 16\rangle_c$ is represented by $|c^{16}_{3,1} = 1\rangle \otimes |c^{16}_{3,2} = 0\rangle \otimes |c^{16}_{2,1} = 1\rangle \otimes |c^{16}_{2,2} = 1\rangle \otimes |c^{16}_{1,1} = 0\rangle \otimes |c^{16}_{1,2} = 0\rangle$. It is simple to check that indeed every $x \in \{0, 1, ..., 4! - 1\}$ can be represented in this way. Note that this representation is not unique and most numbers can be decomposed in more than one way. For our purpose, it is enough to choose one representation for every $x \in \{0, 1, ..., 4! - 1\}$. In accordance with the methods introduced in Subsection III B, we will show that the algorithm in Fig. 7 can implement the transformation $T_4^{\mathrm{FPP}}$ for this FPP. To see this, we look first at the target system $|\Psi_{2,1}\rangle$. The gates $U_3$, $U_1$ and $U_0$ act on this system but the order in which they are applied depends on the control qubits $|c^x_{3,1}\rangle$ and $|c^x_{1,1}\rangle$. If both control qubits are in the state $|0\rangle$, this target system ends up in the state $U_3 U_1 U_0 |\Psi_{2,1}\rangle$. On the other hand, if one or both of the two control qubits are in the state $|1\rangle$, the order of these gates is different. Nevertheless, due to the pairwise commutation relations given in Eq. (V.5), we can always rewrite the final state of the system $|\Psi_{2,1}\rangle$ into $U_3 U_1 U_0 |\Psi_{2,1}\rangle$. By doing so, a phase is obtained whenever two unitaries are commuted (see Table III). The control qubit $|c^x_{3,1}\rangle$ controls whether the gate $U_3$ is applied before the two gates $U_0$ and $U_1$, or after these two gates. Whenever $|c^x_{3,1}\rangle = |1\rangle$, $U_3$ needs to be commuted with two unitaries of a smaller index (independent of the order of these two gates). In this way, a factor of $\omega^{3! \cdot y}$ is picked up twice, which multiplies together to $\omega^{12 \cdot y}$. Independently of this, $|c^x_{1,1}\rangle$ controls the order of $U_1$ and $U_0$. If $|c^x_{1,1}\rangle = |1\rangle$, $U_1$ needs to be commuted with $U_0$ and a relative phase of $\omega^{1! \cdot y}$ is picked up.
Since every $x \in \{0, 1, ..., 4! - 1\}$ can be represented as in Eq. (V.7), the algorithm in Fig. 7 applies the following transformation for every such $x$: Note that the final state of the target system is independent of the control system $|x\rangle_c$. Hence, we can use our algorithm and the procedure introduced in Subsection III B to solve this specific FPP. More precisely, applying a Fourier transform, the control system is initialized in an equal superposition of all $|x\rangle_c$. After applying the algorithm, a measurement of the control system in the Fourier basis will yield the desired value of $y$.
As mentioned before, the query complexity of this example is actually worse than with the simulation of the quantum-4-switch. Nevertheless, for larger $n$, the method that we introduce here solves this specific FPP with only $O(n \log n)$ queries. The reason for this scaling advantage comes from the fact that every number $x \in \{0, 1, ..., n! - 1\}$ can be represented with $O(n \log n)$ bits $c^x_{k,i}$ (remember that $n! \leq 2^{n \log_2 n}$). In the algorithms presented here, every such bit corresponds to a control qubit and only two queries couple to every control qubit.
As a remark, note that the target system $|\Psi_{4,4}\rangle$ is technically redundant. Moreover, the target system $|\Psi_{4,1}\rangle$ and the corresponding control qubit $|c^x_{1,2}\rangle$ are also not needed since the five bits $c^x_{3,1}$, $c^x_{3,2}$, $c^x_{2,1}$, $c^x_{2,2}$ and $c^x_{1,1}$ are sufficient to represent every number $x \in \{0, 1, ..., 4! - 1\}$ in the form given in Eq. (V.7). In this specific example for $n = 4$, we kept them in for completeness. It turns out that for every $n$, there are some target systems that can be left out. In order to keep the notation as simple as possible, and since this does not affect the overall scaling, we refrain from doing so.
a) $C_k$ and $V_k$ ($\tilde{C}_k$ and $\tilde{V}_k$) are used as a shorthand notation and are defined in Fig. 8b (Fig. 8c).

B. The algorithm for every n
In this section, we show how the idea of the above example can be generalized to solve this specific FPP for arbitrary $n$. As above, it is convenient to introduce a specific representation of the control state $|x\rangle_c$ into qubits. More precisely, we use $(n - 1) \cdot \lceil \log_2 n \rceil$ control qubits $|c^x_{k,i}\rangle$ ($c^x_{k,i} \in \{0, 1\}$) where $k = 1, 2, ..., n - 1$ and $i = 1, 2, ..., \lceil \log_2 n \rceil$. For convenience, we define $\hat{i} := \lceil \log_2 n \rceil$. The state $|x\rangle_c$ is identified with
$$|x\rangle_c = |c^x_{n-1,1}\rangle \otimes |c^x_{n-1,2}\rangle \otimes \cdots \otimes |c^x_{1,\hat{i}}\rangle ,$$
where the bits $c^x_{k,i}$ satisfy the equation
$$x = \sum_{k=1}^{n-1} \sum_{i=1}^{\hat{i}} c^x_{k,i} \cdot \left\lceil \frac{k}{2^i} \right\rceil \cdot k! . \qquad \text{(V.12)}$$
While the motivation for this basis will become clearer below, we give the formal proof that every $x \in \{0, 1, ..., n! - 1\}$ can be represented in this way in Appendix A.$^8$ Now, we will show that the quantum circuit given in Fig. 8 solves this specific FPP with $O(n \log n)$ queries. The control system consists of the $(n - 1) \cdot \lceil \log_2 n \rceil$ control qubits $|c^x_{k,i}\rangle$ introduced above.
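The claim that every label can be written in this bit representation can be checked by brute force for small $n$. The sketch below assumes that the bit $c^x_{k,i}$ carries the weight $\lceil k/2^i \rceil \cdot k!$, which is the reading consistent with the $n = 4$ example (where $x = 16$ is represented by $c_{3,1} = c_{2,1} = c_{2,2} = 1$) and with the phases quoted for the individual control qubits. The function names are assumptions of this example, and the check does not replace the proof in Appendix A.

```python
from math import factorial


def bit_weights(n):
    """Map (k, i) to the assumed weight ceil(k / 2**i) * k! of the bit c_{k,i}."""
    bits_per_k = (n - 1).bit_length()  # equals ceil(log2(n)) for n >= 2
    return {(k, i): -(-k // 2 ** i) * factorial(k)
            for k in range(1, n) for i in range(1, bits_per_k + 1)}


def representable_labels(n):
    """All numbers that can be written as a sum over a subset of the weights."""
    reachable = {0}
    for weight in bit_weights(n).values():
        reachable |= {x + weight for x in reachable}
    return reachable


weights = bit_weights(4)
print(weights)
# The n = 4 example: x = 16 uses the bits c_{3,1}, c_{2,1} and c_{2,2}.
print(weights[(3, 1)] + weights[(2, 1)] + weights[(2, 2)])
for n in range(2, 7):
    print(n, len(bit_weights(n)), set(range(factorial(n))) <= representable_labels(n))
```

The number of weights equals the number of control qubits, $(n - 1) \cdot \lceil \log_2 n \rceil$, which is the source of the $O(n \log n)$ scaling.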
The idea of the algorithm is, as usual, that the black-box gates act on the target systems in a certain order and due to the pairwise commutation relations certain phases are picked up by rewriting the final state of each target system. More precisely, we will show that whenever a control qubit $|c^x_{k,i}\rangle$ is in the state $|1\rangle$, we obtain a relative phase of $\omega^{\lceil k/2^i \rceil \cdot k! \cdot y}$, independent of the states of the other control qubits. For every $x \in \{0, 1, ..., n! - 1\}$ these phases multiply together to $\omega^{x \cdot y}$: Here, we used the representation of $x$ in the basis given in Eq. (V.12) to show that all the phases accumulate to:
$$\prod_{k=1}^{n-1} \prod_{i=1}^{\hat{i}} \omega^{c^x_{k,i} \cdot \lceil k/2^i \rceil \cdot k! \cdot y} = \omega^{\sum_{k=1}^{n-1} \sum_{i=1}^{\hat{i}} c^x_{k,i} \cdot \lceil k/2^i \rceil \cdot k! \cdot y} = \omega^{x \cdot y} . \qquad \text{(V.14)}$$
One can observe that the target system becomes independent of the control system. Hence, if the control system is initialized in an equal superposition of all $x$, the circuit applies, by linearity, the transformation and all target systems factorize out at the end of the algorithm. After applying the inverse Fourier transform to the control system, the correct value of $y$ can be read out with a measurement of the control system in the computational basis. For a better understanding of the algorithm, in addition to the example of $n = 4$ in the last subsection, we give the circuit for $n = 8$ in Appendix C.
$^8$ Note that the representation of $x$ in this basis is not unique. For our purpose, it is enough to pick one such representation for every $x$. Furthermore, this representation is related to the factorial number system that is used to label the permutations $\Pi_x$ in Eq. (V.2). More precisely, $\sum_i c^x_{k,i} \cdot \lceil k/2^i \rceil$ is a representation of $a_k$ into $\lceil \log_2 n \rceil$ bits.

How the algorithm works
To show that this algorithm realizes the desired transformation in Eq. (V.13), we focus on one target system $|\Psi_{2^i,j}\rangle$ and all the gates that act on it. These are exactly the gates $U_k$ that satisfy $k \equiv j \pmod{2^i}$. The order in which these gates are applied on the target system $|\Psi_{2^i,j}\rangle$ depends on the states of the corresponding control qubits $|c^x_{k,i}\rangle$ (see Fig. 9).

FIG. 9: Each control qubit $|c^x_{k,i}\rangle$ controls whether the gate $U_k$ (with $k \equiv j \pmod{2^i}$) is applied either before or after an entire block of matrices with a smaller index. For example, if $|c^x_{2^i+j,i}\rangle = |1\rangle$ the gate $U_{2^i+j}$ is applied before $U_0$ and $U_j$, while if $|c^x_{2^i+j,i}\rangle = |0\rangle$ it is applied after these two gates. Rewriting the final state leads to a factor of $\omega^{2 \cdot (2^i+j)! \cdot y}$ whenever $|c^x_{2^i+j,i}\rangle = |1\rangle$.
On the one hand, if all control qubits are in the state $|0\rangle$, $U_0$ acts first, then $U_j$ and so on. This way, the final state of the target system becomes $(\ldots U_{2 \cdot 2^i+j}\, U_{2^i+j}\, U_j\, U_0)\,|\Psi_{2^i,j}\rangle$. On the other hand, if some of the control qubits are in the state $|1\rangle$, the gates act in a different order. The structure of the algorithm is chosen such that a gate is applied either immediately before or immediately after an entire block of unitaries with a smaller index. When the final state is rewritten into the form $(\ldots U_{2 \cdot 2^i+j}\, U_{2^i+j}\, U_j\, U_0)\,|\Psi_{2^i,j}\rangle$, a unitary therefore has to be commuted either with all unitaries within this block or with none of them. One can check that every gate $U_k$ that appears on the target system has to be commuted with $\lceil k/2^i \rceil$ unitaries of smaller index if and only if $|c^x_{k,i}\rangle = |1\rangle$, leading to an additional factor of $\omega^{\lceil k/2^i \rceil \cdot k! \cdot y}$. This shows that the algorithm realizes the transformation given in Eq. (V.13).
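The following sketch checks, for small $n$, that every $x \in \{0, ..., n!-1\}$ can indeed be written with control bits whose weights are the phase exponents discussed above. It rests on assumptions that are not stated explicitly in the text as extracted here: the weight of bit $c^x_{k,i}$ is taken to be $\lceil k/2^i \rceil \cdot k!$, the index $i$ runs over $1, ..., \lceil \log_2 n \rceil$, and $k$ runs over $1, ..., n-1$. All function names are hypothetical.

```python
from itertools import product
from math import factorial

def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)

def weights(k, n):
    """Assumed weights ceil(k / 2^i) for i = 1, ..., ceil(log2 n)."""
    i_hat = (n - 1).bit_length()  # equals ceil(log2 n) for n >= 2
    return [ceil_div(k, 2 ** i) for i in range(1, i_hat + 1)]

def find_bits(a, k, n):
    """Return bits c_{k,i} with sum_i c_{k,i} * ceil(k / 2^i) == a, or None."""
    w = weights(k, n)
    for bits in product((0, 1), repeat=len(w)):
        if sum(c * wi for c, wi in zip(bits, w)) == a:
            return bits
    return None

def encode(x, n):
    """Expand the factorial-base digits a_k of x into control bits."""
    bits = {}
    for k in range(1, n):
        x, a_k = divmod(x, k + 1)  # a_k lies in {0, ..., k}
        bits[k] = find_bits(a_k, k, n)
    return bits

def decode(bits, n):
    """Recombine control bits into x = sum_{k,i} c_{k,i} * ceil(k / 2^i) * k!."""
    return sum(
        c * wi * factorial(k)
        for k in range(1, n)
        for c, wi in zip(bits[k], weights(k, n))
    )
```

The brute-force search in `find_bits` returns the first valid bit pattern, which reflects the remark that the representation is not unique and that any fixed choice suffices.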
(V. 17) We conclude that the query complexity of this algorithm is O(n log n).

Control of unknown unitaries
In this circuit, we control unknown unitaries. This operation is not well-defined within the standard quantum circuit model [30]. Nevertheless, one can circumvent this issue by introducing auxiliary systems. More precisely, for every $k \in \{1, 2, ..., n-1\}$, we add an auxiliary system $|a_k\rangle$ initialized in an arbitrary $d$-dimensional state. Whenever an unknown unitary $U_k$, controlled on $|c^x_{k,i}\rangle$, shall be applied on $|\Psi_{2^i,\,k \bmod 2^i}\rangle$, we perform instead a controlled swap of $|\Psi_{2^i,\,k \bmod 2^i}\rangle$ and $|a_k\rangle$: If the control qubit is in the state $|c^x_{k,i}\rangle = |1\rangle$, the two systems are swapped and the gate $U_k$ is applied on the target system. On the other hand, if the control qubit is in the state $|c^x_{k,i}\rangle = |0\rangle$, the two systems are not swapped and the gate $U_k$ is applied on the auxiliary system $|a_k\rangle$ instead. In the case where the gates are applied conditioned on $|c^x_{k,i}\rangle = |0\rangle$ (for the boxes $\tilde{V}$), the swaps are also conditioned on $|c^x_{k,i}\rangle = |0\rangle$ (all black dots are replaced by white dots).
It is important to ensure that this replacement does not affect the functionality of our algorithm. To see that this is true, note that for every $i \in \{1, 2, ..., \hat{i}\}$, the gate $U_k$ acts exactly once on the target system $|\Psi_{2^i,\,k \bmod 2^i}\rangle$ and once on the auxiliary system $|a_k\rangle$, independent of the state of the control qubit $|c^x_{k,i}\rangle$. (The control qubit only determines whether the gate is applied first on the target system and thereafter on the auxiliary system, or vice versa.) In total, each auxiliary system ends up in the state $(U_k)^{\hat{i}}\,|a_k\rangle$, independent of the state of the control system. In this way, the auxiliary systems factorize out when the control system is initialized in a superposition of all states $x \in \{0, 1, ..., n!-1\}$ and do not affect the outcome of the measurement of the control system at the very end of the algorithm.
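The bookkeeping behind this argument can be sketched as follows. The toy model only tracks on which wire each call of $U_k$ lands; it assumes that, for every level $i$, the gate is called twice, first with the swap conditioned on the control bit being 1 and then with the swap conditioned on it being 0. The order of these two calls and the function names are assumptions made for this illustration.

```python
from itertools import product

def wire_hit(control_bit, swap_when):
    """Wire on which the black box acts: the target if the swap fired, else the auxiliary."""
    return "target" if control_bit == swap_when else "aux"

def level_calls(control_bit):
    """Two calls of U_k for one level i: swap conditioned on 1, then on 0 (assumed order)."""
    return [wire_hit(control_bit, 1), wire_hit(control_bit, 0)]

def aux_exponent(control_bits):
    """Number of times U_k acts on |a_k> over all levels, for given control bits."""
    return sum(level_calls(c).count("aux") for c in control_bits)
```

For every bit string of length $\hat{i}$, `aux_exponent` returns $\hat{i}$, which mirrors the statement that the auxiliary system ends in $(U_k)^{\hat{i}}|a_k\rangle$ regardless of the control state.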

C. O(n log n) causal algorithms for every FPP?
A natural question is whether other FPPs can be solved with $O(n \log n)$ queries as well. The algorithms that we presented in this section can only be used for this specific labeling of the permutations, since we explicitly use the relations $U_j U_k = \omega^{k! \cdot y}\, U_k U_j$. If the permutations are labeled differently, the pairwise phases change and the above algorithm cannot be used directly. Nevertheless, we think that the structure of our algorithm can be adapted to solve other FPPs as well. The idea that a certain phase $\omega^{\varphi(k,i) \cdot y}$ is picked up whenever a control qubit $|c^x_{k,i}\rangle$ is in the state $|1\rangle$ (by using a structure as in Fig. 9) can be used for different pairwise commutation relations as well. If every $x \in \{0, 1, ..., n!-1\}$ can be written as

$$x = \sum_{k,i} c^x_{k,i} \cdot \varphi(k,i) \qquad \text{(V.18)}$$

for some bits $c^x_{k,i} \in \{0, 1\}$, we can use the control qubits $|c^x_{k,i}\rangle$ as the control system $|x\rangle_c$. By initializing the control system in an equal superposition of all $x \in \{0, 1, ..., n!-1\}$, such an algorithm attaches to every $|x\rangle_c$ the phase $\omega^{x \cdot y}$, which is obtained as the product

$$\prod_{k,i} \left(\omega^{\varphi(k,i) \cdot y}\right)^{c^x_{k,i}} = \omega^{\sum_{k,i} c^x_{k,i} \cdot \varphi(k,i) \cdot y} = \omega^{x \cdot y}.$$

Here, the last equality follows from Eq. (V.18). Again, the solution $y$ can be read out after applying the inverse Fourier transform to the control system.
Since every number $x \in \{0, 1, ..., n!-1\}$ can in principle be represented with $O(n \log n)$ bits ($n! \le 2^{n \log_2 n}$) and since two queries couple to every control qubit, it seems likely that every FPP can be solved with $O(n \log n)$ queries. The crucial point is whether it is possible to find an implementation as in Fig. 9, that is, a combination of gates and target systems such that this can be done efficiently. The disadvantage of this procedure is that it requires some rather involved combinatorics and that one has to adapt the algorithm by hand. While it remains open whether this is always possible, we present in the next section an algorithm that can solve every FPP with $O(n\sqrt{n})$ queries, independent of the labeling of the permutations.

VI. A CAUSAL ALGORITHM THAT SOLVES EVERY FPP WITH $O(n\sqrt{n})$ QUERIES
In this section, we present an algorithm that solves every Fourier promise problem with $O(n\sqrt{n})$ queries.
The main idea is based on the fact that the existence of pairwise commutation relations ($U_j U_k = \alpha_{jk}\, U_k U_j$) allows us to rewrite every permutation $\Pi_x = U_{\sigma_x(n-1)} \cdots U_{\sigma_x(1)} U_{\sigma_x(0)}$ into:

$$\Pi_x = \alpha_x \cdot U_{n-1} \cdots U_1 U_0,$$

where the total phase $\alpha_x$ is a product of pairwise phases $\alpha_{jk}$. We use this fact to decompose the total phase of every permutation into different factors and simulate each factor on a different target system. So instead of simulating every permutation $\Pi_x$ on its own (which requires the simulation of the full $n$-switch and hence $O(n^2)$ queries), we construct other expressions that can simulate these factors and call only $O(n\sqrt{n})$ gates in total. Via phase kickback, all these factors accumulate in the control system and multiply together to the total phase $\alpha_x$.
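As an example, consider $n = 9$, split into $\bar{k} = 3$ blocks of $\bar{n} = 3$ unitaries each. In general, with $\bar{n} := \lceil \sqrt{n} \rceil$ and $\bar{k} := \lceil n/\bar{n} \rceil$, we use for every $k$ the expressions

$$\Pi_{x_k} := [U_{\sigma_x(n-1)} \cdots U_{\sigma_x((k+1)\cdot\bar{n})}]\; (U_{\sigma_x((k+1)\cdot\bar{n}-1)} \cdots U_{\sigma_x(k\cdot\bar{n})})\; [U_{\sigma_x(k\cdot\bar{n}-1)} \cdots U_{\sigma_x(0)}],$$

$$\tilde{\Pi}^r_{x_k} := \{U_{\sigma_x(k\cdot\bar{n}-1)} \cdots U_{\sigma_x(0)}\}\; \{U_{\sigma_x(n-1)} \cdots U_{\sigma_x(k\cdot\bar{n})}\}, \qquad \text{(VI.4)}$$

where square brackets denote a block whose unitaries are arranged in descending order of their index, curly brackets denote a block arranged in ascending order, and round brackets denote a block that keeps the order given by $\sigma_x$ (for $k = \bar{k}-1$, the round block ends with $U_{\sigma_x(n-1)}$ and the left square block is empty). They are defined in a way that $\Pi_{x_k}$ and $\tilde{\Pi}^r_{x_k}$ simulate exactly all pairwise phases $\alpha_{jk}$ for the unitaries within the block $(U_{\sigma_x((k+1)\cdot\bar{n}-1)} \cdots U_{\sigma_x(k\cdot\bar{n})})$. If they act on different target systems $|\Psi_k\rangle$ and $|\Phi_k\rangle$ respectively, all the pairwise phases are accumulated and we obtain as the product the total phase of the original permutation $\Pi_x$:

Lemma 1. For every set of ($d$-dimensional) unitaries $\{U_i\}_{i=0}^{n-1}$ that satisfy pairwise commutation relations, the following relation holds:

$$\Pi_{x_0}|\Psi_0\rangle \otimes \bigotimes_{k=1}^{\bar{k}-1} \left( \Pi_{x_k}|\Psi_k\rangle \otimes \tilde{\Pi}^r_{x_k}|\Phi_k\rangle \right) = \Pi_x|\Psi_0\rangle \otimes \bigotimes_{k=1}^{\bar{k}-1} \left( \Pi|\Psi_k\rangle \otimes \Pi^r|\Phi_k\rangle \right).$$

Here, $\Pi := U_{n-1} \cdots U_1 U_0$ denotes the descending order of all unitaries, $\Pi^r := U_0 U_1 \cdots U_{n-1}$ denotes the ascending order of all unitaries and $|\Psi_k\rangle, |\Phi_k\rangle \in \mathcal{H}_d$ are arbitrary $d$-dimensional states.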
Proof. See Appendix B.
Because a large part of each of them is already ordered, the permutations $\Pi_{x_k}$ and $\tilde{\Pi}^r_{x_k}$ can be simulated with a causal algorithm that uses $O(n\sqrt{n})$ queries. The algorithm that achieves this is presented in the next subsection.
In the algorithm, $(2\bar{k} - 1)$ target systems, denoted as $|\Psi_k\rangle$ and $|\Phi_k\rangle$, as well as $n$ auxiliary systems $|a_i\rangle$ are used. All of them are initialized in an arbitrary $d$-dimensional state. The control system is a system of at least $n!$ dimensions and the algorithm applies, depending on the state $|x\rangle$ of the control system, the permutations $\Pi_{x_k}$ on $|\Psi_k\rangle$ and the permutations $\tilde{\Pi}^r_{x_k}$ on $|\Phi_k\rangle$. All the remaining gates $U_i$ act on the corresponding auxiliary system $|a_i\rangle$. In this way, the algorithm realizes the transformation

$$|x\rangle_c \otimes |\Psi_0\rangle \otimes \bigotimes_{k=1}^{\bar{k}-1} \left( |\Psi_k\rangle \otimes |\Phi_k\rangle \right) \otimes \bigotimes_{i=0}^{n-1} |a_i\rangle$$
$$\mapsto\; |x\rangle_c \otimes \Pi_{x_0}|\Psi_0\rangle \otimes \bigotimes_{k=1}^{\bar{k}-1} \left( \Pi_{x_k}|\Psi_k\rangle \otimes \tilde{\Pi}^r_{x_k}|\Phi_k\rangle \right) \otimes \bigotimes_{i=0}^{n-1} (U_i)^{k_i}|a_i\rangle$$
$$=\; |x\rangle_c \otimes \Pi_x|\Psi_0\rangle \otimes \bigotimes_{k=1}^{\bar{k}-1} \left( \Pi|\Psi_k\rangle \otimes \Pi^r|\Phi_k\rangle \right) \otimes \bigotimes_{i=0}^{n-1} (U_i)^{k_i}|a_i\rangle. \qquad \text{(VI.6)}$$

Here, $k_i = \bar{n} + 2\bar{k} - 3$ is a constant that only depends on $n$, and Lemma 1 is used to rewrite the state in the second step. Except for the first target system $|\Psi_0\rangle$, which ends up in the state $\Pi_x|\Psi_0\rangle$, the final state of each target and auxiliary system is independent of $x$. We conclude that this algorithm simulates the action of the $n$-switch for unitaries satisfying pairwise commutation relations and is therefore able to solve every Fourier promise problem. More precisely, as described in Subsection III B, with a Fourier transform, the control system is initialized in an equal superposition of all states $x \in \{0, 1, ..., n!-1\}$ and after the algorithm is applied, the solution $y$ can be read out with a measurement in the Fourier basis (see footnote 9). Note that at no point do we refer to a specific labeling of the permutations $\Pi_x$ and hence, we can solve every possible Fourier promise problem with this causal algorithm. The advantage stems merely from the fact that every set of unitaries that satisfies the promise also satisfies pairwise commutation relations, which imply that Lemma 1 holds.

Here we present the quantum circuit that realizes the transformation described in the last subsection (Eq. (VI.6)) and show that this algorithm uses $O(n\sqrt{n})$ queries. To keep the procedure as clear as possible, we divide the quantum circuit into three parts.

Part 1
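First, all target systems $|\Psi_k\rangle$ undergo the transformations

$$|\Psi_k\rangle \mapsto [U_{\sigma_x(k\cdot\bar{n}-1)} \cdots U_{\sigma_x(0)}]\,|\Psi_k\rangle.$$

For each $|\Psi_k\rangle$, this is realized by the algorithm $V_{\Psi_k}$ given in Fig. 11. Here, depending on the state $|x\rangle$, in each step $i = 0, 1, ..., n-1$, the target system $|\Psi_k\rangle$ is swapped with $|a_i\rangle$ if and only if $U_i$ is contained in $[U_{\sigma_x(k\cdot\bar{n}-1)} \cdots U_{\sigma_x(0)}]$.

9 For the comparison with Eq. (III.9), note that by using the promise $\Pi_x = \omega^{x \cdot y}\,\Pi_0$, the term $|x\rangle_c \otimes \Pi_x|\Psi_0\rangle$ in Eq. (VI.6) can be rewritten into $\omega^{x \cdot y}\,|x\rangle_c \otimes \Pi_0|\Psi_0\rangle$.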
To understand why this circuit realizes the above transformation, note that $[U_{\sigma_x(k\cdot\bar{n}-1)} \cdots U_{\sigma_x(0)}]$ is by construction a block of $k \cdot \bar{n}$ unitaries in descending order, so the unitary with the smallest index has to be applied first. By going step by step through each of the $n$ possible unitaries $U_0$ until $U_{n-1}$, exactly those unitaries contained in the ordered block are applied on the target system $|\Psi_k\rangle$ and all the others are applied on the corresponding auxiliary system. Each of these unitaries $V_{\Psi_k}$ consumes $n$ queries.
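Similarly, the transformations

$$|\Phi_k\rangle \mapsto \{U_{\sigma_x(n-1)} \cdots U_{\sigma_x(k\cdot\bar{n})}\}\,|\Phi_k\rangle$$

are realized with the algorithm $W_{\Phi_k}$ given by the circuit in Fig. 12. Here, in each step, the target system $|\Phi_k\rangle$ is swapped with $|a_i\rangle$ if and only if the gate $U_i$ is contained in $\{U_{\sigma_x(n-1)} \cdots U_{\sigma_x(k\cdot\bar{n})}\}$.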

FIG. 12: Implementation of $W_{\Phi_k}$
The only difference is that the unitaries in this block are arranged in ascending order and the unitary with the highest index is applied first. Each of these $W_{\Phi_k}$ again consumes $n$ queries.

Part 2
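In the second part, we realize the transformations

$$[U_{\sigma_x(k\cdot\bar{n}-1)} \cdots U_{\sigma_x(0)}]\,|\Psi_k\rangle \mapsto (U_{\sigma_x((k+1)\cdot\bar{n}-1)} \cdots U_{\sigma_x(k\cdot\bar{n})})\,[U_{\sigma_x(k\cdot\bar{n}-1)} \cdots U_{\sigma_x(0)}]\,|\Psi_k\rangle$$

with the algorithm in Fig. 13. In each step $i = 0, 1, ..., \bar{n}-1$, every $|\Psi_k\rangle$ is swapped with the auxiliary system $|a_{\sigma_x(k\cdot\bar{n}+i)}\rangle$. In this way, $U_{\sigma_x(k\cdot\bar{n}+i)}$ acts on $|\Psi_k\rangle$, and afterwards $|\Psi_k\rangle$ and $|a_{\sigma_x(k\cdot\bar{n}+i)}\rangle$ are swapped back. After $\bar{n}$ steps, the entire block $(U_{\sigma_x((k+1)\cdot\bar{n}-1)} \cdots U_{\sigma_x(k\cdot\bar{n})})$ is applied on each target system $|\Psi_k\rangle$. (For $k = \bar{k}-1$, the procedure already stops after the step in which $U_{\sigma_x(n-1)}$ is applied on $|\Psi_{\bar{k}-1}\rangle$.) Note that the target systems $|\Phi_k\rangle$ are unaffected by this part of the algorithm. In each of the $\bar{n}$ steps, $n$ queries are consumed. This part of the algorithm is similar to the algorithm presented in Subsection II B. The difference is that we swap several target systems simultaneously, instead of only one.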
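The swap pattern of Part 2 can be traced with a small symbolic sketch. Instead of quantum states, each register stores the list of gate indices that have acted on it, and the control value $x$ is represented directly by the permutation `sigma` (with `sigma[p]` standing for $\sigma_x(p)$). The block sizes $\bar{n} = \lceil\sqrt{n}\rceil$ and $\bar{k} = \lceil n/\bar{n}\rceil$ follow the definitions used in the query count; the function name `part2` and the list-based bookkeeping are assumptions made for this illustration.

```python
import math

def part2(sigma):
    """Trace which gates act on which register during Part 2.

    Returns (psi, aux, queries), where psi[k] and aux[i] list the gate
    indices applied to |Psi_k> and |a_i>, in order of application.
    """
    n = len(sigma)
    n_bar = math.isqrt(n - 1) + 1   # ceil(sqrt(n))
    k_bar = -(-n // n_bar)          # ceil(n / n_bar)
    psi = [[] for _ in range(k_bar)]
    aux = [[] for _ in range(n)]
    queries = 0
    for i in range(n_bar):
        # Pairs (k, wire) to be swapped in this step; the last block may be shorter.
        pairs = [(k, sigma[k * n_bar + i]) for k in range(k_bar) if k * n_bar + i < n]
        for k, a in pairs:          # swaps controlled on x
            psi[k], aux[a] = aux[a], psi[k]
        for g in range(n):          # every black box U_g is called once on wire a_g
            aux[g] = aux[g] + [g]
            queries += 1
        for k, a in pairs:          # swap back
            psi[k], aux[a] = aux[a], psi[k]
    return psi, aux, queries
```

For a permutation of nine gates, `psi[k]` ends up equal to `sigma[3*k : 3*k + 3]`, that is, the $k$-th block in the order given by $\sigma_x$, and the total number of queries is $\bar{n} \cdot n = 27$.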

Query complexity
The number of queries in Part 1 amounts to $2 \cdot (\bar{k}-1) \cdot n$. For Part 2, we need $\bar{n} \cdot n$ queries, while for the last part, $2 \cdot (\bar{k}-1) \cdot n$ black-box gates are called. Summing these together gives

$$Q = (\bar{n} + 4\bar{k} - 4) \cdot n \qquad \text{(VI.12)}$$

queries in total. Using $\bar{n} := \lceil \sqrt{n} \rceil < \sqrt{n} + 1$ and $\bar{k} := \lceil n/\bar{n} \rceil < n/\bar{n} + 1 \le n/\sqrt{n} + 1 = \sqrt{n} + 1$, we obtain:

$$Q < (5\sqrt{n} + 1) \cdot n = O(n\sqrt{n}). \qquad \text{(VI.13)}$$

It is important to ensure that the auxiliary systems factorize out at the end of the algorithm. To see that this is indeed true, we observe that in each part of the algorithm a gate $U_i$ acts either on a target system $|\Psi_k\rangle$, a target system $|\Phi_k\rangle$ or the auxiliary system $|a_i\rangle$. Altogether, for every $i \in \{0, 1, ..., n-1\}$, the gate $U_i$ appears exactly $\bar{n} + 4\bar{k} - 4$ times in the algorithm. Furthermore, it acts, independently of the permutation $\Pi_x$, on each of the $\bar{k}$ target systems $|\Psi_k\rangle$ and each of the $\bar{k}-1$ target systems $|\Phi_k\rangle$ exactly once. This is true since the expressions $\Pi_{x_k}$ and $\tilde{\Pi}^r_{x_k}$ are by themselves permutations of the $n$ unitaries and contain each $U_i$ exactly once. In all remaining instances, $U_i$ acts on the auxiliary system $|a_i\rangle$, which therefore ends up in the state $(U_i)^{k_i}|a_i\rangle$ with $k_i = \bar{n} + 2\bar{k} - 3$, independent of the state of the control system $|x\rangle$. In total, we have shown that the algorithm realizes the desired transformation given in Eq. (VI.6) for every $x \in \{0, 1, ..., n!-1\}$. Hence, we conclude that this algorithm solves every Fourier promise problem with $O(n\sqrt{n})$ queries.
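The query count and its bound can be evaluated numerically with a short sketch. It uses only the quantities given above, namely $\bar{n} = \lceil\sqrt{n}\rceil$, $\bar{k} = \lceil n/\bar{n}\rceil$ and the three partial counts; the function names are hypothetical.

```python
import math

def block_sizes(n):
    """Return (n_bar, k_bar) = (ceil(sqrt(n)), ceil(n / n_bar))."""
    n_bar = math.isqrt(n - 1) + 1   # ceil(sqrt(n)) for n >= 1
    k_bar = -(-n // n_bar)
    return n_bar, k_bar

def query_count(n):
    """Total number of black-box calls, summed over the three parts."""
    n_bar, k_bar = block_sizes(n)
    part1 = 2 * (k_bar - 1) * n
    part2 = n_bar * n
    part3 = 2 * (k_bar - 1) * n
    return part1 + part2 + part3

def upper_bound(n):
    """The bound (5 * sqrt(n) + 1) * n from the estimate above."""
    return (5 * math.sqrt(n) + 1) * n
```

For instance, `query_count(9)` evaluates $(3 + 4 \cdot 3 - 4) \cdot 9 = 99$, and the count stays below `upper_bound(n)` for every $n$ tested.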

VII. CONCLUSION
The introduction of indefinite causal structures raised the question of the existence of computational tasks which can be solved more efficiently using these structures, compared to causally ordered protocols. Fourier promise problems were initially introduced to demonstrate that such a computational advantage exists even in the asymptotic case. The problems were shown to be solved with $n$ queries using the quantum-$n$-switch and it was expected that the most efficient solution with a causal protocol requires the simulation of the quantum-$n$-switch and, hence, $O(n^2)$ queries. We showed that for the specific task of solving Fourier promise problems, the advantage of using the quantum-$n$-switch is significantly smaller than previously expected; in fact, we presented a causal quantum algorithm, within the standard quantum circuit model, which solves the same computational tasks almost as efficiently. More precisely, we presented a causal algorithm that solves a specific Fourier promise problem with $O(n \log n)$ queries and conjectured that all problems of this class can be solved with a similar efficiency. Furthermore, we presented a causal algorithm that solves every Fourier promise problem with $O(n\sqrt{n})$ queries.
We conclude that for the specific class of problems considered here, the advantage of algorithms that use a quantum-controlled ordering of gates, compared to causally ordered algorithms, is smaller than first expected. Nevertheless, although we could show that the simulation of the quantum-n-switch is not the most efficient causal algorithm for solving FPPs, it is in principle possible to construct tasks which profit more from using the quantum-n-switch. One promising class of problems is already introduced in Ref. [22]. The so-called Hadamard promise problems are a variation of the Fourier promise problems with the advantage that they are better suited for experimental realization. In Ref. [22], the authors present only one specific task with four unitaries without providing a generalization to an arbitrary number of gates. In this sense, it needs further investigation whether there is a significant asymptotic scaling advantage for Hadamard promise problems or whether the methods we developed in this work also apply to these problems. All in all, this raises the important challenge of finding computational tasks for which the quantum-n-switch and indefinite causal structures in general provide a significant advantage.
Proof. We will prove this by expressing both sides in terms of pairwise phases and comparing them at the end. Recall that the pairwise phase $\alpha_{jk}$ is defined via $U_j U_k = \alpha_{jk}\, U_k U_j$ and from comparing this with $U_k U_j = \alpha_{kj}\, U_j U_k$, we obtain $\alpha_{jk} = (\alpha_{kj})^{-1}$.
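The total phase of the permutation $\Pi_x$ is the product of the following pairwise phases $\alpha_{ij}$:

$$\Pi_x = \prod_{\substack{n-1 \ge j \ge 0 \text{ and } n-1 \ge i > j \\ \text{with } \sigma_x(i) < \sigma_x(j)}} \alpha_{\sigma_x(i)\sigma_x(j)} \cdot \Pi. \qquad \text{(B.4)}$$

To obtain this expression, we can compare every pair of unitaries and check whether they have the same order as in $\Pi$. If not, the corresponding phase appears in the product. To be more precise, we can start with $j = 0$ (with the matrix $U_{\sigma_x(0)}$) and check for every $i > j = 0$ if $\sigma_x(i) < \sigma_x(j = 0)$. If so, then the order of $U_{\sigma_x(0)}$ and $U_{\sigma_x(i)}$ is reversed between $\Pi_x$ and $\Pi$ and the corresponding phase appears as a factor in the total phase of $\Pi_x$ with respect to $\Pi$. By repeating this procedure for every $n-1 \ge j \ge 0$, we obtain the above expression. For example, for $n = 4$ and the permutation $\Pi_x = U_1 U_2 U_0 U_3$ we obtain:

$$U_1 U_2 U_0 U_3 = \alpha_{03}\,\alpha_{23}\,\alpha_{13}\,\alpha_{12} \cdot U_3 U_2 U_1 U_0.$$

For the left hand side of the statement, we can compute for every $\Pi_{x_k}$ the relative phase to $\Pi$ and for every $\tilde{\Pi}^r_{x_k}$ the relative phase to $\Pi^r$ in terms of the pairwise phases $\alpha_{ij}$. One could calculate these phases in a direct way. Here we will follow this approach but with some shortcuts; in a first step, we rewrite for every $k \in \{0, 1, ..., \bar{k}-2\}$ the first two blocks of $\Pi_{x_k}$, namely $[U_{\sigma_x(n-1)} \cdots U_{\sigma_x((k+1)\cdot\bar{n})}]\,(U_{\sigma_x((k+1)\cdot\bar{n}-1)} \cdots U_{\sigma_x(k\cdot\bar{n})})$, into:

$$[U_{\sigma_x(n-1)} \cdots U_{\sigma_x((k+1)\cdot\bar{n})}]\,(U_{\sigma_x((k+1)\cdot\bar{n}-1)} \cdots U_{\sigma_x(k\cdot\bar{n})}) = \prod_{\substack{(k+1)\cdot\bar{n}-1 \ge j \ge k\cdot\bar{n} \text{ and } n-1 \ge i > j \\ \text{with } \sigma_x(i) < \sigma_x(j)}} \alpha_{\sigma_x(i)\sigma_x(j)} \cdot [U_{\sigma_x(n-1)} \cdots U_{\sigma_x(k\cdot\bar{n})}].$$

To see that this is true, remember that we obtain the phase as a product of the pairwise phases by comparing each pair of positions $n-1 \ge i > j \ge k\cdot\bar{n}$. But since the left block $[U_{\sigma_x(n-1)} \cdots U_{\sigma_x((k+1)\cdot\bar{n})}]$ is already ordered, we do not have to consider the cases for which $n-1 \ge i > j \ge (k+1)\cdot\bar{n}$. Similarly for $k = \bar{k}-1$, we rewrite:

$$(U_{\sigma_x(n-1)} \cdots U_{\sigma_x((\bar{k}-1)\cdot\bar{n})}) = \prod_{\substack{n-1 \ge j \ge (\bar{k}-1)\cdot\bar{n} \text{ and } n-1 \ge i > j \\ \text{with } \sigma_x(i) < \sigma_x(j)}} \alpha_{\sigma_x(i)\sigma_x(j)} \cdot [U_{\sigma_x(n-1)} \cdots U_{\sigma_x((\bar{k}-1)\cdot\bar{n})}]. \qquad \text{(B.8)}$$

To summarize, we can rewrite the permutations $\Pi_{x_k}$ into:

$$\Pi_{x_k} = \prod_{\substack{\min(n,(k+1)\cdot\bar{n})-1 \ge j \ge k\cdot\bar{n} \text{ and } n-1 \ge i > j \\ \text{with } \sigma_x(i) < \sigma_x(j)}} \alpha_{\sigma_x(i)\sigma_x(j)} \cdot \tilde{\Pi}_{x_k}, \qquad \tilde{\Pi}_{x_k} := [U_{\sigma_x(n-1)} \cdots U_{\sigma_x(k\cdot\bar{n})}]\,[U_{\sigma_x(k\cdot\bar{n}-1)} \cdots U_{\sigma_x(0)}].$$

In this way, we already obtain all required phases. For the above example of $\Pi_x = U_1 U_2 U_0 U_3$ (with $\bar{n} = \bar{k} = 2$), these expressions read:

$$\Pi_{x_0} = [U_2 U_1]\,(U_0 U_3) = \alpha_{03}\,\alpha_{23}\,\alpha_{13} \cdot [U_3 U_2 U_1 U_0], \qquad \Pi_{x_1} = (U_1 U_2)\,[U_3 U_0] = \alpha_{12} \cdot [U_2 U_1]\,[U_3 U_0].$$

Note that for $k = 0$, the permutation $\tilde{\Pi}_{x_0} = [U_{\sigma_x(n-1)} \cdots U_{\sigma_x(0)}]$ always equals $\Pi$.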
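On the other hand, the permutations $\tilde{\Pi}_{x_k}$ (for $k \ge 1$) would in general lead to additional (unnecessary) phases relative to $\Pi$, but these are exactly compensated by the phases of $\tilde{\Pi}^r_{x_k}$ relative to $\Pi^r$. To see this, note that $\tilde{\Pi}_{x_k}$ and $\tilde{\Pi}^r_{x_k}$ are (by construction) the same permutations but in reversed order. As a consequence, the relative phase between $\tilde{\Pi}_{x_k}$ and $\Pi$ is exactly the inverse of the relative phase between $\tilde{\Pi}^r_{x_k}$ and $\Pi^r$. This is true since, whenever two unitaries $U_i$ and $U_j$ are commuted in $\tilde{\Pi}_{x_k}$ relative to $\Pi$ (and we obtain $\alpha_{ij}$ as a factor in the relative phase between $\tilde{\Pi}_{x_k}$ and $\Pi$), then the two unitaries are also commuted in $\tilde{\Pi}^r_{x_k}$ relative to $\Pi^r$ (and we obtain the phase $\alpha_{ji} = (\alpha_{ij})^{-1}$ as a factor in the relative phase between $\tilde{\Pi}^r_{x_k}$ and $\Pi^r$). For the example of $n = 4$ and $\Pi_x = U_1 U_2 U_0 U_3$ this becomes:

$$\tilde{\Pi}_{x_1} = [U_2 U_1]\,[U_3 U_0] = \alpha_{13}\,\alpha_{23} \cdot \Pi, \qquad \tilde{\Pi}^r_{x_1} = \{U_0 U_3\}\,\{U_1 U_2\} = \alpha_{31}\,\alpha_{32} \cdot \Pi^r. \qquad \text{(B.14)}$$

In this way, we obtain for $k = 0$: $\tilde{\Pi}_{x_0} = \Pi$, and for every $k \in \{1, 2, ..., \bar{k}-1\}$:

$$\tilde{\Pi}_{x_k}|\Psi_k\rangle \otimes \tilde{\Pi}^r_{x_k}|\Phi_k\rangle = \Pi|\Psi_k\rangle \otimes \Pi^r|\Phi_k\rangle. \qquad \text{(B.15)}$$

Putting everything together, we obtain the desired equation:

$$\Pi_{x_0}|\Psi_0\rangle \otimes \bigotimes_{k=1}^{\bar{k}-1} \left( \Pi_{x_k}|\Psi_k\rangle \otimes \tilde{\Pi}^r_{x_k}|\Phi_k\rangle \right) = \Pi_x|\Psi_0\rangle \otimes \bigotimes_{k=1}^{\bar{k}-1} \left( \Pi|\Psi_k\rangle \otimes \Pi^r|\Phi_k\rangle \right).$$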
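The phase bookkeeping of this proof can be checked numerically on a concrete family of unitaries. The sketch below uses generalized Pauli (Weyl) operators $\omega^c X^a Z^b$ in dimension $D = 5$, represented exactly as integer triples `(c, a, b)` with the relation $ZX = \omega XZ$; this choice of gates, the dimension, the block construction in `lemma_phase` and all function names are assumptions made for illustration. It compares the directly computed phase of $\Pi_x$ relative to $\Pi$ with the product over inverted pairs and with the sum of the phases collected from the $\Pi_{x_k}$ and $\tilde{\Pi}^r_{x_k}$.

```python
from itertools import permutations
from math import isqrt

D = 5  # qudit dimension, chosen only for illustration

def mul(u, v):
    """(w^c1 X^a1 Z^b1)(w^c2 X^a2 Z^b2), using Z X = w X Z."""
    c1, a1, b1 = u
    c2, a2, b2 = v
    return ((c1 + c2 + b1 * a2) % D, (a1 + a2) % D, (b1 + b2) % D)

def apply_in_order(gates, order):
    """Operator obtained when gates[order[0]] acts first (rightmost factor)."""
    result = (0, 0, 0)  # identity
    for idx in order:
        result = mul(gates[idx], result)
    return result

def alpha(u, v):
    """Exponent e of the pairwise phase, defined by u v = w^e v u."""
    return (u[2] * v[1] - v[2] * u[1]) % D

def relative_phase(p, q):
    """Exponent e with p = w^e q, for operators that differ only by a phase."""
    assert p[1:] == q[1:]
    return (p[0] - q[0]) % D

def inversion_phase(gates, sigma):
    """Sum of pairwise exponents over all pairs i > j with sigma[i] < sigma[j]."""
    n = len(sigma)
    return sum(alpha(gates[sigma[i]], gates[sigma[j]])
               for j in range(n) for i in range(j + 1, n)
               if sigma[i] < sigma[j]) % D

def lemma_phase(gates, sigma):
    """Total exponent collected from all Pi_{x_k} and reversed permutations."""
    n = len(sigma)
    n_bar = isqrt(n - 1) + 1
    k_bar = -(-n // n_bar)
    pi = apply_in_order(gates, range(n))            # U_0 acts first
    pi_r = apply_in_order(gates, range(n - 1, -1, -1))
    total = 0
    for k in range(k_bar):
        low = sorted(sigma[:k * n_bar])             # ordered block, applied first
        mid = list(sigma[k * n_bar:(k + 1) * n_bar])
        high = sorted(sigma[(k + 1) * n_bar:])      # ordered block, applied last
        total += relative_phase(apply_in_order(gates, low + mid + high), pi)
        if k >= 1:
            first = sorted(sigma[k * n_bar:], reverse=True)
            second = sorted(sigma[:k * n_bar], reverse=True)
            total += relative_phase(apply_in_order(gates, first + second), pi_r)
    return total % D
```

With `sigma = (3, 0, 2, 1)`, which corresponds to $\Pi_x = U_1 U_2 U_0 U_3$, the three quantities agree for any choice of Weyl operators, in line with the worked example in the proof.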