An Amplitude-Based Implementation of the Unit Step Function on a Quantum Computer

Modelling non-linear activation functions on quantum computers is vital for quantum neurons employed in fully quantum neural networks, but remains a challenging task. We introduce an amplitude-based implementation for approximating non-linearity in the form of the unit step function on a quantum computer. Our approach expands upon repeat-until-success protocols, suggesting a modification that requires only a single measurement. We describe two distinct circuit types which receive their input either directly from a classical computer, or as a quantum state when embedded in a more advanced quantum algorithm. All quantum circuits are theoretically evaluated using numerical simulation and executed on Noisy Intermediate-Scale Quantum hardware. We demonstrate that reliable experimental data with high precision can be obtained from our quantum circuits involving up to 8 qubits, and up to 25 CX-gate applications, enabled by state-of-the-art hardware-optimization techniques and measurement error mitigation.


INTRODUCTION
Identifying quantum algorithms that allow quantum speedups for machine learning is a research area of rising interest [1]. The integration of artificial neural networks with quantum computation is typically referred to as quantum neural networks [2][3][4][5][6]. A plethora of different construction strategies for quantum neural networks has been reported over the last decade; a comprehensive review [7] is, however, far beyond the scope of this article. A rough categorization can be made between hybrid approaches that rely on variational quantum algorithms [8][9][10][11][12], and fully quantum implementations. For the latter, a key ingredient is the development of a quantum version of the Rosenblatt perceptron, involving the calculation of a tensor product, based on which a non-linear activation subsequently occurs [13]. While tensor calculus has been performed on quantum computers [14][15][16], the implementation of non-linear activation functions remains a challenging task [17][18][19][20][21][22][23][24][25].
In principle, non-linearities can be modeled using the Unit Step Function (USF), returning zero for negative function inputs, and one otherwise. The USF is also known as the Heaviside (step) function or indicator function [26], originally developed in operational calculus, or the positive part of a function, commonly used in Fourier analysis [27] and finance [28]. Recently, different approaches to implement non-linearity on a quantum computer have been reported, involving phase encoding and inverse Quantum Fourier Transform [20], Taylor expansion [25], as well as bit-wise comparison [29][30][31][32][33]. Moreover, it has been known [17][18][19] that specifically the USF can in theory be implemented on quantum computers using repeat-until-success protocols [34][35][36][37][38]. However, to the best of our knowledge, a corresponding implementation has not been reported yet.
In this article, we suggest a novel approach that modifies the repeat-until-success gearbox circuit [34,37] to avoid mid-circuit measurements, which allows encoding an arbitrarily close approximation of the USF on the amplitude of qubits. Specifically, we construct a quantum circuit performing a unitary transformation that, based on a given input, prepares a quantum register to represent the corresponding output of the USF. We generally distinguish between passing the input to the circuit either (A) directly as a floating-point value from a classical computer, referred to as classical input, or (B) via another quantum state, referred to as quantum-state input. For (A), we first demonstrate the basic implementation for the USF, and subsequently augment the corresponding circuits for computing other non-linear activation functions, e.g., the Rectified Linear Unit (ReLU) [39]. For (B), we analyze quantum-state inputs representing a single input angle. Furthermore, we extend our analysis to passing on a quantum state in superposition to the suggested circuit, i.e., for computing the average value of the USF over a set of inputs.
The performance of all quantum circuits shown in this article is not only evaluated by numerical simulation assuming an ideal, noise-free quantum computer, but also tested on the IBM Quantum device in Ehningen, Germany. For implementing quantum circuits on Noisy Intermediate-Scale Quantum (NISQ) hardware [40], we found the number of employed CX gates to be the primary criterion for obtaining reliable results. Therefore, in the following we will restrict the characterization of quantum circuits on NISQ devices to the number of required CX-gate applications.
* Correspondence: Jonas Koppe (jonas.koppe@itwm.fraunhofer.de)

THEORETICAL CONSIDERATIONS
In order to synthesize an arbitrarily small rotation, Wiebe et al. proposed the so-called gearbox circuit $C(\boldsymbol{\varphi})$ shown in Figure 1 [34,37]. The name stems from the fact that the $k$ input angles $\boldsymbol{\varphi} = (\varphi_1, \ldots, \varphi_k)$ applied to the $k$ control qubits $c_1, \ldots, c_k$ are used to produce a rotation involving a much finer rotation angle $\theta$ applied to the target qubit, where $\sin^2(\theta) = \sin^2(\varphi_1) \cdots \sin^2(\varphi_k)$. The entire transformation can formally be written as Eq. (1), where the indices $c$ and $t$ indicate the control (qubit) register and the target qubit, respectively. $|\varnothing\rangle^{\otimes k}_c$ is a compact notation representing the sum over all possible states of the control register except for the all-zero state $|0\rangle^{\otimes k}_c$. Notably, a rotation involving $\theta$, i.e., the transformation $|0\rangle_t \to e^{-i \arctan[\tan^2\theta] X} |0\rangle_t$, is successfully applied to the target qubit only if the control register is in $|0\rangle^{\otimes k}_c$. The corresponding success probability is given by

$$\rho^2(\theta) := \sin^4(\theta) + \cos^4(\theta). \tag{2}$$

Otherwise, $C(\boldsymbol{\varphi})$ transforms the target qubit according to $|0\rangle_t \to e^{-i \frac{\pi}{4} X} |0\rangle_t$. Without exactly determining the state of the target qubit after circuit execution, measuring the control register consequently allows one to ascertain which transformation has been performed on the target qubit. Repeating the circuit $C(\boldsymbol{\varphi})$ until the state $|0\rangle^{\otimes k}_c$ is measured in turn ensures that the desired transformation $|0\rangle_t \to e^{-i \arctan[\tan^2\theta] X} |0\rangle_t$ has occurred. This is known as the repeat-until-success principle [17,[35][36][37][38]. Moreover, Wiebe et al. have theoretically shown that a nested version of the gearbox circuit can be built up recursively, i.e., the transformation on the target qubit from the gearbox circuit of Figure 1 serves itself as the input of an outer gearbox circuit as shown in Figure 3b. The transformation performed on the outermost target qubit of a $d$-times nested gearbox circuit, given that all control qubits are found in $|0\rangle_c$, is then $|0\rangle_t \to e^{-i \arctan[\tan^{2^d}\theta] X} |0\rangle_t$ [34,37]. It is emphasized that generally the success probability $\rho^2(\theta)$ depends on $d$, as detailed in the Supplementary Information (Section SI 1). However, in the following only the case $d = 1$ as given in Eq. (2) will be relevant.

FIG. 1. A version of the gearbox circuit $C(\boldsymbol{\varphi})$ suggested by Wiebe et al. [34,37].
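As a numerical illustration of the relations above, the following minimal Python sketch computes the composite angle $\theta$, the rotation angle $\arctan[\tan^2\theta]$ applied on success, and the success probability $\rho^2(\theta)$ of Eq. (2). The function names and the example control angles are illustrative assumptions and do not stem from the original work.

```python
import math

def composite_angle(phis):
    """Return theta in [0, pi/2] with sin^2(theta) = prod_i sin^2(phi_i)."""
    prod = 1.0
    for phi in phis:
        prod *= math.sin(phi) ** 2
    return math.asin(math.sqrt(prod))

def target_rotation(theta):
    """Rotation angle arctan(tan^2(theta)) applied to the target on success."""
    return math.atan(math.tan(theta) ** 2)

def success_probability(theta):
    """rho^2(theta) = sin^4(theta) + cos^4(theta), cf. Eq. (2)."""
    return math.sin(theta) ** 4 + math.cos(theta) ** 4

# Hypothetical example: two control qubits, both rotated by pi/4.
phis = [math.pi / 4, math.pi / 4]
theta = composite_angle(phis)      # sin^2(theta) = 1/4, i.e. theta = pi/6
print(theta, target_rotation(theta), success_probability(theta))
```

For these example angles the composite angle is $\pi/6$, so the gearbox turns two coarse rotations into a finer one, at the price of a success probability below one.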

RESULTS
For implementing an approximation of the USF we use the square-wave property of the gearbox [37]. For our purpose, the most basic gearbox version is sufficient, involving only a single control qubit, and likewise a single input angle $\varphi_1 = \theta$, as shown in Figure 3a. In the following, this is referred to as a single-step gearbox $Gb(\theta)$ (also cf. Figure 3e). The transformation from Eq. (1) can then be expressed as Eq. (3). Assuming the control qubit to be in the zero state $|0\rangle_c$, the probability of finding the target qubit in $|1\rangle_t$ is given by $S^{\circ 1}(\theta) := \sin^2(\arctan[\tan^2\theta])$; the corresponding trajectory is shown in Figure 2. Clearly, $S^{\circ 1}(\theta)$ is reminiscent of a sigmoid function [41] for $\theta \in [0, \pi/2]$, and thus an appropriate approximation for the USF. Extending this observation to a $d$-nested gearbox circuit, in the following referred to as a $d$-step gearbox, the probability of finding the target qubit in $|1\rangle_t$, provided all control qubits have previously been found in $|0\rangle_c$, is accordingly given by

$$S^{\circ d}(\theta) := \sin^2\left(\arctan\left[\tan^{2^d}\theta\right]\right). \tag{4}$$

Trajectories for $d = 2$ and $3$ are likewise shown in Figure 2. Even though it is theoretically known that in such a way an approximation of the USF is encoded in the amplitude of the target qubit [17,37], to the best of our knowledge, an exact protocol for exploiting this fact has not been reported yet. Over the course of this article, we will introduce two alternatives for retrieving $S^{\circ d}(\theta)$ from the quantum states produced by the $d$-step gearbox circuits. The application of both is demonstrated for the two scenarios described above, i.e., (A) a classical input, where the input for the gearbox circuit is a single angle $\theta \in [0, \pi/2]$ directly passed on by a classical computer, based on which the target qubit is ideally transformed according to Eq. (5), as shown in Figure 2 (USF). For the second scenario (B), the task is extended to passing a quantum-state input to the gearbox circuit, where each eigenstate in the measurement basis represents a different input angle $\theta_j \in [0, \pi/2]$ used for the transformation in Eq. (5).

FIG. 2. USF and its approximations $S^{\circ d}(\theta)$ (see Eq. (4)) for different levels $d = 1$, $2$, and $3$ on the input space used in this article.
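To make the sharpening of the step with increasing $d$ concrete, the following short Python sketch evaluates $S^{\circ d}(\theta) = \sin^2(\arctan[\tan^{2^d}\theta])$ for $d = 1, 2, 3$. It uses the identity $\sin^2(\arctan x) = x^2/(1+x^2)$ to avoid evaluating the tangent near $\pi/2$; the function name `usf_approx` and the sample angles are assumptions made for illustration only.

```python
import math

def usf_approx(theta, d=1):
    """S^{od}(theta) = sin^2(arctan(tan^(2^d) theta)).

    Rewritten as sin^p / (sin^p + cos^p) with p = 2^(d+1), which is
    algebraically identical but numerically safe at theta = pi/2.
    """
    p = 2 ** (d + 1)
    s = math.sin(theta) ** p
    c = math.cos(theta) ** p
    return s / (s + c)

# Values below, at, and above the step at theta = pi/4.
for frac in (0.15, 0.2, 0.25, 0.3, 0.4):
    theta = frac * math.pi
    print(frac, [round(usf_approx(theta, d), 4) for d in (1, 2, 3)])
```

At $\theta = \pi/4$ every level returns $1/2$, while values on either side move towards 0 and 1 as $d$ grows.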
Basic USF Version. Consider the single-step gearbox circuit from Figure 3a, where we wish the measurement to reflect $S^{\circ 1}(\theta)$. Evaluation of Eq. (3), however, shows that measuring the probability for the state $|1\rangle_t |0\rangle_c$ yields $\rho^2(\theta) S^{\circ 1}(\theta)$. It is indeed possible to remove the distortion $\rho^2(\theta)$ by combining the probabilities measured for both states involving $|0\rangle_c$: the probability of measuring the target qubit in state $|1\rangle_t$ under the condition that the control qubit is found in state $|0\rangle_c$ is given by

$$\Omega(1\,|\,0) := \frac{|\langle 1_t 0_c | \tau \rangle_{Gb}|^2}{|\langle 0_t 0_c | \tau \rangle_{Gb}|^2 + |\langle 1_t 0_c | \tau \rangle_{Gb}|^2} = S^{\circ 1}(\theta), \tag{6}$$

where $|\tau\rangle_{Gb}$ represents the state of both gearbox qubits before initiating the measurement. Unfortunately, this evaluation cannot simply be extended to $d$-step gearbox circuits, and thus to higher levels of approximation $d$ for the USF, as they involve repeat-until-success protocols that rely on mid-circuit measurements [17,[34][35][36][37][38] (cf. Figure 3b). We propose an alternative implementation by allocating each gearbox element within a nested circuit to a new set of qubits, as shown in the circuit modification in Figure 3c for the double-step gearbox. In Figure 3d, the corresponding extension to the triple-step gearbox is shown. This implementation requires $2^d$ measurable qubits and the application of $2^d - 1$ CX gates. The corresponding generalization of Eq. (6) is given by Eq. (7), in which the condition extends to all control qubits being found in $|0\rangle_c$ and the result is $S^{\circ d}(\theta)$. For a more thorough mathematical treatment, the reader is referred to the Supplementary Information (Section SI 1).
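The conditional evaluation described above amounts to post-selecting on the control qubits after a single measurement of all qubits. The following Python sketch shows one way this could be done on a dictionary of measured bitstring counts; the bit ordering (target first, control second), the function names, and the counts themselves are assumptions. In particular, the counts are synthetic numbers chosen to be roughly consistent with $\sin^4\theta$ and $\cos^4\theta$ at $\theta = 0.2\pi$, not measured data.

```python
def conditional_estimate(counts, target=0, controls=(1,)):
    """Estimate P(target = 1 | all controls = 0) from bitstring counts.

    Bit positions index each string from left to right (assumed convention).
    """
    accepted = 0  # shots in which every control qubit was found in 0
    hits = 0      # accepted shots in which the target was found in 1
    for bits, n in counts.items():
        if all(bits[c] == "0" for c in controls):
            accepted += n
            if bits[target] == "1":
                hits += n
    if accepted == 0:
        raise ValueError("no shots with all control qubits in |0>")
    return hits / accepted

def resources(d):
    """Qubits and CX gates of the d-step gearbox as stated in the text."""
    return {"qubits": 2 ** d, "cx_gates": 2 ** d - 1}

# Synthetic single-step counts for theta = 0.2*pi, keys are "<target><control>".
counts = {"00": 42838, "10": 11936, "01": 22613, "11": 22613}
estimate = conditional_estimate(counts)
print(round(estimate, 4), resources(3))
```

Shots with a control qubit in $|1\rangle$ are simply discarded in post-processing, so no mid-circuit measurement or circuit repetition is needed.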
To demonstrate the validity of this approach, we compared simulations of the quantum circuits shown in Figure 3c and d with their respective analytical form (Figure 4, green circles), confirming Eq. (7). In order to assess the applicability of the implementation on NISQ devices, both quantum circuits were optimized for the 27-qubit IBM Quantum system in Ehningen, Germany (IBMQ-27 Ehningen). The hardware-optimization procedure is detailed in the Methods section. It should be emphasized here that, due to the restricted connectivity of the heavy-hexagon qubit architecture, state-swapping was required for the triple-step gearbox circuit, resulting in an overhead of 4 CX gates, and thus in 11 CX-gate applications in total. The experimental results are likewise shown in Figure 4 (blue stars). Again, $10^5$ circuit executions were performed for each input angle $\theta$. In general, the data reflect the simulated behavior with high precision. Yet deviations do occur for input angles around the step at $\theta \approx \pi/4$, where the experimentally obtained values systematically remain below the numerical prediction. Even though this effect becomes more pronounced for the triple-step gearbox, it does not qualitatively affect the approximation of the USF.
Variations. We emphasize that the evaluation strategy described above for retrieving $S^{\circ d}(\theta)$ can be incorporated into more advanced quantum circuits. For instance, consider a unitary $U$ requiring as an input the (approximated) USF $S^{\circ d}(\theta)$, which performs a transformation on an additional input state $|\Phi\rangle$, and maps the result to an output qubit $o$. A schematic representation of this type of circuit involving the double-step gearbox is suggested in Figure 5a. It is important to highlight that only the state of the target qubit $t$ must be passed to $U$. Eventually, data evaluation is done analogously to Eq. (7), however replacing the target qubit $t$ with the output qubit $o$, as given in Eq. (8).

As a primitive example for $U$ requiring no additional input $|\Phi\rangle$, assume we wish to modify the transformation given in Eq. (5) and instead encode the transformation of Eq. (9) on the output qubit $o$. This effectively allows shifting the first plateau from $0$ to $\sin^2\kappa$. In Figure 5b we show a quantum circuit approximating this transformation using a double-step gearbox, such that we expect to observe the function $\tilde{S}^{\circ 2}(\theta) := \sin^2(\kappa)\left[1 - S^{\circ 2}(\theta)\right] + S^{\circ 2}(\theta)$.

Another important transformation that can be generated from the USF is the so-called ReLU activation function [39], commonly defined as $\mathrm{ReLU}(x) := \max(0, x)$. Here, the output of the USF is multiplied with the identity $f(x) = x$. Note that the unit step must necessarily coincide with the root of $f(x)$. For the implementation on a quantum computer this requires synchronization of the gearbox input $\theta$ and the function input $x$. A variety of strategies could be used for this purpose. For simplicity, here we encode $f(x)$ on a single input qubit $q_1$ using an $R_y(x)$ gate with the rotation angle given by Eq. (10), where we make use of the fact that for all $x \le \pi/4$ the corresponding gearbox output $S^{\circ d}(\theta)$ evaluates to zero. We thereby guarantee that $f(x(\theta)) \in [0, 1/2]$ and $f(x(\pi/4)) = 0$. The entire ReLU transformation, formally written as Eq. (11), is approximated by the quantum circuit shown in Figure 5d, employing a double-step gearbox. The expected output is then given by the product of $f(x(\theta))$ and the gearbox output $S^{\circ 2}(\theta)$. Using identical conditions as described in Figure 4, we tested the performance of the quantum circuit from Figure 5b for $\kappa = \arcsin\sqrt{1/4}$, i.e., expecting a first plateau at $1/4$, and the quantum circuit from Figure 5d on an ideal, noise-free quantum computer as well as on IBMQ-27 Ehningen. The corresponding analytical, simulated, and experimental data are shown in Figure 5c and e, respectively. As observed before, both the simulated and the experimental data are in excellent agreement with the analytically expected behavior.
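The shifted-plateau function expected from the circuit in Figure 5b can be checked numerically. The sketch below evaluates $\sin^2(\kappa)[1 - S^{\circ 2}(\theta)] + S^{\circ 2}(\theta)$ for $\kappa = \arcsin\sqrt{1/4}$; the helper names are hypothetical, and the ReLU variant is not included because its input encoding (Eq. (10)) is not reproduced in this text.

```python
import math

def usf_approx(theta, d=2):
    """S^{od}(theta) written as sin^p / (sin^p + cos^p), p = 2^(d+1)."""
    p = 2 ** (d + 1)
    s = math.sin(theta) ** p
    c = math.cos(theta) ** p
    return s / (s + c)

def shifted_plateau(theta, kappa, d=2):
    """sin^2(kappa) * (1 - S) + S: first plateau lifted from 0 to sin^2(kappa)."""
    s = usf_approx(theta, d)
    return math.sin(kappa) ** 2 * (1.0 - s) + s

kappa = math.asin(math.sqrt(0.25))  # first plateau expected at 1/4
for frac in (0.0, 0.1, 0.25, 0.4, 0.5):
    print(frac, round(shifted_plateau(frac * math.pi, kappa), 4))
```

The second plateau stays at one, since the term proportional to $\sin^2\kappa$ vanishes wherever the gearbox output saturates.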
The results discussed in the previous section generally demonstrate the ability to approximate the USF on a quantum computer. However, directly passing an input angle $\theta$ to the gearbox circuit requires communication with a classical computer. For many future applications we rather assume gearbox circuits to receive a quantum state $|\Psi\rangle$ as an input, with each eigenstate of the measurement basis $|\psi_j\rangle$ representing a different input angle $\theta_j$. In this section, we therefore present results for a single-step gearbox capable of considering four arbitrarily chosen input angles $\boldsymbol{\theta}/\pi := (0.15, 0.2, 0.4, 0.45)^T$, where each entry is denoted as $\theta_j/\pi$, $j = 0, \ldots, 3$. These angles are represented by a two-qubit state register $s$, i.e., by the amplitudes of $|\psi_j\rangle \in \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$, respectively. An efficient approach for passing input angles to the gearbox circuit based on the state of $s$ is the use of uniformly-controlled rotations [42,43]. This generally requires $2^p$ CX-gate applications, where $p$ indicates the number of qubits in the state register. The corresponding circuit element for a two-qubit state register is shown in Figure 6a. Note that the angles $\vartheta_j$ involved in the depicted sequence of rotations can be obtained from the $\theta_j$; the exact conversion is detailed in [42]. The incorporation of uniformly-controlled rotations into the single-step gearbox, as shown in Figure 6b, generally requires 9 CX gates.
Single Angle. At first, we consider a situation where the state-qubit register is in one of the eigenstates of the measurement basis, i.e., $|\Psi\rangle = |\psi_j\rangle$, such that only one of the four angles $\theta_j$ is passed to the gearbox circuit. In fact, this is not fundamentally different from the case of a classical input discussed in the previous section; data evaluation can be performed according to Eq. (6) to obtain $S^{\circ 1}(\theta_j)$. In Figure 7, the simulated data obtained on an ideal, noise-free quantum computer are represented by the green circles. Additionally, the respective states of the state register are indicated. Clearly, analytical and simulated data are in excellent agreement. For assessing its performance on a NISQ device, the quantum circuit from Figure 6b was again optimized for IBMQ-27 Ehningen (resulting in 9 CX-gate applications with no overhead), and executed $10^5$ times for each input state. Each of these runs was then repeated 100 times, from which the resulting average values for $\Omega(1\,|\,0)$ are indicated by the blue stars in Figure 7.

FIG. 6. (a) Uniformly-controlled rotations involving two state qubits. For eventually implementing the correct rotation according to $\theta_j/\pi \in \{0.15, 0.2, 0.4, 0.45\}$, angles must be converted as described in [42]. The corresponding angles here are given by $\vartheta_j/\pi \in \{0.$ …

Notably, the experimental data do not achieve the level of precision we expect from the results shown in Figure 4: while for $\theta_j < \pi/4$ the estimators for $\Omega(1\,|\,0)$ are found slightly above the expected values, for $\theta_j > \pi/4$ the results remain below the analytical and numerical prediction. We attribute this to the diminished precision of the $\theta_j$ passed to the gearbox. Here $\theta_j$ is generated by the uniformly-controlled rotations, involving 4 CX gates and several single-qubit-gate applications, and is thus significantly more prone to error than the classical input $\theta$ from the previous section. Since the individual experimental data (100 hardware runs, not shown here for clarity) have sufficiently small confidence intervals ($3\sigma$ levels lie within the size of the symbol), this error appears to be systematic rather than random. Nevertheless, the overall behavior according to $S^{\circ 1}(\theta)$ is generally well reflected by the experimental results.

Multiple Angles.
A more complex situation occurs when the state register $s$ is no longer in one of the four eigenstates $|\psi_j\rangle$. Here we are particularly concerned with the state qubits in an equal superposition of all four eigenstates, $|\Psi\rangle = |+\rangle|+\rangle = \frac{1}{2}\sum_{j=0}^{3} |\psi_j\rangle$, i.e., the average output of the gearbox. Unfortunately, the success probability $\rho^2(\theta_j)$ is state-dependent, such that $S^{\circ 1}(\boldsymbol{\theta}) := \frac{1}{4}\sum_{j=0}^{3} S^{\circ 1}(\theta_j)$ cannot simply be retrieved by using the evaluation strategy suggested in Eq. (6). Instead, this yields

$$\Omega(1\,|\,0) = \frac{\sum_{j=0}^{3} \rho^2(\theta_j)\, S^{\circ 1}(\theta_j)}{\sum_{j=0}^{3} \rho^2(\theta_j)}, \tag{12}$$

where $j$ is the integer representation of the two-bit string in the computational basis. For the chosen input angles $\boldsymbol{\theta}$, Eq. (12) evaluates to $\Omega(1\,|\,0) \approx 0.644$, while we wish to find $S^{\circ 1}(\boldsymbol{\theta}) \approx 0.567$. To confirm this bias, we have repeated the procedure described for Figure 7 with the state register given by $|\Psi\rangle = |+\rangle|+\rangle$. Analytical, simulated, and experimental data are shown in the left panel of Figure 8a. Here, we additionally show the 100 individual runs as small gray circles. Blue circles and error bars indicate the average and $3\sigma$ levels, respectively. The inset indicated by the red box is shown in Figure 8b. For clarity, we shifted analytical and simulated data against the experimental data. Generally, these results are in agreement with Eq. (12). The experimental estimator for $\Omega(1\,|\,0)$ remains slightly below the predicted value, which is, however, in accordance with the observations from Figure 7.
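The bias between the conditional estimator and the desired average can be reproduced with a few lines of arithmetic. The following Python sketch assumes that, for an equal superposition, the conditional probability is the $\rho^2$-weighted average of the $S^{\circ 1}(\theta_j)$, which is consistent with the two values quoted in the text; variable and function names are chosen for illustration.

```python
import math

# Input angles theta_j / pi = 0.15, 0.2, 0.4, 0.45 as used in the text.
THETAS = [f * math.pi for f in (0.15, 0.2, 0.4, 0.45)]

def rho2(theta):
    """Success probability of the single-step gearbox, Eq. (2)."""
    return math.sin(theta) ** 4 + math.cos(theta) ** 4

def s1(theta):
    """S^{o1}(theta) = sin^4 / (sin^4 + cos^4)."""
    return math.sin(theta) ** 4 / rho2(theta)

# Desired quantity: plain average of the gearbox outputs.
mean_usf = sum(s1(t) for t in THETAS) / len(THETAS)

# Conditional estimator: each output is weighted by its success probability.
omega_cond = sum(rho2(t) * s1(t) for t in THETAS) / sum(rho2(t) for t in THETAS)

print(round(mean_usf, 3), round(omega_cond, 3))
```

Angles above the step have both a larger output and a larger success probability, so post-selection over-represents them and the conditional estimator lands above the plain average.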
It should be reiterated here that, following Eq. (6), for the single-step gearbox the probability of measuring $|0\rangle_c |1\rangle_t$ with a state register in equal superposition is given by

$$\Omega(10) = \frac{1}{4}\sum_{j=0}^{3} \rho^2(\theta_j)\, S^{\circ 1}(\theta_j), \tag{13}$$

where the probability refers to the state $|\tau\rangle_{Gb}$ of the two involved gearbox qubits. Aiming to obtain $S^{\circ 1}(\boldsymbol{\theta})$ from this result, a transformation must be found that performs a state-wise amplitude encoding of $\rho^{-1}(\theta_j)$, which is then appended to the gearbox circuit. Then, a measurement yields

$$\Omega(10) = \frac{1}{4}\sum_{j=0}^{3} \rho^2(\theta_j)\, D(\theta_j)\, S^{\circ 1}(\theta_j) = S^{\circ 1}(\boldsymbol{\theta}), \tag{14}$$

where $D(\theta_j) := \rho^{-2}(\theta_j)$. Since $\rho^{-1}(\theta_j)$ is a periodic function (cf. Eq. (2)), a corresponding transformation may be approximated using its Fourier series, a strategy which has been suggested for similar purposes before [44]. However, due to the structure of the problem, here we chose a different approach than previously reported. Each term from the series expansion is encoded on a different qubit. Eventually, the full transformation can be constructed by adding up all of these terms using amplitude addition [31]. Note that this approach requires an approximation of $D(\theta_j)$ instead of $\rho^{-1}(\theta_j)$, and likewise the consideration of probabilities instead of amplitudes in the Fourier series expansion, where the latter can be accounted for by employing squared sinusoidal functions. Using simple trigonometric identities, the first four terms of the corresponding approximation are given by

$$\frac{D(\theta_j)}{2} \approx 0.915 - 0.485\cos^2(2\theta_j) + 0.083\cos^2(4\theta_j) - 0.014\cos^2(6\theta_j). \tag{15}$$

The factor $1/2$ must be included to ensure normalization. In principle it is possible to construct a quantum circuit $\Delta$ that entirely computes Eq. (15) on the amplitude of a single qubit and multiplies it with the gearbox output according to Eq. (14), as expressed in Eq. (16). After multiplying the result of the measurement for $\Delta$ by a factor of 2, $S^{\circ 1}(\boldsymbol{\theta})$ is indeed obtained. The construction of the quantum circuit $\Delta$ including amplitude subtraction is detailed in the Supplementary Information (see Section SI 4 and following).
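The quality of the four-term expansion can be checked against the closed form $D(\theta)/2 = 1/(2[\sin^4\theta + \cos^4\theta])$. The sketch below is a plain numerical comparison over the input space $[0, \pi/2]$, using the coefficients as printed in the text (rounded to three decimals); the function names and the grid resolution are illustrative assumptions.

```python
import math

# Coefficients of the cos^2(lambda * theta) terms as printed in the text.
COEFFS = {0: 0.915, 2: -0.485, 4: 0.083, 6: -0.014}

def d_half_exact(theta):
    """D(theta)/2 with D = 1 / (sin^4 + cos^4)."""
    return 0.5 / (math.sin(theta) ** 4 + math.cos(theta) ** 4)

def d_half_series(theta):
    """Truncated expansion; the lambda = 0 term reduces to the constant."""
    return sum(a * math.cos(lam * theta) ** 2 for lam, a in COEFFS.items())

# Largest deviation on a grid of 101 angles in [0, pi/2].
grid = [j * math.pi / 200 for j in range(101)]
max_err = max(abs(d_half_series(t) - d_half_exact(t)) for t in grid)
print(max_err)
```

The exact function ranges from $1/2$ (at $\theta = 0$ and $\pi/2$) to $1$ (at $\theta = \pi/4$), which is why the factor $1/2$ keeps the encoded quantity within the range of a probability.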
The implementation of $\Delta$ theoretically requires 238 CX-gate applications and is far beyond what we expect to be reasonably implementable on state-of-the-art NISQ hardware. Therefore, in the remaining part of this section we present a strategy for obtaining reliable experimental results for $\Delta$ on IBMQ-27 Ehningen regardless. The decisive advantage of our approach for implementing the Fourier series expansion of $D(\theta_j)$ can be demonstrated by rewriting Eq. (16) as Eq. (17), where $D_{2k}(\theta_j) = a_{2k}\cos^2(2k\theta_j)$, with the $a_{2k}$ stemming from Eq. (15). Accordingly, instead of performing the full transformation by a single, complex circuit $\Delta$, we are able to split $\Delta$ into four subcircuits $\Delta_{2k}$, implementing the $D_{2k}(\theta_j)$ elements separately. After execution, the individual results are then post-processed as $\Delta = \Delta_0 - \Delta_2 + \Delta_4 - \Delta_6$ to eventually obtain $S^{\circ 1}(\boldsymbol{\theta})$. Each subcircuit $\Delta_{2k}$ begins with the single-step gearbox as shown in Figure 6b with the state register represented by $|\Psi\rangle = |+\rangle|+\rangle$. Since subcircuit $\Delta_0$ constitutes a particularly simple case, which only includes rescaling the gearbox result by $a_0$ (cf. Eq. (15) and Eq. (17)), its discussion is postponed for the moment. The subcircuit family $\Delta_\lambda$, $\lambda = 2, 4, 6$, is more elaborate: an additional uniformly-controlled rotation is required to encode the respective term $\cos^2(\lambda\theta_j)$, which is subsequently rescaled by $a_\lambda$, and finally multiplied with the gearbox result. The corresponding quantum circuit is shown in the top row of Figure 9a. In the lower row, the corresponding topological requirement for a quantum computer is schematically shown using color codes and lines for indicating direct communication between the involved qubits. Assuming all-to-all connectivity, a $\Delta_\lambda$ subcircuit requires 25 CX gates. Considering the typical heavy-hexagon architecture of IBMQ devices, even the most efficient hardware realization found for $\Delta_\lambda$ involves 44 CX-gate applications, and thus still remains too technically demanding.

The last resource available for further simplifying the $\Delta_{2k}$ subcircuits lies in externalizing the multiplication by $a_\lambda$ to a classical computer. The corresponding quantum circuit, shown in the upper panel of Figure 9b, solely rescales the gearbox result with the respective $\cos^2(\lambda\theta_j)$ term, and stores the value in a qubit $t_{\Delta_\lambda}$. The final result is then obtained post-measurement by classically computing $\Delta_\lambda = a_\lambda \cdot t_{\Delta_\lambda}$. This modification in turn allows further simplifying the last multiplication step implemented by the Toffoli gate $T$ [45] (green-colored box in Figure 9): even the most efficient implementation requires 6 CX-gate applications and all-to-all connectivity of the three involved qubits [46,47]. However, a phase-equivalent version $\tilde{T}$ has been reported, performing the identical transformation as $T$, but additionally flagging the $|101\rangle$ state (referring to the three involved qubits) with a negative amplitude (phase shift by $\pi$) [48,49]. This version can be implemented using only 3 CX gates, and additionally does not require communication between the two control qubits. Both hardware implementations for the Toffoli gate and the phase-equivalent version can be found in the Supplementary Information (Figure S6). Since $T$ in Figure 9b is immediately followed by a measurement, $\tilde{T}$ can be employed regardless. This is shown in Figure 9c, and ultimately allowed us to reduce the number of CX gates to 16 assuming all-to-all connectivity, and to 25 considering the architecture of IBMQ-27 Ehningen. All requirements for the quantum circuits from Figure 9 are summarized in Table I.
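The classical post-processing of the four subcircuit results can be illustrated with ideal expectation values in place of measured probabilities. In the sketch below, `subcircuit_probability` is a hypothetical stand-in for the ideal probability read from qubit $t_{\Delta_\lambda}$, assumed here to be $\frac{1}{4}\sum_j \sin^4(\theta_j)\cos^2(\lambda\theta_j)$; the coefficients $a_\lambda$ are the magnitudes printed in Eq. (15), with the signs supplied by the combination $\Delta_0 - \Delta_2 + \Delta_4 - \Delta_6$.

```python
import math

THETAS = [f * math.pi for f in (0.15, 0.2, 0.4, 0.45)]
A = {0: 0.915, 2: 0.485, 4: 0.083, 6: 0.014}  # magnitudes a_lambda from Eq. (15)

def subcircuit_probability(lam):
    """Ideal value for t_Delta_lambda: gearbox output rho^2 * S = sin^4,
    rescaled state-wise by cos^2(lam * theta_j), averaged over the register."""
    return sum(math.sin(t) ** 4 * math.cos(lam * t) ** 2 for t in THETAS) / len(THETAS)

# Classical multiplication by a_lambda after the measurement.
delta_terms = {lam: A[lam] * subcircuit_probability(lam) for lam in A}

# Alternating-sign combination, then the factor 2 undoing the normalization.
delta = delta_terms[0] - delta_terms[2] + delta_terms[4] - delta_terms[6]
estimate = 2.0 * delta

exact = sum(math.sin(t) ** 4 / (math.sin(t) ** 4 + math.cos(t) ** 4) for t in THETAS) / 4
print(round(estimate, 4), round(exact, 4))
```

Because each subcircuit contributes only one probability, the signs and coefficients can be applied entirely on the classical side, which is what removes the rescaling and amplitude-subtraction steps from the quantum circuit.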
Note that multiplying by $a_\lambda$ post-measurement can likewise be employed for $\Delta_0$, leaving the circuit at the same complexity as described in Figure 7 (9 CX-gate applications, no overhead). In Figure 8a, right panel, the analytical value $S^{\circ 1}(\boldsymbol{\theta}) \approx 0.567$ is compared to simulated and experimental estimators for $\Omega(10)$ according to Eq. (14). Data for $\Omega(10)$ was obtained by performing runs of the four subcircuits $\Delta_\lambda$, $\lambda = 0, 2, 4, 6$, in their most simplified version on an ideal, noise-free quantum computer assuming all-to-all connectivity, and on the IBMQ-27 Ehningen device using the optimum hardware realizations. The inset indicated by the red box is again given in Figure 8b. The experimental procedure is identical to the one reported for $\Omega(1\,|\,0)$, shown in the left panel. Notably, the simulated result for $\Omega(10)$ (green diamond) is slightly larger than the analytical value (black cross). This is in accordance with stopping the series expansion of $D(\theta_j)$ after a negative term, leaving the approximation below the exact value. Regarding the 100 individual experimental results for $\Omega(10)$ (small gray circles), a wider scattering can be observed when compared to the experimental data for $\Omega(1\,|\,0)$ (left panel); the standard deviation (blue error bar) was found to be about three times larger, which is expected considering the significant increase in CX-gate applications. Nevertheless, the overall estimator for $\Omega(10)$ (blue circles) is in good agreement with the analytical and simulated values, and is clearly distinguishable from the experimental data obtained for $\Omega(1\,|\,0)$.

DISCUSSION
In this article, we have introduced an amplitude-based encoding for an approximation of the USF. This involved a quantum circuit that, based on a given input, prepares a quantum register to reflect the corresponding USF output. Two circuit variations were suggested, receiving the input either (A) directly from a classical computer, or (B) via a quantum state when incorporated into a larger circuit. For (A), we have demonstrated different levels of approximation, and furthermore showed small circuit extensions that allow approximating other non-linear functions. For (B), we likewise presented a circuit extension for quantum-state inputs in superposition, e.g., when computing the average output of the USF. Supported by analytical and simulated data, the performance of all quantum circuits was evaluated on the IBM Quantum device in Ehningen, Germany. Reliable experimental results were presented, obtained from quantum circuits that included up to 8 qubits and up to 25 CX-gate applications, clearly demonstrating the applicability of our approach on state-of-the-art Noisy Intermediate-Scale Quantum devices available to date.

METHODS
The Python code for constructing, optimizing, and executing the quantum circuits discussed in this article was written using the Qiskit framework [50].

Simulation
All simulations have been performed using the AerSimulator included in Qiskit. Throughout, an ideal, noise-free quantum computer with all-to-all connectivity was assumed. We would like to emphasize that the number of circuit executions for each input angle or state has been set to $10^5$ for maintaining comparability to the experimental data. However, sufficiently small confidence intervals were already observed for a significantly lower number of repetitions, ranging between $2^{10}$ and $2^{14}$.
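As a rough guide to how the number of circuit executions affects the statistical uncertainty, the following sketch computes the $3\sigma$ half-width of a measured probability under the assumption of independent shots (binomial statistics). This model and the function name are assumptions for illustration; it does not describe how the confidence intervals in this article were obtained.

```python
import math

def three_sigma(p, shots):
    """3-sigma half-width of an estimated probability p from `shots`
    independent circuit executions (binomial standard error)."""
    return 3.0 * math.sqrt(p * (1.0 - p) / shots)

# Worst case p = 0.5 for the repetition counts mentioned in the text.
for shots in (2 ** 10, 2 ** 14, 10 ** 5):
    print(shots, round(three_sigma(0.5, shots), 5))
```

Under this model the half-width shrinks with the square root of the number of shots, so going from $2^{14}$ to $10^5$ executions reduces it by only a factor of about 2.5.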

Hardware-Optimization Procedure
Optimizing the quantum circuits shown in this article on IBMQ-27 Ehningen requires transpilation prior to execution. Within Qiskit terminology, transpilation can be understood as a pipeline consisting of four steps: (1) rewriting the circuit in terms of the basis gate library of the backend, (2) mapping the virtual qubits to the physical qubits of the device (initial layout), (3) incorporating swap operations necessary due to the restricted connectivity of the involved physical qubits, and (4) optimizing the employed gates. Throughout, we have chosen transpilation using the highest diligence (optimization level 3), which relies on the SWAP-based BidiREctional heuristic search algorithm (SABRE) for finding the optimum initial layout and swapping strategy [51], and additionally performs full single- and double-qubit-gate optimizations. Due to its stochastic nature, we have repeated transpilation 50 times, and saved the resulting transpiled circuit with the lowest number of CX gates. Note that the corresponding virtual-to-physical qubit mapping is optimum in terms of CX-gate applications, but does not consider the quality of the involved qubits. Since we use at most 8 of the 27 qubits available on IBMQ-27 Ehningen, this transpiled circuit can typically be reconstructed using different qubit subsets. The mapomatic package [52] allows evaluating all of these different subsets regarding their individual error rates. In this way we eventually identified the optimum hardware realization of the circuit with respect to the CX-gate applications and the quality of the involved qubits. It is emphasized that the full procedure is only required once for a distinct circuit and does not need to be repeated when changing an input angle or state. Post-measurement, error mitigation using the matrix-free measurement mitigation (M3) package was applied throughout [53]. Every circuit was executed $10^5$ times, corresponding to the current maximum number of repetitions on IBMQ-27 Ehningen. It should be noted that all experiments were conducted shortly after calibration of the device (typically less than 30 minutes).
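The repeat-and-select step of this procedure can be sketched independently of any particular framework. In the Python sketch below, `transpile_once` and `cx_count` are hypothetical callables supplied by the user; with Qiskit they might, for example, wrap `qiskit.transpile(circuit, backend, optimization_level=3, seed_transpiler=seed)` and `transpiled.count_ops().get("cx", 0)`, but this wiring is an assumption and not taken from the code used for this article. A random stand-in replaces the transpiler so that the example runs on its own.

```python
import random

def best_of_n(transpile_once, cx_count, n_trials=50):
    """Run a stochastic transpilation n_trials times (one seed per trial)
    and keep the candidate with the fewest CX gates."""
    best, best_cx = None, None
    for seed in range(n_trials):
        candidate = transpile_once(seed)
        n_cx = cx_count(candidate)
        if best_cx is None or n_cx < best_cx:
            best, best_cx = candidate, n_cx
    return best, best_cx

# Stand-in for a stochastic transpiler (NOT a real one): 7 logical CX gates
# plus a seed-dependent routing overhead.
def fake_transpile(seed):
    rng = random.Random(seed)
    return {"seed": seed, "cx": 7 + rng.choice([4, 6, 8, 10])}

best, best_cx = best_of_n(fake_transpile, lambda c: c["cx"], n_trials=50)
print(best, best_cx)
```

Fixing the seed per trial keeps the selected circuit reproducible, which matters because the chosen layout is reused for all input angles or states of a given circuit.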

In this section, we present a method to compute any function depending on some angle $\theta$ by implementing its approximation using Fourier series expansion. The previous step is not necessarily needed, but might enable future optimization techniques, such as possibly using the same qubit for both $\cos^2(\cdot)$ terms, where the phase shift by $\pi/4$ could be constructed using single-qubit gates.
Using the double angle formula $\cos(2x) = 2\cos^2(x) - 1$, we can rewrite the above equation in terms of squared cosines. This Fourier approximation can be implemented as a combination of $R_y$ gates using amplitude addition [8] and subtraction, as described in Section SI 4. The corresponding circuit is shown in Figure S8; the combination with a single-step gearbox circuit receiving a quantum-state input is presented in Figure S9. Due to its complexity, we may decrease the circuit size using the splitting method described in the main text to reliably implement it on NISQ devices [9]. Note that, while in the main text we have allocated each Fourier term to its own circuit, there exist alternative strategies, e.g., the separation into a positive and a negative part ($\Delta(x) = \Delta_+(x) - \Delta_-(x)$, $\Delta_+, \Delta_- \ge 0$) to avoid the amplitude subtraction procedure. The corresponding circuit can be found in Figure S10.

FIG. 3. Quantum circuits implementing the USF for different levels of approximation d. (a) Single-step gearbox with d = 1, (b) double-step gearbox with d = 2 according to repeat-until-success protocols [34], (c) double-step gearbox with d = 2, and (d) triple-step gearbox with d = 3 as suggested in this article. (e) Definition of the elementary gearbox-circuit element used in (b), (c), and (d). Numbers of the control qubits are given as subscripts to distinguish them from the control qubits shown in Figure 1.
FIG. 4. Results for the implementation of the USF at approximation levels d = 2 and d = 3 in (a) and (b), respectively. Analytical curves are based on Eq. (4). Simulated data (green circles) were obtained by executing the quantum circuits shown in Figure 3(c) and (d). For the experimental data (blue stars), the respective circuits were optimized and executed on IBMQ-27 Ehningen. Optimum quantum circuits are shown in Figure S4. For the simulated and experimental circuit runs, the input space θ ∈ [0, π/2] was covered by 101 grid points θ_j = jπ/200, j = 0, ..., 100, where 10^5 circuit executions were conducted for each data point.
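The input grid and the statistical resolution implied by 10^5 executions per data point can be sketched in a few lines of Python. The binomial standard error used here is a generic estimate for ideal, independent sampling and is an assumption of this sketch; it does not account for hardware noise or for the effect of measurement error mitigation. The names SHOTS, thetas, and shot_noise_std are hypothetical.

```python
import math

SHOTS = 10**5  # circuit executions per data point, as stated in the caption

# Grid theta_j = j * pi / 200 for j = 0, ..., 100, covering [0, pi/2].
thetas = [j * math.pi / 200 for j in range(101)]


def shot_noise_std(p, shots=SHOTS):
    """Binomial standard error of an estimated probability p from `shots` samples."""
    return math.sqrt(p * (1.0 - p) / shots)


# The standard error is largest at p = 0.5.
worst_case = shot_noise_std(0.5)

if __name__ == "__main__":
    print(len(thetas), thetas[0], thetas[-1])
    print(worst_case)
```

Under these assumptions, the sampling error per data point stays below roughly 0.002, so visible deviations between simulated and experimental points would have to come from other sources.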

FIG. 5. (a) Schematic representation for combining an arbitrary unitary transformation U with the double-step gearbox from Figure 3c. (b)/(d) Quantum circuits approximating the transformations given in Eq. (9) and Eq. (11), respectively. The black boxes indicate the additional unitary transformations U as generally introduced in (a). (c)/(e) Analytical, simulated, and experimental data for the quantum circuits from (b) and (d), respectively. All conditions are identical to those reported for Figure 4. Optimum quantum circuits on IBMQ-27 Ehningen are shown in Figures S2 and S3.

FIG. 7. Results for the quantum-state input passed to the single-step gearbox as shown in Figure 6b. The state register is represented by |Ψ⟩ ∈ {|00⟩, |01⟩, |10⟩, |11⟩} as indicated, corresponding to the input angles θ_j/π ∈ {0.15, 0.2, 0.4, 0.45}. The curve for S •1 (θ) is based on Eq. (4). Simulated data, illustrated as green circles, were obtained by executing the quantum circuits shown in Figure 6b 10^5 times, assuming an ideal, noise-free quantum computer with all-to-all connectivity. For the experimental data, the circuit was optimized for IBMQ-27 Ehningen. 100 experimental hardware runs were performed, each likewise comprising 10^5 circuit executions. The blue stars represent the respective averages. The optimum quantum circuit for |00⟩ on IBMQ-27 Ehningen is shown in Figure S5a.

FIG. 8. (a) Results for the average output of the single-step gearbox from Figure 6b. (b) Magnified view of the region marked by the red box in (a). Analytical values are given by Ω(1 | 0) ≈ 0.644 according to Eq. (12), and S •1 (θ) = 0.567 for Ω(10). Simulated and experimental data for Ω(1 | 0) are based on the circuit from Figure 6b with the state register represented by |Ψ⟩ = |+⟩|+⟩. Simulated and experimental data for Ω(10) stem from executing the subcircuits ∆_0, ∆_2, ∆_4, and ∆_6, and post-processing the results as detailed in the main text. For all (sub)circuits, 100 hardware runs (gray circles) with 10^5 executions each were conducted. The respective averages and 3σ levels are indicated by the blue circles and error bars. The optimum quantum circuits on IBMQ-27 Ehningen for Ω(1 | 0) and Ω(10) can be found in Figure S5.
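The averages and 3σ levels over repeated hardware runs could be computed along the following lines. This is a minimal sketch: the function name mean_and_3sigma is hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the caption does not state which estimator was used.

```python
import statistics


def mean_and_3sigma(run_results):
    """Return (mean, 3 * sigma) for a sequence of per-run output values.

    Assumption: sigma is the sample standard deviation over the runs.
    """
    if len(run_results) < 2:
        raise ValueError("at least two runs are needed to estimate sigma")
    mean = statistics.fmean(run_results)
    sigma = statistics.stdev(run_results)
    return mean, 3.0 * sigma
```

Each entry of run_results would correspond to the output of one hardware run, i.e., one gray circle in the figure.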

FIG. 9. The upper row shows quantum circuits for different variants of the ∆_λ subcircuit family. (a) Full subcircuit, (b) subcircuit with externalized multiplication by a_λ, and (c) as in (b) but with the phase-equivalent version of the Toffoli gate. Generally, following the gearbox procedure, all values are encoded in the amplitude of the |1⟩ state of each qubit, i.e., qubits f_λ, c_λ, and fc_λ store the values for the coefficient √a_λ, the cosine term cos(λθ_j), and their product √a_λ cos(λθ_j), respectively. Qubits ∆_λ and t∆_λ represent final circuit results, where the latter indicates that a further post-processing step is required. Coefficients √a_λ are encoded using an R_y gate whose angle parameter is arcsin(√a_λ). Topological maps in the lower row show the connectivity required in each case on a quantum computer. Lines indicate direct communication between the involved qubits. Subcircuit elements are identified using color-coding. The numbers of CX gates are summarized in Table I.

FIG. S3. Optimum quantum circuit for IBMQ-27 Ehningen implementing the ReLU function, corresponding to the circuit shown in Figure 5d in the main text. Here we used x = π − x with the definition for x as given in the main text. The first column indicates the virtual-to-physical qubit mapping.

TABLE I. Hardware requirements for the different variants of the ∆_λ subcircuits shown in Figure 9.
Let s(x) be a real-valued function with period P, integrable on the interval [0, π/2]. Note that rescaling of s(x) may be required to guarantee normalization, i.e., s(x) ∈ [0, 1]. Fourier analysis states that s(x) may be approximated for specific a_i, b_i ∈ ℝ by