Direct implementation of a perceptron in superconducting circuit quantum hardware

The utility of classical neural networks as universal approximators suggests that their quantum analogues could play an important role in quantum generalizations of machine-learning methods. Inspired by the proposal in [Torrontegui and García-Ripoll 2019 EPL 125 30004], we demonstrate a superconducting qubit implementation of an adiabatic controlled gate, which generalizes the action of a classical perceptron as the basic building block of a quantum neural network. We show full control over the steepness of the perceptron activation function, the input weight and the bias by tuning the adiabatic gate length, the coupling between the qubits and the frequency of the applied drive, respectively. In its general form, the gate realizes a multi-qubit entangling operation in a single step, whose decomposition into single- and two-qubit gates would require a number of gates that is exponential in the number of qubits. Its demonstrated direct implementation as a perceptron in quantum hardware may therefore lead to more powerful quantum neural networks when combined with suitable additional standard gates.


I. INTRODUCTION
Artificial neural networks and engineered quantum systems are both quickly developing technologies with far-reaching potential applications. The promise that quantum computing can solve certain problems exponentially faster than classical computing technology and the ever-growing thirst for computational power of machine learning applications have triggered substantial interest in the development of quantum machine learning methodology [2][3][4][5][6][7][8][9]. Whereas some work explored the acceleration of specific computational tasks in machine learning with quantum encodings [10], a major part of the research considers quantum neural networks as counterparts to artificial neural networks in classical software. Quantum neural networks are largely implemented as variational quantum circuits [11][12][13][14], which are composed of parametrized gates and where finding optimal parameters corresponds to the training of the network. Moreover, it has been shown that quantum speed-up is possible in supervised machine learning [15], providing a perspective for hardware-efficient machine learning realizations.
An important question for quantum neural networks is their expressivity, i.e. which mappings between input and output states can be realized by the network. In the classical domain, expressivity is guaranteed by the universal approximation theorem, which requires a nonlinear activation function for the perceptrons in the network. In the quantum setting, it has been shown that similar notions of universal approximation exist for functions mapping gate parameters to the state prepared by the quantum circuit [16]. However, currently there is no universally accepted way to extend the concept of classical neural networks to quantum systems [17,18]. One option is to design a quantum perceptron as a unitary whose action on a set of basis states matches a classical perceptron [1]. Such a building block can serve as a tool to enable experimental studies and development of quantum neural networks. Since such an approach includes the functionality of classical neural networks in limiting cases, it has the expressive power guaranteed by the universal approximation theorem in those cases and thus forms a natural starting point for exploring quantum generalizations. Another advantage of this approach is that the perceptron may be directly realized at the hardware level, rather than encoded in software: this neuromorphic approach is more hardware efficient and may thus offer significant advantages for scaling the concept to larger and more data intensive applications.
The construction of the quantum counterpart to a perceptron is easily illustrated with the example of a binary classifier perceptron. This simple classical perceptron takes a number of binary inputs $x_i \in \{0, 1\}$, each with an associated weight $w_i$, and outputs another binary value $y$ depending on whether the linear combination $x_\mathrm{in} = w_1 x_1 + \ldots + w_n x_n$ exceeds a given threshold or not. Written compactly in terms of the Heaviside step function, $y = \Theta(x_\mathrm{in} + b)$ with an additional bias $b$. More generally, as illustrated in Fig. 1(a), the step function may be replaced by a continuous activation function $f$ and the inputs and outputs promoted from binary values to continuous ones. The key advantage of such continuous activation functions is that they have finite gradients which can be used for training the network in gradient descent approaches.

FIG. 1. (a) Schematic representation of a classical perceptron with $n = 5$ inputs. The perceptron output $y$ is given by a non-linear activation function $f(\cdot)$ applied to the sum of input values $x_1, \ldots, x_n$ individually weighted by $w_1, \ldots, w_n$ and biased by a term $b$. (b) A quantum analog can be constructed, where the inputs are encoded in quantum states $|x_1\rangle, \ldots, |x_n\rangle$ of $n$ qubits and an output qubit undergoes a rotation by an angle of $\theta = \arcsin[c(x)]$ determined by the input values, leading to an excited population of $|c(x)|^2$ when starting the output qubit in the ground state.
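As a minimal sketch of the classical perceptron described above, the following Python snippet evaluates $y = f(x_\mathrm{in} + b)$ for a step and for a smooth activation. The function names, the logistic choice of $f$ and the `steepness` parameter are illustrative assumptions, not taken from the experiment.

```python
import math

def heaviside(x):
    """Step activation: 1 for x > 0, else 0 (the value at x == 0 is a convention)."""
    return 1.0 if x > 0 else 0.0

def sigmoid(x, steepness=1.0):
    """Smooth activation (assumed logistic form) with a finite gradient everywhere."""
    return 1.0 / (1.0 + math.exp(-steepness * x))

def perceptron(inputs, weights, bias, activation=heaviside):
    """Return f(sum_i w_i x_i + b) for the given activation f."""
    x_in = sum(w * x for w, x in zip(weights, inputs))
    return activation(x_in + bias)

# Binary classifier: fires only if at least one input is set.
print(perceptron([1, 0], [1.0, 1.0], -0.5))           # step activation
print(perceptron([1, 0], [1.0, 1.0], -0.5, sigmoid))  # smooth activation
```

Replacing `heaviside` by `sigmoid` leaves the weighted sum untouched and only changes how sharply the output switches, which is the property exploited for gradient-based training.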
In a quantum generalization of the classifier perceptron, the binary variables can be naturally represented by qubit states, which we will label as $|0\rangle$ and $|1\rangle$. The action of the perceptron can then be represented by a gate $U$ whose effect on the output qubit depends on the states of the input qubits: when the input is a product state $|x_1 x_2 \ldots\rangle_\mathrm{in}$, the unitary is simply a rotation of the output qubit by 0 or $\pi$ depending on the sign of $x_\mathrm{in} + b$ [19]. This concept of a quantum perceptron can be further generalized, similarly to the classical case, by letting the rotation angle have a different dependence on $x$ than a simple Heaviside function. The action of the perceptron, shown schematically in Fig. 1(b), is then described as

$$U\,|x_1 x_2 \ldots\rangle_\mathrm{in}\,|0\rangle_\mathrm{out} = |x_1 x_2 \ldots\rangle_\mathrm{in} \left( \sqrt{1 - |c(x)|^2}\,|0\rangle + c(x)\,|1\rangle \right)_\mathrm{out}. \qquad (1)$$

Here the excitation amplitude $c(x)$ is a continuous, steplike function of $x \equiv x_\mathrm{in} + b$ such as depicted in Fig. 1. The previously described binary perceptron is the special case where $c(x)$ is the Heaviside step function $\Theta(x)$.
Since the perceptron gate is a unitary operation, it can in principle be implemented via any universal set of quantum gates, such as single-qubit rotations and controlled-NOT (CNOT) gates. However, the depth of such a decomposition in general grows exponentially in the number of input qubits. Therefore, we instead realize the perceptron gate directly, making use of an adiabatic protocol. This approach allows us to implement the gate with a single adiabatic pulse whose duration does not scale with the number of input qubits. From Eq. (1) it can be seen that the perceptron acts like a multi-qubit controlled gate. However, instead of triggering an operation on the target only when the control register is in the $|1 \ldots 1\rangle$ state (like a Toffoli gate [20], which applies an X gate to the target only when the control qubits are in the state $|11\rangle$), the perceptron gate applies a different operation to the output qubit for each possible input basis state $|x_1 x_2 \ldots\rangle_\mathrm{in}$. As the number of input basis states grows exponentially in the number of qubits, a standard decomposition into elementary gates leads to an exponential growth of the circuit depth (see Section III), which is in contrast to the adiabatic implementation discussed in the following.

A. Concept
We construct the perceptron unitary with an approach that is a slight modification of a theoretical proposal by Torrontegui and García-Ripoll [1], as described below. An implementation of a similar operation based on repeated measurements and feedback was also proposed [19], while a gate-based approach to the construction of activation functions was demonstrated in [21].
Our protocol, which we implement in a device with two fixed-frequency superconducting transmon qubits [22] interacting via a tunable coupler [23] (schematically depicted in Fig. 2(a)), is based on an adiabatic unitary evolution applied to the output qubit. This gate $U_\mathrm{ad}$ is designed in such a way that the final excited state amplitude depends on the detuning $\Delta$ of a microwave drive frequency from the output qubit frequency via the steplike response function $c(\Delta)$:

$$U_\mathrm{ad}(\Delta)\,|0\rangle = \sqrt{1 - |c(\Delta)|^2}\,|0\rangle + c(\Delta)\,|1\rangle. \qquad (2)$$

Note that this becomes exactly the mapping in Eq. (1) if the detuning $\Delta$ is equal to the linear combination of inputs $\sum_j w_j x_j + b$. We achieve this linear dependence of $\Delta$ on the input states by subjecting the system to a ZZ interaction between each of the input qubits and the perceptron qubit:

$$H_\mathrm{int}/\hbar = -\sum_j w_j\, |1\rangle\langle 1|_j \otimes |1\rangle\langle 1|_\mathrm{out}. \qquad (3)$$

Here $j$ enumerates the input qubits and "out" labels the output qubit. If the input qubits are in a product state $|x_1 \ldots\rangle$, this is equivalent to shifting the frequency of the output by $-\sum_j w_j x_j$, or $-w_1 x_1$ in the case of a single input.
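To make the bookkeeping concrete, the sketch below tabulates the final detuning $\Delta = \sum_j w_j x_j + b$ seen by the output qubit for every basis state of the input register. The helper name and the example numbers are hypothetical; any consistent frequency unit may be used.

```python
from itertools import product

def final_detunings(weights, bias):
    """Map each input bitstring (x_1, ..., x_N) to Delta = sum_j w_j x_j + b."""
    return {
        bits: sum(w * x for w, x in zip(weights, bits)) + bias
        for bits in product((0, 1), repeat=len(weights))
    }

# Illustrative two-input example (units of MHz assumed).
for bits, delta in final_detunings([2.0, -1.0], 0.5).items():
    print(bits, delta)
```

The table has $2^N$ entries for $N$ inputs, although all of them are set by only $N$ weights and one bias.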

B. Implementation
The ZZ interactions need to be configurable in situ to allow training of the network. This can be achieved by using tunable couplers [23] mediating interactions between the output qubit and each input qubit. When the couplers are sufficiently far detuned from the qubits, the dispersive approximation is valid and the interactions effectively introduce ZZ coupling terms, giving rise to the Hamiltonian in Eq. (3). The individual interaction strengths representing the network weights can be tuned by changing the frequencies of the couplers [24]. Using this scheme we can control the weight of each input by tuning the respective coupler frequency, and the bias point by changing the frequency of the microwave drive.
In the two-qubit device used for our experimental demonstration, qubit 1 at a frequency $\omega_1/2\pi = 6.189$ GHz serves as the output while qubit 2 at a frequency $\omega_2/2\pi = 5.089$ GHz is the single-valued input register. The qubits have anharmonicities $\alpha_1/2\pi = -286$ MHz and $\alpha_2/2\pi = -310$ MHz. The coupler has a frequency tunable from its maximum of approximately $\omega_{c,\mathrm{max}}/2\pi = 7.8$ GHz to well below the frequencies of both qubits. The qubit-coupler interaction strengths are approximately $g_1/2\pi = 142$ MHz and $g_2/2\pi = 116$ MHz.
Using Ramsey-type measurements with the input qubit either in $|0\rangle$ or in $|1\rangle$, we characterize the dependence of the ZZ coupling on the frequency of the coupler and observe that it can be tuned over a range of a few MHz (see Fig. 2). ... from 4th-order perturbation theory (see Eq. (B2)), and similarly by the numerical diagonalization of the Hamiltonian (see Eq. (B1)).

FIG. 4. (a) The excited state population $p_e$ of the output qubit as a function of its final detuning $\Delta$ after an adiabatic chirped pulse is applied to the ground state of the two-qubit system. Results for different pulse lengths $T$ are offset for visualization. The characteristic width of the activation function is inversely proportional to the pulse length $T$. The solid lines result from a simulation of unitary evolution under the adiabatic drive, with no fit parameters in this model. (b) Frequency shift of the activation function dependent on the state of the input qubit, for a pulse length of $T = 1.67$ µs. Both the magnitude and sign of this shift can be varied by tuning the frequency of the coupler. In these plots, the output qubit excitations are rescaled to correct for relaxation and readout imperfections, in order to highlight the agreement of the curve's shape with the simulation results. The frequency spacing between the simulated curves is not obtained by fitting but rather predicted from measured $J$ values (see Fig. 2(b)). The vertical lines indicate values of $b/2\pi = 0.8$ MHz (dashed) and $b/2\pi = 4.0$ MHz (dotted) which are used in Fig. 5(a).
The single-qubit gate, expressed by Eq. (2), which underlies the quantum perceptron protocol is realized by applying an adiabatic chirped pulse to qubit 1 [25,26]. The pulse's initial frequency $\omega_i$ is far below the output qubit's frequency $\omega_1$ (where $\omega_1$ is defined in the absence of the interaction $H_\mathrm{int}$) while its final frequency $\omega_f$ is detuned from it by $b \equiv \omega_f - \omega_1$. This means the initial detuning of the drive from the qubit is negative while the final detuning is $\Delta = \sum_j w_j x_j + b$. Note that the bias $b$, being equal to the final chirp detuning from the unshifted qubit frequency, can be tuned arbitrarily by changing $\omega_f$.
Under perfect adiabatic conditions, as illustrated in Fig. 3(a,b), if the initial and the final detuning have the same sign, the two basis states $|0\rangle$ and $|1\rangle$ remain unchanged by the pulse (up to accumulated phases). If they have opposite signs, the states are flipped. The resulting dependence of the final excited state population on the qubit frequency is illustrated in Fig. 3(c). Since we choose the initial frequency of the chirped pulse $\omega_i$ significantly lower than the lowest possible frequency of the output qubit ($\omega_1 - \sum_{j, w_j > 0} w_j$), we neglect the rising edge of the function. The adiabatic operation is then exactly described by Eq. (1) with $|c(\Delta)|^2 = \Theta(\Delta)$ and $\Delta = \sum_j w_j x_j + b$. Smooth response functions $|c(\Delta)|^2$ arise naturally from imperfect adiabaticity of the chirp pulse: with a pulse of finite duration $T$, the process becomes non-adiabatic when the detuning $\Delta$ is on the order of $1/T$ or smaller. This leads to a smoothening of the step in the response function and a finite width of roughly $1/T$. In our experiment, the chirped pulse has a time-dependent frequency $\omega_p(t) = \omega_i + (\omega_f - \omega_i) \sin^2(\pi t/2T)$ and amplitude $\Omega(t) = \Omega_0 \sin(\pi t/T)$ (Fig. 3(d)). As a time-transformed version of a hyperbolic secant pulse [27,28], for which analytical solutions of the population transfer exist, it leads to hyperbolic shaped activation functions with few fitting parameters (see Appendix C).

Fig. 4(a) shows the measured effect of the pulse on the qubit state as a function of the bias $b = \omega_f - \omega_1$ for several durations $T$ of the pulse. Since the control qubit is left in its ground state, i.e. $x_1 = 0$, the curves are independent of the weight $w_1$. In these measurements, performed with a pulse amplitude $\Omega_0/2\pi = 19.7$ MHz and initial frequency $\omega_i = \omega_f - 80$ MHz, the qubit is initialized in its ground state and its excited state population $p_e$ after the pulse is observed to follow a step-like curve, where the slope of the transition increases with the pulse length.
The broadening of the response curve due to non-adiabatic effects is well reproduced by unitary simulations, where we simply evolve the driven two-level-system Hamiltonian in the rotating frame, taking the rotating wave approximation. We take $T = 1.67$ µs to be the default gate time for the following results.
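A simulation of this kind can be sketched in a few lines of standard-library Python. The snippet below is not the authors' code: it assumes the rotating-frame Hamiltonian $H = \frac{\delta(t)}{2}\sigma_z + \frac{\Omega(t)}{2}\sigma_x$ with $\delta(t) = \omega_p(t) - \omega_q$, interprets $\Omega_0$ as the Rabi frequency and the 80 MHz chirp span as $(\omega_f - \omega_i)/2\pi$, and integrates with piecewise-constant time steps. The function name and defaults are illustrative.

```python
import math

TWO_PI = 2.0 * math.pi

def excited_population(delta_mhz, t_gate_us=1.67, rabi_mhz=19.7,
                       chirp_span_mhz=80.0, steps=4000):
    """Final excited-state population after the chirped pulse.

    delta_mhz is the final drive-qubit detuning Delta/2pi in MHz.
    Assumed model: H = (det/2) sigma_z + (rabi/2) sigma_x in the frame
    rotating with the drive, where sigma_z = |0><0| - |1><1|.
    """
    a, b = 1.0 + 0.0j, 0.0 + 0.0j  # amplitudes of |0> and |1>
    dt = t_gate_us / steps
    for k in range(steps):
        t = (k + 0.5) * dt  # midpoint of the time step
        # Instantaneous detuning and drive amplitude in rad/us.
        det = TWO_PI * (delta_mhz - chirp_span_mhz
                        * math.cos(math.pi * t / (2.0 * t_gate_us)) ** 2)
        rabi = TWO_PI * rabi_mhz * math.sin(math.pi * t / t_gate_us)
        w = 0.5 * math.hypot(det, rabi)
        if w == 0.0:
            continue  # Hamiltonian vanishes, nothing to evolve
        c, s = math.cos(w * dt), math.sin(w * dt)
        nz, nx = det / (2.0 * w), rabi / (2.0 * w)
        # Exact propagator exp(-i H dt) for a constant 2x2 Hamiltonian.
        a, b = ((c - 1j * s * nz) * a - 1j * s * nx * b,
                -1j * s * nx * a + (c + 1j * s * nz) * b)
    return abs(b) ** 2

# Sweep the final detuning to trace out the activation function.
for delta in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(delta, excited_population(delta))
```

Sweeping `delta_mhz` at several values of `t_gate_us` should show how the step sharpens for longer pulses. The model omits decoherence and readout errors, so it is not expected to match the measured curves without rescaling.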
The effect of the ZZ coupling between the output and the input qubit is a shift of the S-shaped activation function dependent on the input qubit being in the excited state, as demonstrated in Fig. 4(b). The curves corresponding to the input qubit being in the ground state (blue) are unshifted, while the ones for an excited input qubit (orange) show a shift which is adjustable by changing the coupler frequency. This shift equals the ZZ coupling strength $J = -w_1$ (where the sign flip is included in the definition of the weight such that the activation function is increasing, as per convention) in the single connection between the output and the input qubit in the quantum perceptron gate. Depending on the relative frequency of the coupler with respect to the qubit frequencies, the weight can be made positive or negative (see Fig. 2).
As the next step towards trainability of the network, we control the weight w 1 at a fixed bias (see Fig. 5(a)). For the input qubit in the ground state, the final state of the output does not depend on w 1 , since x 1 = 0. For the control in the excited state, the dependence on w 1 is simply the activation function itself, shifted by choosing different values for the bias b.
We further characterize the perceptron gate with quantum process tomography [29] and verify several important aspects of the implemented process with the extracted process matrices. It is expected to act as a controlled gate; that is, if the input qubit is prepared in one of the basis states $|0\rangle$, $|1\rangle$, the gate should leave it in this state, independently of the output qubit's initial state. We calculate the average fidelity with which the process $M$ satisfies this condition [Eq. (4)], where $\langle \cdot \rangle_\phi$ denotes averaging over the initial state $|\phi\rangle$ of the input qubit. This average is calculated from the process matrix of $M$ by directly using Eq. (4) and the identity $\langle |\phi\rangle\langle\phi| \rangle_\phi = \mathbb{1}/2$. The obtained fidelity values are independent of the detuning $\Delta$ and lie between 0.95 and 0.97, indicating that the implemented operation is to a good approximation a controlled gate. We also evaluate the purity of the final state averaged over the initially prepared state and obtain a value around 0.78 (for a 1.7 µs long pulse), independent of the qubit-drive detuning. This confirms that the gate is mostly a unitary process, limited slightly by qubit decoherence. The average purity value is consistent with typical observed qubit dephasing times, which vary in our experiment between 10 µs and 20 µs. Finally, to show the entangling property of the perceptron gate, we prepare the input qubit in an equal superposition state and the output qubit in its ground state. We then apply the perceptron gate, extract the density matrix of the final two-qubit state and calculate its negativity [30], which reaches nearly the maximum value of 1/2 for bias values at which the activation functions for the two computational states of the input qubit are well separated (Fig. 5(b)). In the limit of large weight, the ideal operation at the mid-point between the activation function slopes would become equivalent (up to local phases) to a controlled-NOT gate, preparing a Bell state and reaching maximum negativity.
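For the ideal, decoherence-free gate the negativity can be written in closed form, which the sketch below evaluates. It assumes real excitation amplitudes $c_0$ and $c_1$ for the input states $|0\rangle$ and $|1\rangle$, and uses the fact that for a pure two-qubit state the negativity equals the product of the two Schmidt coefficients, i.e. the modulus of the determinant of the coefficient matrix. The function name is hypothetical.

```python
import math

def ideal_negativity(c0, c1):
    """Negativity of (|0>|psi_0> + |1>|psi_1>)/sqrt(2) for the ideal gate.

    |psi_k> = sqrt(1 - c_k^2)|0> + c_k|1> is the output-qubit state for
    input state |k>, with real amplitudes c_k in [-1, 1] (an assumption).
    """
    a0 = math.sqrt(1.0 - c0 ** 2)
    a1 = math.sqrt(1.0 - c1 ** 2)
    # Coefficient matrix C = [[a0, c0], [a1, c1]] / sqrt(2);
    # |det C| is the product of the Schmidt coefficients.
    return abs(0.5 * (a0 * c1 - c0 * a1))

print(ideal_negativity(0.0, 1.0))   # well-separated activation curves
print(ideal_negativity(0.3, 0.3))   # identical response: no entanglement
```

The value 1/2 is reached only when one input state leaves the output qubit in $|0\rangle$ while the other flips it to $|1\rangle$, which corresponds to the CNOT-like limit mentioned above.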

III. EQUIVALENT CIRCUIT AND SCALING COMPLEXITY
For a general, multi-qubit input register the perceptron's action in Eq. (1) can be considered as a generalized multi-qubit conditional gate: a separate operation $V(x)$ is applied for each of the basis states $|x\rangle \equiv |x_1 x_2 \ldots x_N\rangle$ of the $N$ input qubits. This is in contrast to a single multi-qubit controlled operation, which applies the operation $V$ to an output qubit only if all input qubits are in the state $|1\rangle$. Thus, implementing the perceptron gate requires a multi-qubit controlled gate for each input basis state, in total $2^N$ operations. This can be reduced to $2^N - 1$ by implementing the $V(x)$ of one of the input strings as a single-qubit gate and adjusting all multi-qubit controlled operations accordingly. For small $N \lesssim 10$, a multi-qubit controlled gate can be decomposed into $2^{N+1} - 2$ two-qubit gates [31], which gives us a total gate count for the equivalent circuit of $N_g = (2^N - 1)(2^{N+1} - 2)$, approaching $\sim 2^{2N+1}$ for a large number of input qubits.

FIG. 6. ... The controlled-$W$ gate is further decomposed into at most two CNOT gates and three single-qubit gates $A$, $B$ and $C$.
More specifically, the single-input perceptron implemented in this paper can be decomposed into single-qubit gates and two CNOT gates (Fig. 6). The fidelity and total duration of a sequence are typically limited by the number of two-qubit gates in transmon-based architectures. Hence, when ignoring single-qubit gates, the sequence equivalent to the perceptron gate would be implemented in a gate time of ∼ 120 ns and with an estimated fidelity of ∼ 99.4%. Here, we have assumed state-of-the-art CNOT/CZ gates with fidelities of up to 99.7% [32,33] and gate times of 60 ns. The advantage of the adiabatic protocol, with approximately constant time, over the equivalent unitary circuit becomes apparent as the number of input qubits is increased, as discussed in Appendix A. For example, when decomposing the perceptron gate with two input qubits, the best-case estimate suggests a fidelity of ∼ 94.7% with a gate time of ∼ 1.1 µs.
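The quoted estimates follow from simple arithmetic, sketched below under the stated assumptions: $N_g = (2^N - 1)(2^{N+1} - 2)$ two-qubit gates, each with fidelity 99.7% and duration 60 ns, with errors multiplying and single-qubit gates ignored. The function name is illustrative.

```python
def equivalent_circuit_estimate(n_inputs, f_2q=0.997, t_2q_ns=60.0):
    """Return (two-qubit gate count, fidelity estimate, duration in ns)."""
    n_gates = (2 ** n_inputs - 1) * (2 ** (n_inputs + 1) - 2)
    fidelity = f_2q ** n_gates      # independent, multiplicative errors assumed
    duration_ns = n_gates * t_2q_ns  # gates executed sequentially
    return n_gates, fidelity, duration_ns

for n in range(1, 5):
    gates, fid, t_ns = equivalent_circuit_estimate(n)
    print(f"N={n}: {gates} gates, F~{fid:.3f}, T~{t_ns / 1000:.2f} us")
```

For one and two inputs this gives 2 and 18 two-qubit gates, matching the figures quoted above. The adiabatic gate time, by contrast, does not grow with the number of inputs.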

IV. DISCUSSION AND OUTLOOK
By realizing the adiabatic perceptron dynamics that activates the output qubit depending on the state of connected input qubits directly in hardware, we have demonstrated the basic building block of a quantum feedforward neural network. In this co-design approach, we show that by changing the length of the perceptron adiabatic qubit pulse the shape of the activation function can be modified, whereas the ZZ shift mediated by a tunable coupler and the adiabatic drive final frequency can be tuned to modify the weights and bias, which allows for training of the neural network.
As an extension of the demonstrated single-digit input scenario, an efficient implementation of a multi-input quantum perceptron can be realized by coupling multiple input qubits, each with its own tunable coupler, to a single output qubit. Its cumulative frequency shift would be of the form shown in Eq. (3) [34]. It is noteworthy that the adiabatic gate presented here leads to complex multi-qubit operations with two-body interactions only and with a gate time that is independent of the number of inputs. In contrast, the time to run the equivalent circuit scales exponentially in the number of qubits, with a significant advantage of the co-designed perceptron expected already for the three-qubit input case. While the quantum perceptron we co-design exponentially reduces the number of required conventional gates, all-to-all connectivity between the qubits of adjacent layers is desired to fully benefit from this advantage. Limitations imposed by the number of couplers that can be physically attached to a single qubit or compromises in the connectivity between layers may reduce this advantage. Therefore, multi-qubit couplers [35][36][37][38] may become increasingly useful, as will be architectures that combine the presented adiabatic implementation of a perceptron with standard digital gate sequences.
To ensure proper trainability of the network, the range of achievable weights and biases should allow probing the activation function at both limits $f(x) \approx 1$ and $f(x) \approx 0$. For our implementation, the possible range of ZZ shifts should be comparable to the characteristic width of the adiabatic curve, which is in turn limited by the ability to follow the eigenstates adiabatically given the speed of the gate. Therefore, to reduce the perceptron gate time one could envisage either non-adiabatic versions of the perceptron [39], or an extended range of achievable ZZ coupling by AC [40,41] or DC [42] pulses on the tunable coupler.
While we have demonstrated the entangling power of the perceptron gate, evaluating quantum advantage in larger networks is subject to forthcoming investigations. For instance, it will be important to understand how decoherence will affect the result and what role the entanglement between the input and output layer plays, in particular if the input layer is in a superposition state. Partially tracing out input and hidden layers and recycling qubits might allow for larger networks and protect the results from decoherence of earlier layers. Moreover, an investigation of the effects of higher connectivity on the expressivity of the network will be required.
Finally, neural networks based on adiabatic perceptron gates may also have applications not directly related to machine learning. As an example, it is plausible that their quantum counterparts could serve well to parametrize unitaries in variational quantum algorithms [11,12,14], similar to the efficient approximation capability found in classical neural networks.

The equivalent unitary matrix for the adiabatic perceptron can be expressed in terms of standard gates such as two-qubit controlled-NOT and single-qubit gates. However, the depth of the equivalent circuit grows exponentially with the number of input qubits.
The action of the perceptron in Eq. (1) can be thought of as a generalized multi-conditional gate. Whereas a 'conventional' $N$-controlled-$V$ operation applies the gate $V$ to an output qubit only if the control qubit (or qubits, for a multi-qubit controlled gate) is in the $|1\rangle$ state, in the perceptron gate a different gate $V(x)$ is applied to the output qubit for every computational basis state $|x\rangle \equiv |x_1 x_2 \ldots x_N\rangle$ with $x_i \in \{0, 1\}$ of the control qubits. The matrix representation for such a controlled gate has a block-diagonal structure with each block $V(x)$ corresponding to a particular input string $x$. Thus, the perceptron gate can be expressed in terms of multi-qubit controlled unitaries as illustrated for a single input qubit in Fig. 6(a) and for two input qubits in Fig. 7(a). However, the number of multi-qubit controlled unitaries can be decreased by one by applying one of the $V(x)$ gates as a single-qubit gate.

When acting on an output qubit initialized in the $|0\rangle$ state, the perceptron gate corresponds to the map $|x\rangle |0\rangle \mapsto |x\rangle \left( \sqrt{1 - |c(x)|^2}\,|0\rangle + c(x)\,|1\rangle \right)$. By imposing unitarity, we can infer the action of the perceptron gate on an output qubit in the $|1\rangle$ state, and thus the controlled-$V(x)$ gate. For real $c(x)$, we can write $c(x) = \sin(\theta_x/2)$ and $\sqrt{1 - c^2(x)} = \cos(\theta_x/2)$ and express $V(x)$ in terms of the angle $\theta_x$. That is to say, the perceptron reduces to a sequence of rotations around the X, Y and Z axes. The number of multi-qubit controlled rotations in this decomposition is equal to the number of possible input bitstrings, which scales exponentially with the number of input qubits. As discussed in the main text, this may lead to a potential advantage of this protocol over more standard parameterized gates.
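The block structure can be illustrated with the short sketch below. It assumes real $c(x)$ and picks the phase convention $V(x) = R_y(\theta_x)$ with $\theta_x = 2\arcsin c(x)$; this is only one of the choices allowed by unitarity and is not necessarily the convention used for Figs. 6 and 7. The helper names and the step-like amplitude are hypothetical.

```python
import math
from itertools import product

def step_amplitude(x):
    """Ideal adiabatic limit: c(x) = Theta(x)."""
    return 1.0 if x > 0 else 0.0

def perceptron_blocks(weights, bias, amplitude=step_amplitude):
    """Return {input bitstring: 2x2 block V(x)} of the block-diagonal gate.

    Assumed convention: V(x) = R_y(theta_x), so that
    V(x)|0> = cos(theta_x/2)|0> + sin(theta_x/2)|1> with sin(theta_x/2) = c(x).
    """
    blocks = {}
    for bits in product((0, 1), repeat=len(weights)):
        x = sum(w * b for w, b in zip(weights, bits)) + bias
        theta = 2.0 * math.asin(amplitude(x))
        cs, sn = math.cos(theta / 2.0), math.sin(theta / 2.0)
        blocks[bits] = [[cs, -sn], [sn, cs]]
    return blocks

# Single input with weight 1 and bias -0.5: identity for x = 0, flip for x = 1.
for bits, block in perceptron_blocks([1.0], -0.5).items():
    print(bits, block)
```

The dictionary has $2^N$ entries, one per input bitstring, which makes explicit why a gate-by-gate decomposition grows exponentially with the number of inputs.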
Moreover, near-term quantum computers cannot generally implement multi-qubit controlled gates at once, but rather must decompose them into a series of one- and two-qubit gates as shown in Figs. 6(c) and 7(c) by using CNOT gates as the primitive two-qubit operation.
In the main text, we use gate times and fidelities quoted for CZ instead of CNOT gates, being $F = 99.7\%$ and $\tau = 60$ ns, as these are currently better for state-of-the-art transmon qubit architectures and since a CZ can easily be transformed to a CNOT via single-qubit gates.
With these decompositions, and assuming that the two-qubit gates are the dominant source of error, we obtain estimates for the equivalent fidelities and gate times quoted in the main text: for the two-qubit (single-input, single-output) gate, 2 CNOTs are needed, leading to a fidelity of ∼ 99.4% and a gate time of ∼ 120 ns; for the three-qubit (two inputs, one output) gate, 18 CNOTs are needed, leading to a fidelity of ∼ 94.7% and a gate time of ∼ 1.1 µs. Gate time and fidelity estimates for more inputs are summarised in Table I.

The Hamiltonian describing the system is given in Eq. (B1), where $a_i^\dagger$ ($a_i$) are creation (annihilation) operators, $\omega_i$ are the frequencies and $\alpha_i$ are the anharmonicities of qubits ($i = 1, 2$) and coupler ($i = c$), and $g_{ic}$ are their respective couplings. The anharmonicities $\alpha_i$ are defined as the difference $\omega_{12} - \omega_{01}$ between the $0 \leftrightarrow 1$ and $1 \leftrightarrow 2$ transition frequencies; values for the measured parameters can be found in [41].
Using fourth-order perturbation theory, we can approximate the ZZ coupling strength $J$ as given in Eq. (B2), where $\Delta_{1,2} = \omega_c - \omega_{1,2}$ are the detunings of the coupler from the two qubits.