Continuous-variable quantum neural networks

We introduce a general method for building neural networks on quantum computers. The quantum neural network is a variational quantum circuit built in the continuous-variable (CV) architecture, which encodes quantum information in continuous degrees of freedom such as the amplitudes of the electromagnetic field. This circuit contains a layered structure of continuously parameterized gates which is universal for CV quantum computation. Affine transformations and nonlinear activation functions, two key elements in neural networks, are enacted in the quantum network using Gaussian and non-Gaussian gates, respectively. The non-Gaussian gates provide both the nonlinearity and the universality of the model. Due to the structure of the CV model, the CV quantum neural network can encode highly nonlinear transformations while remaining completely unitary. We show how a classical network can be embedded into the quantum formalism and propose quantum versions of various specialized models such as convolutional, recurrent, and residual networks. Finally, we present numerous modeling experiments built with the Strawberry Fields software library. These experiments, including a classifier for fraud detection, a network which generates Tetris images, and a hybrid classical-quantum autoencoder, demonstrate the capability and adaptability of CV quantum neural networks.

I. INTRODUCTION
After many years of scientific development, quantum computers are now beginning to move out of the lab and into the mainstream. Over those years of research, many powerful algorithms and applications for quantum hardware have been established. In particular, the potential for quantum computers to enhance machine learning is truly exciting [1][2][3]. Sufficiently powerful quantum computers can in principle provide computational speedups for key machine learning algorithms and subroutines such as data fitting [4], principal component analysis [5], Bayesian inference [6,7], Monte Carlo methods [8], support vector machines [9,10], Boltzmann machines [11,12], and recommendation systems [13].
On the classical computing side, there has recently been a renaissance in machine learning techniques based on neural networks, forming the new field of deep learning [14][15][16]. This breakthrough is being fueled by a number of technical factors, including new software libraries [17][18][19][20][21] and powerful special-purpose computational hardware [22,23]. Rather than the conventional bit registers found in digital computing, the fundamental computational units in deep learning are continuous vectors and tensors which are transformed in high-dimensional spaces. At the moment, these continuous computations are still approximated using conventional digital computers. However, new specialized computational hardware is currently being engineered which is fundamentally analog in nature [24][25][26][27][28][29][30][31].
Quantum computation is a paradigm that furthermore includes nonclassical effects such as superposition, interference, and entanglement, giving it potential advantages over classical computing models. Together, these ingredients make quantum computers an intriguing platform for exploring new types of neural networks, in particular hybrid classical-quantum schemes [32][33][34][35][36][37][38][39]. Yet the familiar qubit-based quantum computer has the drawback that it is not wholly continuous, since the measurement outputs of qubit-based circuits are generally discrete. Rather, it can be thought of as a type of digital quantum hardware [40], only partially suited to continuous-valued problems [41,42].
The quantum computing architecture which is most naturally continuous is the continuous-variable (CV) model. Intuitively, the CV model leverages the wavelike properties of nature. Quantum information is encoded not in qubits, but in the quantum states of fields, such as the electromagnetic field, making it ideally suited to photonic hardware. The standard observables in the CV picture, e.g., position $\hat{x}$ or momentum $\hat{p}$, have continuous outcomes. Importantly, qubit computations can be embedded into the quantum field picture [43,44], so there is no loss in computational power by taking the CV approach. Recently, the first steps towards using the CV model for machine learning have been taken, showing how several basic machine learning primitives can be built in the CV setting [45,46]. As well, a kernel-based classifier using a CV quantum circuit was trained in [10]. Beyond these early forays, the CV model remains largely unexplored territory as a setting for machine learning.
In this work, we show that the CV model gives a native architecture for building neural network models on quantum computers. We propose a variational quantum circuit which straightforwardly extends the notion of a fully connected layer structure from classical neural networks to the quantum realm. This quantum circuit contains a continuously parameterized set of operations which are universal for CV quantum computation. By stacking multiple building blocks of this type, we can create multilayer quantum networks which are increasingly expressive. Since the network is made from a universal set of gates, this architecture can also provide a quantum advantage: for certain problems, a classical neural network would require exponentially many resources to approximate the quantum network. Furthermore, we show how to embed classical neural networks into a CV quantum network by restricting to the special case where the gates and parameters of the network do not create any superposition or entanglement.
This paper is organized as follows. In Sec. II, we review the key concepts from deep learning and from quantum computing which set up the remainder of the paper. We then introduce our basic continuous-variable quantum neural network model in Sec. III and explore it in detail theoretically. In Sec. IV, we validate and showcase the CV quantum neural network architecture through several machine learning modeling experiments. We conclude with some final thoughts in Sec. V.

II. OVERVIEW
In this section, we give a high-level synopsis of both deep learning and the CV model. To make this work more accessible to practitioners from diverse backgrounds, we will defer the more technical points to later sections. Both deep learning and CV quantum computation are rich fields; further details can be found in various review papers and textbooks [14,16,40,47,48,49].

A. Neural networks and deep learning
The fundamental construct in deep learning is the feedforward neural network (also known as the multilayer perceptron) [16]. Over time, this key element has been augmented with additional structure, such as convolutional feature maps [50], recurrent connections [51], attention mechanisms [52], or external memory [53], for more specialized or advanced use cases. Yet the basic recipe remains largely the same: a multilayer structure, where each layer consists of a linear transformation followed by a nonlinear 'activation' function. Mathematically, for an input vector $x \in \mathbb{R}^n$, a single layer $\mathcal{L}$ performs the transformation
$$\mathcal{L}(x) = \varphi(Wx + b), \tag{1}$$
where $W \in \mathbb{R}^{m \times n}$ is a matrix, $b \in \mathbb{R}^m$ is a vector, and $\varphi$ is the nonlinear function. The objects $W$ and $b$, called the weight matrix and the bias vector, respectively, are made up of free parameters $\theta_W$ and $\theta_b$. Typically, the activation function $\varphi$ contains no free parameters and acts element-wise on its inputs. The 'deep' in deep learning comes from stacking multiple layers of this type together, so that the output of one layer is used as an input for the next. In general, each layer $\mathcal{L}_i$ will have its own independent weight and bias parameters. Summarizing all model parameters by the parameter set $\theta$, an $N$-layer neural network model is given by
$$y = f_\theta(x) = \mathcal{L}_N \circ \cdots \circ \mathcal{L}_1(x), \tag{2}$$
and maps an input $x$ to a final output $y$.
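As an illustration of the layer and network maps just described, here is a minimal pure-Python sketch. The function names (`layer`, `network`), the choice of `tanh` as activation, and the numerical parameters are assumptions made for this example only.

```python
import math

def layer(W, b, x, phi=math.tanh):
    """One fully connected layer: phi(W x + b), with phi applied element-wise."""
    return [phi(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def network(params, x):
    """Compose layers so that the output of one layer feeds the next.

    params is a list of (W, b) pairs, ordered from first to last layer.
    """
    for W, b in params:
        x = layer(W, b, x)
    return x

# Hypothetical parameters: a 3 -> 2 layer followed by a 2 -> 1 layer.
params = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.2]),
]
y = network(params, [1.0, 2.0, 3.0])
```

Each layer has its own weight matrix and bias vector, and the layer widths are set purely by the shapes of those matrices.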
Building machine learning models with multilayer neural networks is well-motivated because of various universality theorems [54][55][56]. These theorems guarantee that, provided enough free parameters, feedforward neural networks can approximate any continuous function on a closed and bounded subset of R n to an arbitrary degree of accuracy. While the original theorems showed that two layers were sufficient for universal function approximation, deeper networks can be more powerful and more efficient than shallower networks with the same number of parameters [57][58][59].
The universality theorems prove the power of the neural network model for approximating functions, but those theorems do not say anything about how to actually find this approximation. Typically, the function to be fitted is not explicitly known, but rather its input-output relation is to be inferred from data. How can we adjust the network parameters so that it fits the given data? For this task, the workhorse is the stochastic gradient descent algorithm [60], which fits a neural network model to data by estimating derivatives of the objective with respect to the model's parameters (the weights and biases) and using gradient descent to minimize some relevant objective function. Combined with a sufficiently large dataset, neural networks trained via stochastic gradient descent have shown remarkable performance for a variety of tasks across many application areas [14,16].
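The idea behind stochastic gradient descent can be sketched on the smallest possible model. The example below fits a single linear unit (not a full neural network) to toy data; the function name `sgd_fit`, the learning rate, and the data-generating rule are all assumptions chosen for illustration.

```python
import random

def sgd_fit(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)              # visit the samples in random order
        for x, y in data:
            err = (w * x + b) - y      # residual for one sample
            w -= lr * 2 * err * x      # gradient of err**2 with respect to w
            b -= lr * 2 * err          # gradient of err**2 with respect to b
    return w, b

# Toy data generated from y = 2x - 1 (an assumption made for this example).
samples = [(k / 10, 2 * (k / 10) - 1) for k in range(-10, 11)]
w, b = sgd_fit(samples)
```

For a multilayer network the update rule is the same; only the gradient computation changes, since it must be propagated through every layer.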

B. Quantum computing and the CV model
The quantum analogue of the classical bit is the qubit. The quantum states of a many-qubit system are normalized vectors in a complex Hilbert space. Various attempts have been made over the years to encode neural networks and neural-network-like structures into qubit systems, with varying degrees of success [61]. One can roughly distinguish two strategies. There are approaches that encode inputs into the amplitude vector of a multiqubit state and interpret unitary transformations as neural network layers. These models require indirect techniques to introduce the crucial nonlinearity of the activation function, which often lead to a nonnegligible probability for the algorithm to fail [62][63][64]. Other approaches, which encode each input bit into a separate qubit [65,66], have an overhead stemming from the need to binarize the continuous values. Furthermore, the typical neural network structure of matrix multiplication and nonlinear activations becomes cumbersome to translate into a quantum algorithm, and the advantages of doing so are not always apparent. Due to these constraints, qubit architectures are arguably not the most flexible quantum frameworks for encoding neural networks, which have continuous real-valued inputs and outputs.
Fortunately, qubits are not the sole medium available for quantum information processing. An alternate quantum computing architecture, the CV model [67], is a much better fit with the continuous picture of computation underlying neural networks. The CV formalism has a long history, and can be physically realized using optical systems [68,69], in the microwave regime [70][71][72], and using ion traps [73][74][75]. In the CV model, information is carried in the quantum states of bosonic modes, often called qumodes, which form the 'wires' of a quantum circuit. Continuous-variable quantum information can be encoded using two related pictures: the wavefunction representation [76,77] and the phase space formulation of quantum mechanics [78][79][80][81]. In the former, we specify a single continuous variable, say $x$, and represent the state of the qumode through a complex-valued function of this variable called the wavefunction $\psi(x)$. Concretely, we can interpret $x$ as a position coordinate, and $|\psi(x)|^2$ as the probability density of a particle being located at $x$. From elementary quantum theory, we can also use a wavefunction based on a conjugate momentum variable, $\phi(p)$. Instead of position and momentum, $x$ and $p$ can equivalently be pictured as the real and imaginary parts of a quantum field, such as light.
In the phase space picture, we treat the conjugate variables $x$ and $p$ on equal footing, giving a connection to classical Hamiltonian mechanics. Thus, the state of a single qumode is encoded with two real-valued variables $(x, p) \in \mathbb{R}^2$. For $N$ qumodes, the phase space employs $2N$ real variables $(x, p) \in \mathbb{R}^{2N}$. Qumode states are represented as real-valued functions $F(x, p)$ in phase space called quasiprobability distributions. 'Quasi' refers to the fact that these functions share some, but not all, properties with classical probability distributions. Specifically, quasiprobability functions can be negative. While normalization forces qubit systems to have a unitary geometry, normalization gives a much looser constraint in the CV picture, namely that the function $F(x, p)$ has unit integral over the phase space. Qumode states also have a representation as vectors or density matrices in the countably infinite Hilbert space spanned by the Fock states $\{|n\rangle\}_{n=0}^{\infty}$, which are the eigenstates of the photon number operator $\hat{n}$. These basis states represent the particle-like nature of qumode systems, with $n$ denoting the number of particles. This is analogous to how square-integrable functions can be expanded using a countable basis set like sines or cosines.
The phase space and Hilbert space formulations give equivalent predictions. Thus, CV quantum systems can be explored from both a wave-like and a particle-like perspective. We will mainly concentrate on the former.

Gaussian operations
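There is a key distinction in the CV model between the quantum gates which are Gaussian and those which are not. In many ways, the Gaussian gates are the "easy" operations for a CV quantum computer. The simplest single-mode Gaussian gates are rotation $R(\phi)$, displacement $D(\alpha)$, and squeezing $S(r)$. The basic two-mode Gaussian gate is the (phaseless) beamsplitter $BS(\theta)$, which can be understood as a rotation between two qumodes. More explicitly, these Gaussian gates produce the following transformations on phase space:
$$R(\phi): \begin{bmatrix} x \\ p \end{bmatrix} \mapsto \begin{bmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{bmatrix} \begin{bmatrix} x \\ p \end{bmatrix}, \tag{3}$$
$$D(\alpha): \begin{bmatrix} x \\ p \end{bmatrix} \mapsto \begin{bmatrix} x + \mathrm{Re}(\alpha) \\ p + \mathrm{Im}(\alpha) \end{bmatrix}, \tag{4}$$
$$S(r): \begin{bmatrix} x \\ p \end{bmatrix} \mapsto \begin{bmatrix} e^{-r} & 0 \\ 0 & e^{r} \end{bmatrix} \begin{bmatrix} x \\ p \end{bmatrix}, \tag{5}$$
$$BS(\theta): \begin{bmatrix} x_1 \\ x_2 \\ p_1 \\ p_2 \end{bmatrix} \mapsto \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & \cos\theta & -\sin\theta \\ 0 & 0 & \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ p_1 \\ p_2 \end{bmatrix}. \tag{6}$$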
Notice that most of these Gaussian operations have names suggestive of a linear character. Indeed, there is a natural correspondence between Gaussian operations and affine transformations on phase space. For a system of $N$ modes, the most general Gaussian transformation has the effect
$$\begin{bmatrix} x \\ p \end{bmatrix} \mapsto M \begin{bmatrix} x \\ p \end{bmatrix} + \begin{bmatrix} \alpha_r \\ \alpha_i \end{bmatrix}, \tag{7}$$
where $M$ is a real-valued symplectic matrix and $\alpha \in \mathbb{C}^N \cong \mathbb{R}^{2N}$ is a complex vector with real/imaginary parts $\alpha_r$/$\alpha_i$. This native affine structure will be our key for building quantum neural networks.
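To see the affine character of the Gaussian gates concretely, the sketch below applies them to points in phase space. The sign convention for the rotation, the unscaled displacement by Re(α) and Im(α), and the mode ordering (x1, x2, p1, p2) are assumptions of this example; other references use different conventions.

```python
import math

def rotation(phi, z):
    """R(phi) acting on a single-mode phase-space point z = (x, p)."""
    x, p = z
    return (math.cos(phi) * x + math.sin(phi) * p,
            -math.sin(phi) * x + math.cos(phi) * p)

def displacement(alpha, z):
    """D(alpha): shift x by Re(alpha) and p by Im(alpha)."""
    x, p = z
    return (x + alpha.real, p + alpha.imag)

def squeezing(r, z):
    """S(r): compress x by exp(-r) and stretch p by exp(r)."""
    x, p = z
    return (math.exp(-r) * x, math.exp(r) * p)

def beamsplitter(theta, z):
    """Phaseless BS(theta) on a two-mode point z = (x1, x2, p1, p2)."""
    x1, x2, p1, p2 = z
    c, s = math.cos(theta), math.sin(theta)
    return (c * x1 - s * x2, s * x1 + c * x2,
            c * p1 - s * p2, s * p1 + c * p2)

# Compose three single-mode gates: rotate, then displace, then squeeze.
z = squeezing(0.5, displacement(1 + 2j, rotation(math.pi / 2, (1.0, 0.0))))
```

Rotation, squeezing, and the beamsplitter are linear maps, while displacement adds a constant offset, so any composition of them is an affine map on phase space.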
A matrix $M$ is symplectic if it satisfies the relation $M^T \Omega M = \Omega$, where
$$\Omega = \begin{bmatrix} 0 & \mathbb{1} \\ -\mathbb{1} & 0 \end{bmatrix} \tag{8}$$
is the $2N \times 2N$ symplectic form. A generic symplectic matrix $M$ can be split into a type of singular-value decomposition, known as the Euler or Bloch-Messiah decomposition [48,49], of the form
$$M = K_2 \begin{bmatrix} \Sigma & 0 \\ 0 & \Sigma^{-1} \end{bmatrix} K_1, \tag{9}$$
where $\Sigma = \mathrm{diag}(c_1, \ldots, c_N)$ with $c_i > 0$, and $K_1$ and $K_2$ are real-valued matrices which are symplectic and orthogonal. A matrix $K$ with these two properties must have the form
$$K = \begin{bmatrix} C & D \\ -D & C \end{bmatrix}, \tag{10}$$
with
$$C D^T - D C^T = 0, \tag{11}$$
$$C C^T + D D^T = \mathbb{1}. \tag{12}$$
We will also need later the fact that if $C$ is an arbitrary orthogonal matrix, then $C \oplus C$ is both orthogonal and symplectic. Importantly, the intersection of the symplectic and orthogonal groups on $2N$ dimensions is isomorphic to the unitary group on $N$ dimensions. This isomorphism allows us to perform the transformations $K_i$ via the unitary action of passive linear optical interferometers. Every Gaussian transformation on $N$ modes (Eq. (7)) can be decomposed into a CV circuit containing only the basic gates mentioned above. Looking back to Eqs. (3)-(6), we can recognize that interferometers made up of $R$ and $BS$ gates are sufficient to generate the orthogonal transformations $K_1$, $K_2$, while $S$ gates are sufficient to give the scaling transformation $\Sigma \oplus \Sigma^{-1}$. Finally, displacement gates complete the full affine transformation. Alternatively, we could have defined the Gaussian transformations as those quantum circuits which contain only the gates given above. The Gaussian transformations are so named because they map the set of Gaussian distributions in phase space to itself.
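The symplectic condition and the decomposition into orthogonal and scaling parts can be checked numerically. The following sketch assumes the ordering (x_1, ..., x_N, p_1, ..., p_N) and the block form Ω = [[0, 1], [-1, 0]] for the symplectic form; the helper names and the numerical values are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def symplectic_form(n):
    """Omega = [[0, I], [-I, 0]] for the ordering (x_1..x_n, p_1..p_n)."""
    omega = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        omega[i][n + i] = 1.0
        omega[n + i][i] = -1.0
    return omega

def is_symplectic(M, tol=1e-12):
    """Check M^T Omega M = Omega entry by entry."""
    n = len(M) // 2
    omega = symplectic_form(n)
    lhs = matmul(matmul(transpose(M), omega), M)
    return all(abs(lhs[i][j] - omega[i][j]) < tol
               for i in range(2 * n) for j in range(2 * n))

def block_diag(A, B):
    """Direct sum A (+) B of two square matrices given as lists of lists."""
    n, m = len(A), len(B)
    return [row + [0.0] * m for row in A] + [[0.0] * n + row for row in B]

def rot(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s], [s, c]]

# Two-mode example: K = C (+) C for orthogonal C, and scaling Sigma (+) Sigma^{-1}.
K1 = block_diag(rot(0.3), rot(0.3))
K2 = block_diag(rot(-1.1), rot(-1.1))
c = [0.5, 2.0]
scaling = block_diag([[c[0], 0.0], [0.0, c[1]]],
                     [[1 / c[0], 0.0], [0.0, 1 / c[1]]])
M = matmul(K2, matmul(scaling, K1))
```

Here `is_symplectic(M)` holds because each factor is symplectic, whereas a scaling that stretches x without compressing p, such as diag(2, 1, 1, 1), fails the check.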

Universality in the CV model
Similar to neural networks, quantum computing comes with its own inherent notions of 'universality.' To define universality in the CV model, we need to first introduce operator versions of the phase space variables, namely $\hat{x}$ and $\hat{p}$. The $\hat{x}$ operator has a spectrum consisting of the entire real line:
$$\hat{x} = \int_{-\infty}^{\infty} x\, |x\rangle\langle x|\, dx, \tag{13}$$
where the vectors $|x\rangle$ are orthogonal, $\langle x | x' \rangle = \delta(x - x')$. This operator is not trace-class, and the vectors $|x\rangle$ are not normalizable. In the phase space representation, the eigenstates $|x'\rangle$ correspond to ellipses centered at $x = x'$ which are infinitely squeezed, i.e., infinitesimal along the $x$-axis and correspondingly infinite in extent on the $p$-axis. The conjugate operator $\hat{p}$ has a similar structure:
$$\hat{p} = \int_{-\infty}^{\infty} p\, |p\rangle\langle p|\, dp, \tag{14}$$
where $\langle p | p' \rangle = \delta(p - p')$ and $\langle p | x \rangle \sim e^{-ipx}$. Each qumode of a CV quantum computer is associated with a pair of operators $(\hat{x}_i, \hat{p}_i)$. For multiple modes, we combine the associated operators together into vectors $(\hat{x}, \hat{p})$. These operators have the commutator $[\hat{x}_j, \hat{p}_k] = i\Omega_{jk}$, which leads to the famous uncertainty relation for simultaneous measurements of $\hat{x}$ and $\hat{p}$. Connecting to Eq. (3), we can associate $\hat{p}$ with a rotation of the operator $\hat{x}$; more concretely, $\hat{p}$ is the Fourier transform of $\hat{x}$. Indeed, we can transform between $\hat{x}$ and $\hat{p}$ with the special rotation gate $F := R(\frac{\pi}{2})$. Using a functional representation, the $\hat{x}$ operator has the effect of multiplication, $\hat{x}\psi(x) = x\psi(x)$. In this same representation, $\hat{p}$ is proportional to the derivative operator, $\hat{p}\psi(x) = -i \frac{\partial}{\partial x}\psi(x)$, as expected from the theory of Fourier transforms.
Universality of the CV model is defined as the ability to approximate arbitrary transformations of the form
$$U_H = \exp(-itH), \tag{15}$$
where the generator $H = H(\hat{x}, \hat{p})$ is a polynomial function of $(\hat{x}, \hat{p})$ with arbitrary but fixed degree [67]. Crucially, such transformations are unitary in the Hilbert space picture, but can have complex nonlinear effects in the phase space picture, a fact that we later make use of for designing quantum neural networks. A set of gates is universal if it can be used to build any $U_H$ through a polynomial-depth quantum circuit. In fact, a universal gate set for CV quantum computing consists of the following ingredients: all the Gaussian transformations from Eqs. (3)-(6), combined with any single non-Gaussian transformation, which corresponds to a nonlinear function on the phase space variables $(x, p)$. This is analogous to classical neural networks, where affine transformations combined with a single class of nonlinearity are sufficient to universally approximate functions. Commonly encountered non-Gaussian gates are the cubic phase gate $V(\gamma) = \exp(i \frac{\gamma}{3} \hat{x}^3)$ and the Kerr gate $K(\kappa) = \exp(i\kappa \hat{n}^2)$.
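To make the nonlinearity of a non-Gaussian gate concrete, the following worked equation sketches how the cubic phase gate acts on the quadrature operators in the Heisenberg picture. It assumes the convention $\hbar = 1$, i.e., $[\hat{x}, \hat{p}] = i$, which is an assumption of this illustration rather than something fixed by the text above.

```latex
\begin{align}
V(\gamma)^\dagger\, \hat{x}\, V(\gamma) &= \hat{x}, \\
V(\gamma)^\dagger\, \hat{p}\, V(\gamma)
  &= \hat{p} - i\left[\tfrac{\gamma}{3}\hat{x}^3,\, \hat{p}\right]
   = \hat{p} + \gamma\, \hat{x}^2 .
\end{align}
```

The commutator series terminates after the first term because $[\hat{x}^3, \hat{x}^2] = 0$. The momentum picks up a term quadratic in $\hat{x}$, which no Gaussian gate can produce, since Gaussian gates act affinely on $(\hat{x}, \hat{p})$.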

III. CONTINUOUS-VARIABLE QUANTUM NEURAL NETWORKS
In this section, we present a scheme for quantum neural networks using the CV framework. It draws inspiration from two sides. First, from the structure of classical neural networks, which are universal function approximators and have demonstrated impressive performance on many practical problems. Second, from variational quantum circuits, which have recently become the predominant way of thinking about algorithms on near-term quantum devices [10,34,35,37,82,83,84,85,86]. The main idea is the following: the fully connected neural network architecture provides a powerful and intuitive ansatz for designing variational circuits in the CV model. We will first introduce the most general form of the quantum neural network, which is the analogue of a classical fully connected network. We then show how a classical neural network can be embedded into the quantum formalism as a special case (where no superposition or entanglement is created), and discuss the universality and computational complexity of the fully quantum network. As modern deep learning has moved beyond the basic feedforward architecture, considering ever more specialized models, we will also discuss how to extend or specialize the quantum neural network to various other cases, specifically recurrent, convolutional, and residual networks. In Table I, we give a high-level matching between neural network concepts and their CV analogues.
FIG. 1. The circuit structure for a single layer of a CV quantum neural network: an interferometer, local squeeze gates, a second interferometer, local displacements, and finally local non-Gaussian gates. The first four components carry out an affine transformation, followed by a final nonlinear transformation.

A. Fully connected quantum layers
A general CV quantum neural network is built up as a sequence of layers, with each layer containing every gate from the universal gate set. Specifically, a layer $\mathcal{L}$ consists of the successive gate sequence shown in Fig. 1:
$$\mathcal{L} := \Phi \circ \mathcal{D} \circ \mathcal{U}_2 \circ \mathcal{S} \circ \mathcal{U}_1, \tag{16}$$
where $\mathcal{U}_i = \mathcal{U}_i(\theta_i, \phi_i)$ are general $N$-port linear optical interferometers containing beamsplitter and rotation gates,
$$\mathcal{D} = \bigotimes_{i=1}^{N} D(\alpha_i), \qquad \mathcal{S} = \bigotimes_{i=1}^{N} S(r_i) \tag{17}$$
are collective displacement and squeezing operators (acting independently on each mode), and $\Phi = \Phi(\lambda)$ is some non-Gaussian gate, e.g., a cubic phase or Kerr gate. The collective gate variables $(\theta, \phi, r, \alpha, \lambda)$ form the free parameters of the network, where $\lambda$ can be optionally kept fixed.
FIG. 2. An example multilayer continuous-variable quantum neural network. In this example, the later layers are progressively decreased in size. Qumodes can be removed either by explicitly measuring them or by tracing them out. The network input can be classical, e.g., by displacing each qumode according to data, or quantum. The network output is retrieved via measurements on the final qumode(s).
The sequence of Gaussian transformations $\mathcal{D} \circ \mathcal{U}_2 \circ \mathcal{S} \circ \mathcal{U}_1$ is sufficient to parameterize every possible unitary affine transformation on $N$ qumodes. In the phase space picture, this corresponds to the transformation of Eq. (7). This sequence thus has the role of a 'fully connected' matrix transformation. Interestingly, adding a nonlinearity uses the same component that adds universality: a non-Gaussian gate $\Phi$. Using $z = (x, p)$, we can write the combined transformation in a form reminiscent of Eq. (1), namely
$$\mathcal{L}(z) = \Phi(Mz + \alpha). \tag{18}$$
Thanks to the CV encoding, we get a nonlinear functional transformation while still keeping the quantum circuit unitary.
Similar to the classical setup, we can stack multiple layers of this type end-to-end to form a deeper network (Fig. 2). The quantum state output from one layer is used as the input for the next. Different layers can be made to have different widths by adding or removing qumodes between layers. Removal can be accomplished by measuring or tracing out the extra qumodes. In fact, conditioning on measurements of the removed qumodes is another method for performing non-Gaussian transformations [68]. This architecture can also accept classical inputs. We can do this by fixing some of the gate arguments to be set by classical data rather than free parameters, for example by applying a displacement $D(x)$ to the vacuum state to prepare the state $D(x)|0\rangle$. This scheme can be thought of as an embedding of classical data into a quantum feature space [10]. The output of the network can be obtained by performing measurements and/or computing expectation values. The choice of measurement operators is flexible; different choices (homodyne, heterodyne, photon-counting, etc.) may be better suited for different situations.

B. Embedding classical neural networks
The above scheme for a CV quantum neural network is quite flexible and general. In fact, it includes classical neural networks as a special case, where we don't create any superposition or entanglement. We now present a mathematical recipe for embedding a classical neural network into the quantum CV formalism. We give the recipe for a single feedforward layer; multilayer networks follow straightforwardly. Throughout this part, we will represent $N$-dimensional real-valued vectors $x$ using $N$-mode quantum optical states built from the eigenstates $|x_i\rangle$ of the operators $\hat{x}_i$:
$$x \leftrightarrow |x\rangle := |x_1\rangle \otimes \cdots \otimes |x_N\rangle. \tag{19}$$
For the first layer in a network, we create the input $x$ by applying the displacement operator $D(x)$ to the state $|x = 0\rangle$. Subsequent layers will use the output of the previous layer as input. To read out the output from the final layer, we can use ideal homodyne detection in each qumode, which projects onto the states $|x_i\rangle$ [49]. We would like to enact a fully connected layer (Eq. (1)) completely within this encoding, i.e.,
$$|x\rangle \mapsto |\varphi(Wx + b)\rangle. \tag{20}$$
This transformation will take place entirely within the $x$ coordinates; we will not use the momentum variables.
We thus want to restrict our quantum network to never mix between $\hat{x}$ and $\hat{p}$. To proceed, we will break the overall computation into separate pieces. Specifically, we split up the weight matrix using a singular value decomposition, $W = O_2 \Sigma O_1$, where the $O_k$ are orthogonal matrices and $\Sigma$ is a positive diagonal matrix. For simplicity, we assume that $W$ is full rank. Rank-deficient matrices form a measure-zero subset in the space of weight matrices, which we can approximate arbitrarily closely with full-rank matrices.
Multiplication by an orthogonal matrix. The first step in Eq. (16) is to apply an interferometer $\mathcal{U}_1$, which corresponds to the rightmost orthogonal matrix $K_1$ in Eq. (9). In order not to mix $\hat{x}$ and $\hat{p}$, we must restrict to block-diagonal $K_1$. With respect to Eqs. (10)-(12), this means that $C$ is an orthogonal matrix and $D = 0$. This choice corresponds to an interferometer which only contains phaseless beamsplitters. With this restriction, we have
$$\mathcal{U}_1 |x\rangle = \bigotimes_{i=1}^{N} \Big| \sum_{j=1}^{N} C_{ij} x_j \Big\rangle = |Cx\rangle. \tag{21}$$
The full derivation of this expression can be found in Appendix A. Thus, the phaseless linear interferometer $\mathcal{U}_1$ is equivalent to multiplying the encoded data by an orthogonal matrix $C$. To connect to the weight matrix $W = O_2 \Sigma O_1$, we choose the interferometer which has $C = O_1$. A similar result holds for the other interferometer $\mathcal{U}_2$, which implements $O_2$.
Multiplication by a diagonal matrix. For our next element, consider the squeezing gate. The effect of squeezing on the $\hat{x}_i$ eigenstates is [87]
$$S(r_i)|x_i\rangle = \sqrt{c_i}\, |c_i x_i\rangle, \tag{22}$$
where $c_i = e^{-r_i}$. An arbitrary positive scaling $c_i$ can thus be achieved by taking $r_i = -\log(c_i)$. Note that squeezing leads to compression (positive $r_i$, $c_i \le 1$), while antisqueezing gives expansion (negative $r_i$, $c_i \ge 1$), matching with Eq. (5). A collection of local squeezing transformations thus corresponds to an elementwise scaling of the encoded vector,
$$\mathcal{S}(r)|x\rangle = e^{-\frac{1}{2}\sum_i r_i}\, |\Sigma x\rangle, \tag{23}$$
where $\Sigma := \mathrm{diag}(\{c_i\}) > 0$. We note that since the $|x_i\rangle$ eigenstates are not normalizable, the prefactor has limited formal consequence.
Addition of bias. Finally, it is well-known that the displacement operator acting locally on quadrature eigenstates has the effect
$$D(\alpha_i)|x_i\rangle = |x_i + \alpha_i\rangle \tag{24}$$
for $\alpha_i \in \mathbb{R}$, which collectively gives
$$\mathcal{D}(\alpha)|x\rangle = |x + \alpha\rangle. \tag{25}$$
Thus, to achieve a bias translation of $d$, we can simply displace by $\alpha = d$.
Affine transformation. Putting these ingredients together, we have
$$\mathcal{D} \circ \mathcal{U}_2 \circ \mathcal{S} \circ \mathcal{U}_1 |x\rangle \propto |O_2 \Sigma O_1 x + d\rangle = |Wx + d\rangle, \tag{26}$$
where we have omitted the parameters for clarity. Hence, using only Gaussian operations which do not mix $x$ and $p$, we can effectively perform arbitrary full-rank affine transformations amongst the vectors $|x\rangle$.
Nonlinear function. To complete the picture, we need to find a non-Gaussian transformation $\Phi$ which has the following effect
$$\Phi|x\rangle = |\varphi(x)\rangle, \tag{27}$$
where $\varphi : \mathbb{R} \to \mathbb{R}$ is some nonlinear function. We will restrict to an element-wise function, i.e., $\Phi$ acts locally on each mode, similar to the activation function of a classical neural network. For simplicity, we will consider $\varphi$ to be a polynomial of fixed degree. By allowing the degree of $\varphi$ to be arbitrarily high, we can approximate any function which has a convergent Taylor series. The most general form of a quantum channel consists of appending an ancilla system, performing a unitary transformation on the combined system, and tracing out the ancilla.
For qumode $i$, we will append an ancilla $i'$ in the $x = 0$ eigenstate, i.e.,
$$|x_i\rangle \mapsto |x_i\rangle_i\, |0\rangle_{i'}, \tag{28}$$
where, for clarity, we have made the temporary notational change $|x_i\rangle \leftrightarrow |x_i\rangle_i$. Consider now the unitary $V_\varphi := \exp(i\varphi(\hat{x}_i) \otimes \hat{p}_{i'})$, where $\varphi(\hat{x}_i)$ is understood as a Taylor series using powers of $\hat{x}_i$. Applying this to the above two-mode system, we get
$$V_\varphi |x_i\rangle_i |0\rangle_{i'} = e^{i\varphi(x_i)\hat{p}_{i'}} |x_i\rangle_i |0\rangle_{i'} = |x_i\rangle_i\, |\varphi(x_i)\rangle_{i'}, \tag{29}$$
where we have recognized that $\hat{p}$ is the generator of displacements in $x$. We can now swap modes $i$ and $i'$ (using a perfectly reflective beamsplitter) and trace out the ancilla. The combined action of these operations leads to the overall transformation
$$|x_i\rangle \mapsto |\varphi(x_i)\rangle. \tag{30}$$
Alternatively, we are free to keep the system in the form $|x_i\rangle |\varphi(x_i)\rangle$; this can be useful for creating residual quantum neural networks. Together, the above sequence of Gaussian operations, followed by a non-Gaussian operation, leads to the desired transformation $|x\rangle \mapsto |\varphi(Wx + b)\rangle$, which is the same as a single-layer classical neural network. We remark finally that the states $|x\rangle$ were used in order to provide a convenient mathematical embedding; in a practical CV device, we would need to approximate the states $|x\rangle$ via finitely squeezed states. In practice, the general quantum neural network framework does not require any particular choice of basis or encoding. Because of this additional flexibility, the full quantum network has larger representational capacity than a conventional neural network and cannot be efficiently simulated by classical models, as we now discuss.
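The embedding recipe can be checked by tracking only the label $x$ of the eigenstate $|x\rangle$ through each gate. The sketch below is a classical bookkeeping exercise under that assumption, not a quantum simulation: it ignores normalization prefactors, uses the relation $c_i = e^{-r_i}$ (so $r_i = -\log c_i$), and all function names and numerical values are hypothetical.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def rot(t):
    """A 2x2 orthogonal matrix, standing in for a phaseless interferometer."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s], [s, c]]

def embedded_layer(O1, c, O2, d, phi, x):
    """Follow the eigenstate label x through U1, S, U2, D, and Phi."""
    x = matvec(O1, x)                                  # U1: |x> -> |O1 x>
    r = [-math.log(ci) for ci in c]                    # squeezing parameters r_i
    x = [math.exp(-ri) * xi for ri, xi in zip(r, x)]   # S: |x> -> |Sigma x>
    x = matvec(O2, x)                                  # U2: |x> -> |O2 x>
    x = [xi + di for xi, di in zip(x, d)]              # D: |x> -> |x + d>
    return [phi(xi) for xi in x]                       # Phi: |x> -> |phi(x)>

def classical_layer(W, d, phi, x):
    """Direct evaluation of phi(W x + d)."""
    return [phi(v + di) for v, di in zip(matvec(W, x), d)]

# Build W = O2 Sigma O1 from chosen factors (values are arbitrary examples).
O1, O2, c, d = rot(0.4), rot(-1.2), [0.5, 3.0], [0.1, -0.2]
W = [[sum(O2[i][k] * c[k] * O1[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
phi = lambda t: t ** 3 - t        # an example polynomial activation
x = [0.7, -1.3]
quantum_labels = embedded_layer(O1, c, O2, d, phi, x)
classical_out = classical_layer(W, d, phi, x)
```

The two outputs agree up to floating-point error, reflecting that the gate sequence reproduces the classical layer when restricted to position eigenstates.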

C. The power of CV neural networks
None of the transformations considered in the previous section ever generate superpositions or entanglement. A distinguishing feature of quantum physics is that we can act not only on some fixed basis states, e.g., the states $|x\rangle$, but also on superpositions (that is, linear combinations) of those basis states, $|\psi\rangle = \int \psi(x)\, |x\rangle\, dx$, where $\psi(x)$ is a multimode wavefunction. The general CV neural network provides greater freedom in the allowed operations by leveraging the power of universal quantum computation. Indeed, the quantum gates in a single layer form a universal gate set, which implies that a CV quantum neural network shares all the capabilities of a universal CV quantum computer.
To see this, consider an arbitrary quantum computation and its decomposition in terms of a circuit consisting of a sequence of gates from a universal gate set. We assign a quantum neural network to this circuit by replacing each gate in the circuit by a single layer. Since each layer contains all gates from the universal set, it can reproduce the action of the single selected gate by setting the parameters of all other gates to zero. Therefore the full network can also replicate the complete quantum circuit.
Since CV quantum neural networks are capable of universal CV quantum computation, in general we do not expect that they can be efficiently simulated on a classical computer. This statement can be put on firmer ground by considering a simple modification to the classical neural network embedding from Sec. III B. Specifically, we carry out a Fourier transform on all modes at the beginning and end of the network. The result is that input states |x are replaced by momentum eigenstates |p and the position homodyne measurements are replaced with momentum homodyne measurements. A momentum eigenstate is an equal superposition over all position eigenstates and thus this circuit can be interpreted as acting on an equal superposition of all classical inputs.
The resulting circuits, consisting of input momentum eigenstates, a unitary transformation that is diagonal in the position basis, and momentum homodyne measurements, are known as continuous-variable instantaneous quantum polynomial (CV-IQP) circuits. It was proven in Ref. [88] that efficient exact classical simulation of CV-IQP circuits would imply a collapse of the polynomial hierarchy to third level. This result was extended in Ref. [89] to the case of approximate classical simulation, under the validity of a plausible conjecture concerning the computational complexity of evaluating high-dimensional integrals. Thus, even a simple modification of the classical embedding presented above gives quantum neural networks the ability to perform tasks that would require exponentially many resources to replicate on classical devices.

D. Beyond the fully connected architecture
Modern deep learning techniques have expanded beyond the basic fully connected architecture. Powerful deep learning software packages [17][18][19][20][21] have allowed researchers to explore more specialized networks or complicated architectures. For the quantum case, we should also not feel restricted to the basic network structure presented above. Indeed, the CV model gives us flexibility to encode problems in a variety of representations. For example, we can use the phase space picture, the wavefunction picture, the Hilbert space picture, or some hybrid of these. We can also encode information in coherent states, squeezed states, Fock states, or superpositions of these states. Furthermore, by choosing the gates and parameters to have particular structure, we can specialize our network ansatz to more closely match a particular class of problems. This can often lead to more efficient use of parameters and better overall models. In the rest of this section, we will highlight potential quantum versions of various special neural network architectures; see Fig. 3 for a visualization.

FIG. 3. Quantum adaptations of the convolutional layer, recurrent layer, and residual layer. The convolutional layer is enacted using a Gaussian unitary with translationally invariant Hamiltonian, resulting in a corresponding symplectic matrix that has a block Toeplitz structure. The recurrent layer combines an internal signal from previous layers with an external source, while the residual layer combines its input and output signals using a controlled-X gate.
Convolutional network. A common architecture in classical neural networks is the convolutional network, or convnet [50]. Convnets are particularly well-suited for computer vision and image recognition problems because they reflect a simple yet powerful observation: since the task of detecting an object is largely independent of where the object appears in an image, the network should be equivariant to translations [16]. Consequently, the linear transformation W in a convnet is not fully connected; rather, it is a specialized sparse linear transformation, namely a convolution. In particular, for one-dimensional convolutions, the matrix W has a Toeplitz structure, with entries repeated along each diagonal. This is similar to the well-known principle in physics that symmetries in a physical system can lead to simplifications of our physical model for that system (e.g., Bloch's Theorem [90] or Noether's Theorem [91]).
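The statement that a one-dimensional convolution corresponds to a Toeplitz matrix can be checked directly. The sketch below is an illustration with a hypothetical helper name and arbitrary example values: it builds the matrix whose entry $(i, j)$ depends only on $i - j$ and compares its action with NumPy's full convolution.

```python
import numpy as np

def toeplitz_from_kernel(kernel, n):
    """Build the (n + m - 1) x n Toeplitz matrix T with T[i, j] = kernel[i - j]."""
    m = len(kernel)
    T = np.zeros((n + m - 1, n))
    for i in range(n + m - 1):
        for j in range(n):
            if 0 <= i - j < m:
                # The entry depends only on i - j, so it repeats along each diagonal.
                T[i, j] = kernel[i - j]
    return T

kernel = np.array([1.0, -2.0, 0.5])            # arbitrary example filter
signal = np.array([3.0, 1.0, 4.0, 1.0, 5.0])   # arbitrary example input
T = toeplitz_from_kernel(kernel, len(signal))

# Multiplying by T reproduces the full 1D convolution of signal and kernel.
matches = np.allclose(T @ signal, np.convolve(signal, kernel))
```

The matrix has only `len(kernel)` free parameters regardless of the input size, which is the parameter saving that the convolutional structure provides.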
We can directly enforce translation symmetry on a quantum neural network model by making each layer in the quantum circuit translationally invariant. Concretely, consider the generator $H = H(\hat{x}, \hat{p})$ of a Gaussian unitary, $U = \exp(-itH)$. Suppose that this generator is translationally invariant, i.e., $H$ does not change if we map $(\hat{x}_i, \hat{p}_i)$ to $(\hat{x}_{i+1}, \hat{p}_{i+1})$. Then the symplectic matrix $M$ that results from this Gaussian unitary will have the form

$$M = \begin{pmatrix} M_{xx} & M_{xp} \\ M_{px} & M_{pp} \end{pmatrix},$$

where each $M_{uv}$ is itself a Toeplitz matrix, i.e., a one-dimensional convolution (see Appendix B). The matrix $M$ can be seen as a special kind of convolution that respects the uncertainty principle: performing a convolution on the $x$ coordinates naturally leads to a conjugate convolution involving $p$. The connection between translationally invariant Hamiltonians and convolutional networks was also noted in [59].

Recurrent network. This is a special-purpose neural network which is used widely for problems involving sequences [92], e.g., time series or natural language. A recurrent network can be pictured as a model which takes two inputs for every time step $t$. One of these inputs, $x^{(t)}$, is external, coming from a data source or another model. The other input is an internal state $h^{(t)}$, which comes from the same network, but at a previous time step (hence the name recurrent). These inputs are processed through a neural network $f_\theta(x^{(t)}, h^{(t)})$, and an output $y^{(t)}$ is (optionally) returned. Similar to a convolutional network, the recurrent architecture encodes translation symmetry into the weights of the model. However, instead of spatial translation symmetry, recurrent models have time translation symmetry. In terms of the network architecture, this means that the model reuses the same weight matrix $W$ and bias vector $b$ in every layer. In general, $W$ and $b$ are unrestricted, though more specialized architectures could also further restrict these.
This architecture generalizes straightforwardly to quantum neural networks, with the inputs, outputs, and internal states employing any of the data-encoding schemes discussed earlier. It is particularly well-suited to an optical implementation, since we can connect the output modes of a quantum circuit back to the input using optical fibres. This allows the same quantum optical circuit to be reused several times for the same model. We can reserve a subset of the modes for the data input and output channels, with the remainder used to carry forward the internal state of the network between time steps.
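The recurrent pattern described above, in which the same weights are reused at every time step and part of the output is fed back as internal state, can be sketched classically as follows. This is only an illustration: the `tanh` nonlinearity, the function names, and the way the output vector is split into a data channel and a state channel are assumptions made for the example.

```python
import numpy as np

def recurrent_step(x_t, h_t, W, b):
    """One time step: process external input x_t together with internal state h_t."""
    z = np.tanh(W @ np.concatenate([x_t, h_t]) + b)
    n_out = len(x_t)
    # The first entries play the role of the data output channel,
    # the remainder carries the internal state forward.
    return z[:n_out], z[n_out:]

def run_sequence(xs, h0, W, b):
    """Apply the same (W, b) at every time step of the sequence xs."""
    h, ys = h0, []
    for x_t in xs:
        y_t, h = recurrent_step(x_t, h, W, b)
        ys.append(y_t)
    return np.array(ys), h
```

Here `W` has shape `(d_x + d_h, d_x + d_h)` for inputs of dimension `d_x` and internal state of dimension `d_h`. The split of the output is meant to echo reserving a subset of modes for input and output while the remaining modes are looped back.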
Residual network. The residual network [93], or resnet, is a more recent innovation than the convolutional and recurrent networks. While these other models are special cases of feedforward networks, the resnet uses a modified network topology. Specifically, 'shortcut connections,' which perform a simple identity transformation, are introduced between layers. Using these shortcuts, the output of a layer can be added to its input. If a layer by itself would perform the transformation $F$, then the corresponding residual network performs the transformation

$$x \mapsto x + F(x).$$

To perform residual-type computation in a quantum neural network, we look back to Eq. (28), where a two-mode unitary carries out the transformation

$$|x\rangle\,|0\rangle \mapsto |x\rangle\,|\phi(x)\rangle, \qquad (32)$$

where $\phi$ is some desired non-Gaussian function. To complete the residual computation, we need to sum these two values together. This can be accomplished using the controlled-X (or SUM) gate $C_X$ [43], which can be carried out with purely Gaussian operations, namely squeezing and beamsplitters [94]. Adding a $C_X$ gate after the transformation in Eq. (32), we obtain

$$|x\rangle\,|0\rangle \mapsto |x\rangle\,|\phi(x) + x\rangle,$$

which is a residual transformation. This residual transformation can also be carried out on arbitrary wavefunctions $\psi(x)$ in superposition, giving the general mapping

$$\int \psi(x)\,|x\rangle\,dx\;|0\rangle \mapsto \int \psi(x)\,|x\rangle\,|\phi(x) + x\rangle\,dx.$$

IV. NUMERICAL EXPERIMENTS
We showcase the power and versatility of CV quantum neural networks by employing them in a range of machine learning tasks. The networks are numerically simulated using the Strawberry Fields software platform [95] and the Quantum Machine Learning Toolbox app which is built on top of it. We use both automatic differentiation with respect to the quantum gate parameters, which is built into Strawberry Fields' TensorFlow [20] quantum circuit simulator, as well as numerical algorithms to train these networks. Automatic differentiation techniques allow for a direct use of established optimization algorithms based on stochastic gradient descent. On the other hand, numerical techniques such as the finite-difference method or Nelder-Mead will allow training of hardware-based implementations of quantum neural networks.
We study several tasks in both supervised and unsupervised settings, with varying degrees of hybridization between quantum and classical neural networks. Some cases employ both classical and quantum networks whereas others are fully quantum. The architectures used are illustrated in Fig. 4. Unless otherwise stated, we employ the Adam optimizer [96] to train the networks and we choose the Kerr gate $K(\kappa) = \exp(i\kappa\hat{n}^2)$ as the non-Gaussian gate in the quantum networks. Our results highlight the wide range of potential applications of CV quantum neural networks, which will be further enhanced when deployed on dedicated hardware which exceeds the current limitations imposed by classical simulations.

A. Training quantum neural networks
A prototypical problem in machine learning is curve fitting: learning a given relationship between inputs and outputs. We will use this simple setting to analyze the behaviour of CV quantum neural networks with respect to different choices for the model architecture, cost function, and optimization algorithm. We consider the simple case of training a quantum neural network to reproduce the action of a function $f(x)$ on one-dimensional inputs $x$, when given a training set of noisy data. This is summarized in Fig. 4(a). We encode the classical inputs as position-displaced vacuum states $D(x)|0\rangle$, where $D(x)$ is the displacement operator and $|0\rangle$ is the single-mode vacuum. Let $|\psi_x\rangle$ be the output state of the circuit given input $D(x)|0\rangle$. The goal is to train the network to produce output states whose expectation value for the quadrature operator $\hat{x}$ is equal to $f(x)$, i.e., to satisfy the relation $\langle\psi_x|\hat{x}|\psi_x\rangle = f(x)$ for all $x$.
To train the circuits, we use a supervised learning setting where the training and test data are tuples $(x_i, f(x_i))$ for values of $x_i$ chosen uniformly at random in some interval. We define the loss function as the mean square error (MSE) between the circuit outputs and the desired function values,

$$L = \frac{1}{N}\sum_{i=1}^{N}\left[f(x_i) - \langle\psi_{x_i}|\hat{x}|\psi_{x_i}\rangle\right]^2.$$

To test this approach in the presence of noise in the data, we consider functions of the form $\tilde{f}(x) = f(x) + \Delta f$, where $\Delta f$ is drawn from a normal distribution with zero mean and standard deviation $\epsilon$. The results of curve fitting on three noisy functions are illustrated in Fig. 5.
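As a small numerical illustration of this setup, the sketch below generates noisy samples of a target function and evaluates the MSE of a model that reproduces the noiseless function exactly. The sampling interval, the sample size, the seed, and the helper names are assumptions chosen for the example; the noise level 0.1 corresponds to a noise variance of 0.01.

```python
import numpy as np

def make_noisy_data(f, n, eps, lo=-1.0, hi=1.0, seed=0):
    """Sample x uniformly in [lo, hi] and return (x, f(x) + Gaussian noise of std eps)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=n)
    y = f(x) + rng.normal(0.0, eps, size=n)
    return x, y

def mse(pred, target):
    """Mean square error between predictions and targets."""
    return float(np.mean((pred - target) ** 2))

x, y = make_noisy_data(np.sin, n=20000, eps=0.1)

# A model that outputs the noiseless function still has a nonzero MSE
# against noisy targets: it is close to the noise variance eps**2.
perfect_fit_mse = mse(np.sin(x), y)
```

This is why an MSE near the noise variance, rather than near zero, is the signature of a fit that has recovered the underlying curve without fitting the noise.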
Avoiding overfitting. Ideally, the circuits will produce outputs that are smooth and do not overfit the noise in the data. CV quantum neural networks are inherently adept at achieving smoothness because quantum states that are close to each other cannot differ significantly in their expectation value with respect to observables. Quantitatively, Hölder's inequality states that for any two states $\rho$ and $\sigma$ it holds that

$$\left|\mathrm{Tr}\left[(\rho - \sigma)X\right]\right| \leq \|\rho - \sigma\|_1\,\|X\|_\infty$$

for any operator $X$. This smoothness property of quantum neural networks is clearly seen in Fig. 5, where the input/output relationship of quantum circuits gives rise to smooth functions that are largely immune to the presence of noise, while still being able to generalize from training to test data. We found that no regularization mechanism was needed to prevent overfitting of the problems explored here.

Improvement with depth. The circuit architecture is defined by the number of layers, i.e., the circuit depth. Fig. 6 (top) studies the effect of the number of layers on the final value of the MSE. A clear improvement for the curve fitting task is seen for up to six layers, at which point the improvements saturate. The MSE approaches the square of the standard deviation of the noise, $\epsilon^2 = 0.01$, as expected when the circuit is in fact reproducing the input-output relationship of the noiseless curve.
Quantum device imperfections. We also study the effect of imperfections in the circuit, which for photonic quantum computers is dominated by photon loss. We model this using a lossy bosonic channel, with a loss parameter η. Here η = 0% stands for perfect transmission (no photon loss). The lossy channel acts at the end of each individual layer, ensuring that the effect of photon loss increases with circuit depth. For example, a circuit with six layers and loss coefficient η = 10% experiences a total loss of 46.9%. The effect of loss is illustrated in Fig. 6 (bottom) where we plot the MSE as a function of η. The quality of the fit exhibits resilience to this imperfection, indicating that the circuit learns to compensate for the effect of losses.
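The quoted total loss follows from compounding the per-layer loss. As a minimal sketch, assuming each layer independently transmits a fraction $1 - \eta$ of the photons (the function name is hypothetical):

```python
def total_loss(eta, n_layers):
    """Fraction of photons lost after n_layers channels, each with loss eta."""
    transmitted = (1.0 - eta) ** n_layers
    return 1.0 - transmitted

# Six layers with 10% loss each: 1 - 0.9**6 = 0.468559
print(round(100 * total_loss(0.10, 6), 1))  # 46.9
```

Under this assumption the transmitted fraction decays geometrically with depth, so deeper circuits are penalized more strongly for the same per-layer loss.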
Optimization methods. We also analyze different optimization algorithms for the sine curve-fitting problem. Fig. 7 compares three numerical methods and two methods based on automatic differentiation. Numerical SGD approximates the gradients with a finite differences estimate. Nelder-Mead is a gradient-free technique, while the sequential least-squares programming (SLSQP) method solves quadratic subproblems with approximate gradients. These latter two converge significantly slower, but can have advantages in smoothness and speed per iteration. The Adam optimizer with adaptive learning rate performed better than vanilla SGD in this experiment.
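To make the idea of a numerical gradient concrete, the following sketch estimates gradients by finite differences and uses them in plain gradient descent on a toy cost. It is an illustration only: the central-difference rule, the step size, the toy quadratic cost, and the function names are assumptions, and a stochastic variant would evaluate the cost on a random mini-batch at each step.

```python
import numpy as np

def finite_difference_grad(cost, params, h=1e-5):
    """Central finite-difference estimate of the gradient of cost at params."""
    grad = np.zeros_like(params)
    for k in range(len(params)):
        step = np.zeros_like(params)
        step[k] = h
        grad[k] = (cost(params + step) - cost(params - step)) / (2 * h)
    return grad

def numerical_gradient_descent(cost, params, lr=0.1, steps=100):
    """Gradient descent that only needs cost evaluations, not analytic gradients."""
    params = np.array(params, dtype=float)
    for _ in range(steps):
        params = params - lr * finite_difference_grad(cost, params)
    return params

# Toy cost with a known minimum at (1, -2); stands in for a circuit's loss.
target = np.array([1.0, -2.0])
toy_cost = lambda p: float(np.sum((p - target) ** 2))
result = numerical_gradient_descent(toy_cost, [0.0, 0.0])
```

Each gradient estimate requires two cost evaluations per parameter, which is the price paid for needing only black-box access to the cost, as would be the case for a hardware device.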
Penalties and regularization. In the numerical simulations of quantum circuits, each qumode is truncated to a given cutoff dimension in the infinite-dimensional Hilbert space of Fock states. During training, it is possible for the gate parameters to reach values such that the output states have significant support outside of the truncated Hilbert space. In the simulation, this results in unnormalized output states and unreliable computations. To address this issue, we add a penalty to the loss function that penalizes unnormalized quantum states. Given a set of output states $\{|\psi_{x_i}\rangle\}$, we define the penalty function

$$P(\{|\psi_{x_i}\rangle\}) = \sum_i \left(\langle\psi_{x_i}|\Pi_{\mathcal{H}}|\psi_{x_i}\rangle - 1\right)^2, \qquad (37)$$

where $\Pi_{\mathcal{H}}$ is a projector onto the truncated Hilbert space of the simulation. This function penalizes unnormalized states whose trace differs from one. The overall cost function to be minimized is then

$$C = L + \gamma P,$$

where $\gamma > 0$ is a user-defined hyperparameter.
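The effect of truncation on the norm can be seen with a simple example. The sketch below computes the Fock-basis amplitudes of a single-mode coherent state up to a cutoff, evaluates the trace that survives the truncation, and forms a penalty from it. The quadratic form of the penalty (sum over states of the squared deviation of the trace from one) is an assumption consistent with the description above, and the function names are hypothetical.

```python
import numpy as np
from math import factorial

def coherent_amplitudes(alpha, cutoff):
    """Fock amplitudes c_n = exp(-|alpha|^2 / 2) alpha^n / sqrt(n!) for n < cutoff."""
    n = np.arange(cutoff)
    norms = np.sqrt(np.array([factorial(k) for k in range(cutoff)], dtype=float))
    return np.exp(-abs(alpha) ** 2 / 2) * alpha ** n / norms

def truncated_trace(amplitudes):
    """Norm squared of the state restricted to the truncated space."""
    return float(np.sum(np.abs(amplitudes) ** 2))

def trace_penalty(states):
    """Assumed penalty: sum of squared deviations of each trace from one."""
    return sum((truncated_trace(s) - 1.0) ** 2 for s in states)

small = coherent_amplitudes(0.5, cutoff=10)  # well inside the truncated space
large = coherent_amplitudes(3.0, cutoff=10)  # significant support beyond the cutoff
penalty = trace_penalty([small, large])
```

A weakly displaced state keeps essentially unit trace at this cutoff, while the strongly displaced one loses a large part of its norm, which is exactly the situation the penalty is meant to discourage during training.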
An alternate approach to the trace penalty is to regularize the circuit parameters that can alter the energy of the state, which we refer to as the active parameters. Fig. 8 compares optimizing the function of Eq. (37) without any penalty (first column from the left), imposing an L2 regularizer (second column), using an L1 regularizer (third column), and using the trace penalty (fourth column). Without any strategy to keep the parameters small, learning fails due to unstable simulations: in fact, the trace of the state drops to 0.1. Both regularization strategies as well as the trace penalty manage to bring the loss function to almost zero within a few steps while maintaining the unit trace of the state. However, there are interesting differences. While L2 regularization decreases the magnitude of the active parameters, L1 regularization dampens all but two of them. The undamped parameters turn out to be the circuit parameters for the nonlinear gates in layers 3 and 4, a hint that these nonlinearities are most essential for the task. The trace penalty induces heavy fluctuations in the loss function for the first 20 steps, but finds parameters that are larger in absolute value than those found by L2 regularization, with a lower final loss.

B. Supervised learning with hybrid networks
Classification of data is a canonical problem in machine learning. We construct a hybrid classical-quantum neural network as a classifier to detect fraudulent transactions in credit card purchases. In this hybrid approach, a classical neural network is used to control the gate parameters of the quantum network, the output of which determines whether the transactions are classified as genuine or fraudulent. This is illustrated in Fig. 4(b).

FIG. 8. Cost function and circuit parameters during 60 steps of stochastic gradient descent training for the task of fitting the sine function from Fig. 6. The active parameters are plotted in orange, while all others are plotted in purple. As hyperparameters, we used an initial learning rate of 0.1 which has an inverse decay of 0.25, a penalty strength γ = 10, a regularization strength of 0.5, batch size of 50, a cutoff of 10 for the Hilbert-space dimension, and randomly chosen but fixed initial circuit parameters.
Data preparation. For the experiment, data was taken from a publicly available database of labelled historical credit card transactions which are flagged as either fraudulent or genuine [97]. The data is composed of 28 features derived through a principal component analysis of the raw data, providing an anonymization of the transactions. Of the 284,807 provided transactions, only 0.172% are fraudulent. We create training and test datasets by splitting the fraudulent transactions in two and combining each subset with genuine transactions. For the training dataset, we undersample the genuine transactions by randomly selecting them so that they outnumber the fraudulent transactions by a ratio of 3:1. This undersampling is used to address the notable asymmetry in the number of fraudulent and genuine transactions in the original dataset. The test dataset is then completed by adding all the remaining genuine transactions.
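The splitting and undersampling procedure can be sketched on index arrays as follows. This is an illustration on synthetic labels rather than the actual dataset; the label convention (1 for fraudulent, 0 for genuine), the random seed, and the function name are assumptions.

```python
import numpy as np

def split_and_undersample(labels, ratio=3, seed=0):
    """Return (train_idx, test_idx) with a ratio:1 genuine-to-fraud training set."""
    rng = np.random.default_rng(seed)
    fraud = rng.permutation(np.flatnonzero(labels == 1))
    genuine = rng.permutation(np.flatnonzero(labels == 0))
    half = len(fraud) // 2
    n_train_genuine = ratio * half
    # Training set: half of the fraud cases plus a random subset of genuine ones.
    train = np.concatenate([fraud[:half], genuine[:n_train_genuine]])
    # Test set: the other half of the fraud cases plus all remaining genuine ones.
    test = np.concatenate([fraud[half:], genuine[n_train_genuine:]])
    return train, test

# Synthetic stand-in for the labels: 20 fraudulent cases among 10000 transactions.
labels = np.zeros(10000, dtype=int)
labels[:20] = 1
train_idx, test_idx = split_and_undersample(labels)
```

With this construction the training set is balanced to 3:1, while the test set retains the strong class imbalance of the original data.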
Hybrid network architecture. The first section of the network is composed of a series of classical fully connected feedforward layers. Here, an input layer accepts the first 10 features. This is followed by two hidden layers of the same size and the result is output on a layer of size 14. An exponential linear unit (ELU) was used as the nonlinearity. The second section of our architecture is a quantum neural network consisting of two modes initially in the vacuum. An input layer first operates on the two modes. The input layer omits the first interferometer as this has no effect on the vacuum qumodes. This results in the layer being described by 14 free parameters, which are set to be directly controlled by the output layer of the classical neural network. The input layer then feeds onto four hidden layers with fully controllable parameters, followed by an output layer in the form of a photon number measurement. An output encoding is fixed in the Fock basis by post-selecting on single-photon outputs and associating a photon in the first mode with a genuine transaction and a photon in the second mode with a fraudulent transaction.
Training. To train the hybrid network, we perform SGD with a batch size of 24. Let $p$ be the probability that a single photon is observed in the mode corresponding to the correct label for the input transaction. The cost function to minimize is

$$C = \sum_{i \in \text{data}} (1 - p_i)^2,$$

where $p_i$ is the probability of the single photon being detected in the correct mode on input $i$. The probability included in the cost function is not post-selected on single photon outputs, meaning that training learns to output a useful classification as often as possible. We perform training with a cutoff dimension of 10 in each mode for approximately $5 \times 10^4$ batches. Once trained, we use the probabilities post-selected on single photon events as classification, which could be estimated experimentally by averaging the number of single-photon events occurring across a sequence of runs.
Model performance. We test the model by choosing a threshold probability required for transactions to be classified as genuine. The confusion matrix for a threshold of $p_{\mathrm{th}} = 0.61$ is given in Fig. 9. By varying the classification threshold, a receiver operating characteristic (ROC) curve can be constructed, where each point in the curve is parametrized by a value of the threshold. This is shown in Fig. 9, where the true negative rate is plotted against the false negative rate. An ideal classifier has a true negative rate of 1 and a false negative rate of 0, as illustrated by the circle in the figure. Conversely, randomly guessing at a given threshold probability results in the dashed line in the figure. Our classifier has an area under the ROC curve of 0.963, compared to the optimal value of 1. For detection of fraudulent credit card transactions, it is imperative to minimize the false negative rate (bottom left square in the confusion matrix of Fig. 9), i.e., the rate of misclassifying a fraudulent transaction as genuine. Conversely, it is less important to minimize the false positive rate (top right square); these are the cases of genuine transactions being classed as fraudulent. Such cases can typically be addressed by sending verification messages to cardholders. The larger false positive rate in Fig. 9 can also be attributed to the large asymmetry between the number of genuine and fraudulent data points.
The results here illustrate a proof-of-principle hybrid classical-quantum neural network able to perform classification for a problem of genuine practical interest. While it is simple to construct a classical neural network to outperform this hybrid model, our network is restricted in both width and depth due to the need to simulate the quantum network on a classical device. It would be interesting to further explore the performance of hybrid networks in conjunction with a physical quantum computer.

C. Generating images from labeled data
Next, we study the problem of training a quantum neural network to generate quantum states that encode grayscale images. We consider images of $N \times N$ pixels specified by a matrix $A$ whose entries $a_{ij} \in [0, 1]$ indicate the intensity of the pixel on the $i$th row and $j$th column of the picture. These images can be encoded into two-mode quantum states $|A\rangle$ by associating each entry of the matrix with the coefficients of the state in the Fock basis:

$$|A\rangle = \frac{1}{\sqrt{\mathcal{N}}}\sum_{i,j=0}^{N-1} a_{ij}\,|i\rangle\,|j\rangle,$$

where $\mathcal{N} = \sum_{i,j=0}^{N-1} |a_{ij}|^2$ is a normalization constant. We refer to these as image states. Up to normalization, the matrix coefficients $a_{ij}$ are the probability amplitudes of observing $i$ photons in the first mode and $j$ photons in the second mode. Therefore, given many copies of a state $|A\rangle$, the image can be statistically reconstructed by averaging photon detection events at the output modes. This architecture is illustrated in Fig. 4(c).
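The encoding of an image into Fock-basis amplitudes can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, and the placement of the example block pattern within the 4 × 4 grid is chosen arbitrarily.

```python
import numpy as np

def image_state(A):
    """Normalize pixel intensities a_ij into two-mode Fock amplitudes."""
    A = np.asarray(A, dtype=float)
    norm = np.sum(np.abs(A) ** 2)   # the normalization constant
    return A / np.sqrt(norm)

def photon_statistics(state):
    """Probability of detecting i photons in mode 1 and j photons in mode 2."""
    return np.abs(state) ** 2

# A 2 x 2 block of bright pixels in a 4 x 4 image (illustrative placement).
A = np.array([[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]])
state = image_state(A)
probs = photon_statistics(state)
```

Because detection probabilities are squared amplitudes, photon counting recovers the squared pixel intensities (up to normalization), from which the image itself can be reconstructed.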
Image encoding strategy. Given a collection of images $A_1, A_2, \ldots, A_n$, we fix a set of input two-mode coherent states $|\alpha_1\rangle|\beta_1\rangle, |\alpha_2\rangle|\beta_2\rangle, \ldots, |\alpha_n\rangle|\beta_n\rangle$. The goal is to train the quantum neural network to perform the transformation $|\alpha_i\rangle|\beta_i\rangle \to |A_i\rangle$ for all $i = 1, 2, \ldots, n$. Since the transformation is unitary, the Gram matrix of input and output states must be equal, i.e., it must hold that

$$\langle\alpha_i|\alpha_j\rangle\,\langle\beta_i|\beta_j\rangle = \langle A_i|A_j\rangle$$

for all $i, j$.
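The Gram matrix of a set of two-mode coherent states can be evaluated in closed form using the standard overlap $\langle\alpha|\beta\rangle = \exp(-|\alpha|^2/2 - |\beta|^2/2 + \alpha^*\beta)$. The sketch below does this for two example inputs; the specific amplitude pairs are hypothetical and chosen only to illustrate how quickly coherent states become nearly orthogonal.

```python
import numpy as np

def coherent_overlap(a, b):
    """Overlap <a|b> between single-mode coherent states with amplitudes a and b."""
    return np.exp(-abs(a) ** 2 / 2 - abs(b) ** 2 / 2 + np.conj(a) * b)

def gram_matrix(amplitude_pairs):
    """Gram matrix of two-mode product coherent states |a_i>|b_i>."""
    n = len(amplitude_pairs)
    G = np.empty((n, n), dtype=complex)
    for i, (a_i, b_i) in enumerate(amplitude_pairs):
        for j, (a_j, b_j) in enumerate(amplitude_pairs):
            G[i, j] = coherent_overlap(a_i, a_j) * coherent_overlap(b_i, b_j)
    return G

# Hypothetical example inputs: the first-mode amplitudes differ by a sign.
pairs = [(1.4, 1.4), (-1.4, 1.4)]
G = gram_matrix(pairs)
```

For these example amplitudes the off-diagonal overlap is $\exp(-2 \times 1.4^2) \approx 0.02$, whereas two images sharing bright pixels can overlap much more strongly, so the two Gram matrices generally disagree.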
In general, it is not possible to find coherent states that satisfy this condition for arbitrary collections of output states. To address this, we consider output states with support in regions of larger photon number and demand that their projection onto the image Hilbert space of at most $N - 1$ photons in each mode coincides, modulo normalization, with the desired output states. Mathematically, if $V$ is the unitary transformation performed by the quantum neural network, the goal is to train the circuit to produce output states $V|\alpha_i\rangle|\beta_i\rangle$ such that

$$\Pi_N V|\alpha_i\rangle|\beta_i\rangle = \sqrt{p_i}\,|A_i\rangle,$$

where $\Pi_N = \sum_{i,j=0}^{N-1} |i\rangle\langle i| \otimes |j\rangle\langle j|$ is a projector onto the Hilbert space of at most $N - 1$ photons in each mode and $p_i = \mathrm{Tr}\left[\Pi_N V|\alpha_i\rangle\langle\alpha_i| \otimes |\beta_i\rangle\langle\beta_i| V^\dagger\right]$ is the probability of observing the state in the subspace defined by this projector. The quantum neural network therefore needs to learn not only how to transform input coherent states into image states, it must also learn to employ the additional dimensions in Hilbert space to satisfy the constraints imposed by unitarity. This approach still allows us to retrieve the encoded image by performing photon counting, albeit with a penalty of $p_i$ in the sampling rate.
As an example problem, we select a database of 4 × 4 images corresponding to the seven standard configurations of four blocks used in the digital game Tetris. These configurations are known as tetrominos. For a fixed value of the parameter $\alpha > 0$, the seven input states $|\phi_1\rangle, \ldots, |\phi_7\rangle$ are chosen as two-mode coherent states whose amplitudes are fixed by $\alpha$, each of which must be mapped to the image state of a corresponding tetromino.
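Training. We define the states

$$|\Psi_i\rangle = V|\phi_i\rangle, \qquad |\psi_i\rangle = \frac{\Pi_4|\Psi_i\rangle}{\sqrt{\langle\Psi_i|\Pi_4|\Psi_i\rangle}},$$

i.e., $|\Psi_i\rangle$ is the output state of the network and $|\psi_i\rangle$ is the normalized projection of the output state onto the image Hilbert space of at most 3 photons in each mode.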
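In a simulation where a two-mode state is stored as a matrix of Fock amplitudes, the normalized projection described above amounts to keeping the top-left block and rescaling it. The following sketch illustrates this; the array layout, the function names, and the use of the fidelity as a closeness measure are assumptions made for the example.

```python
import numpy as np

def project_and_normalize(Psi, N):
    """Project two-mode amplitudes Psi[i, j] onto i, j < N and renormalize.

    Returns the normalized projected state and the projection probability p.
    """
    block = Psi[:N, :N]
    p = float(np.sum(np.abs(block) ** 2))
    return block / np.sqrt(p), p

def fidelity(target, state):
    """Fidelity |<target|state>|^2 between two normalized amplitude arrays."""
    return float(np.abs(np.vdot(target, state)) ** 2)
```

For example, with a simulation cutoff of 11 photons per mode, `project_and_normalize(Psi, 4)` returns the image-space component of an 11 × 11 amplitude array together with the probability of finding the state in that subspace.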
To train the quantum neural network, we define a cost function that depends on the target image states $|A_1\rangle, |A_2\rangle, \ldots, |A_7\rangle$ of the seven tetrominos and includes the trace penalty $P$ as in Eq. (37), for which we set $\gamma = 100$. By choosing this cost function we are forcing each input to be mapped to a specific image of our choice. In this sense, we can view the images as labeled data of the form $(|\phi_i\rangle, |A_i\rangle)$ where the label specifies which input state they correspond to. We employed a network with 25 layers (see Fig. 4(c)) and fixed a cutoff of 11 photons in the numerical simulation, setting the displacement parameter of the input states to $\alpha = 1.4$.
Model performance. The resulting image states are illustrated in Fig. 10, where we plot the absolute value squared of the coefficients in the Fock basis as grayscale pixels in an image. Tetrominos are referred to in terms of the letter of the alphabet they resemble. We fixed the desired output images according to the sequence 'LOTISJZ' such that the first input state is mapped to the tetromino 'L', the second to 'O', and so forth. Fig. 10 clearly illustrates the role of the higher-dimensional components of the output states in satisfying the constraints imposed by unitarity: the network learns not only how to reproduce the images in the smaller Hilbert space but also how to populate the remaining regions in order to preserve the pairwise overlaps between states. For instance, the input states $|\phi_1\rangle$ and $|\phi_2\rangle$ are nearly orthogonal, but the images of the 'L' and 'O' tetrominos have a significant overlap. Consequently, the network learns to assign a relatively small probability of projecting onto the image space while populating the higher photon sectors in orthogonal subspaces. Overall, the network is successful in reproducing the images in the space of a few photons, precisely as it was intended to do.

D. Hybrid quantum-classical autoencoder
In this example, we build a joint quantum-classical autoencoder (see Fig. 4(d)). Conventional autoencoders are neural networks consisting of an encoder network followed by a decoder network. The objective is to train the network to act as an identity operation on input data. During training, the network learns a restricted encoding of the input data, which can be found by inspecting the small middle layer which links the encoder and decoder. For the hybrid autoencoder, our goal is to find a continuous phase-space encoding of the first three Fock states $|0\rangle$, $|1\rangle$, and $|2\rangle$. Each of these states will be encoded into the form of displaced vacuum states, then decoded back to the correct Fock state form.
Model architecture. For the hybrid autoencoder, we fix a classical feedforward architecture as an encoder and a sequence of layers on one qumode as a decoder, as shown in Fig. 4(d). The classical encoder begins with an input layer with three dimensions, allowing for any real linear combination in the $\{|0\rangle, |1\rangle, |2\rangle\}$ subspace to be input into the network. The input layer is followed by six hidden layers of dimension five and a two-dimensional output layer. We use a fully connected model with an ELU nonlinearity.
The two output units of the classical network are used to set the x and p components of a displacement gate acting on the vacuum in one qumode. This serves as a continuous encoding of the Fock states as displaced vacuum states. In fact, displaced vacuum states have Gaussian distributions in phase space, so the network has a resemblance to a variational autoencoder [98]. We employ a total of 25 layers with controllable parameters. The goal of the composite autoencoder is to physically generate the Fock state originally input into the network. Once the autoencoder has been trained, by removing the classical encoder we are left with a method to generate Fock states by varying the displacement of the vacuum. Notably, there is no need to specify which displacement should be mapped to each Fock state: this is automatically taken care of by the autoencoder.
Training. Our hybrid network is trained in the following way. For each of the Fock states $|0\rangle$, $|1\rangle$, and $|2\rangle$, we input the corresponding one-hot vectors $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$ into the classical encoder. Suppose that for an input $|i\rangle$ the encoder outputs the vector $(x_i, p_i)$. This is used to displace the vacuum in one mode, i.e., enacting $D(\alpha_i)|0\rangle$ with $\alpha_i = (x_i, p_i)$. The output of the quantum decoder is the quantum state $|\Psi_i\rangle = V D(\alpha_i)|0\rangle$, with $V$ the unitary resulting from the layers. We define the normalized projection

$$|\psi_i\rangle = \frac{\Pi_3|\Psi_i\rangle}{\sqrt{\langle\Psi_i|\Pi_3|\Psi_i\rangle}}$$

onto the subspace of the first three Fock states, with $\Pi_3$ being the corresponding projector. As we have discussed previously, this allows the network to output the state $|\psi_i\rangle$ probabilistically upon a successful projection onto the subspace. The objective is to train the network so that $|\psi_i\rangle$ is close to $|i\rangle$, where closeness is measured using the fidelity $|\langle i|\psi_i\rangle|^2$. As before, we introduce a trace penalty and set a cost function that combines these fidelities with the penalty, with $\gamma = 100$ for the regularization parameter. Additionally, we constrain the displacements in the input phase space to a circle of radius $|\alpha| = 1.5$ to make sure the encoding is as compact as possible.
Model performance. After training, the classical encoder element can be removed and we can analyze the quantum decoder by varying the displacements $\alpha$ applied to the vacuum. Fig. 11 illustrates the resulting performance by showing the maximum fidelity between the output of the network and each of the three Fock states used for training. For the three Fock states $|0\rangle$, $|1\rangle$, and $|2\rangle$, the best matching input displacements each lead to a decoder output state with fidelity of 99.5%.
The hybrid network has learned to associate different areas of phase space with each of the three Fock states used for training. It is interesting to investigate the resultant output states from the quantum network when the vacuum is displaced to intermediate points between the three areas. These displacements can result in states that exhibit a transition between the Fock states. We use the wavefunction of the output states to visualize this transition. We plot on the right-hand side of Fig. 11 the output wavefunctions which give best fidelity to each of the three Fock states $|0\rangle$, $|1\rangle$, $|2\rangle$, respectively. Wavefunctions are also plotted for displacements which are the intermediate points between those corresponding to: $|0\rangle$ and $|1\rangle$; $|0\rangle$ and $|2\rangle$; and $|1\rangle$ and $|2\rangle$, respectively. These plots illustrate a smooth transition between the encoded Fock states in phase space.

V. CONCLUSIONS
We have presented a quantum neural network architecture which leverages the continuous-variable formalism of quantum computing, and explored it in detail through both theoretical exposition and numerical experiments. This scheme can be considered as an analogue of recent proposals for neural networks encoded using classical light [31], with the additional ingredient that we leverage the quantum properties of the electromagnetic field. Interestingly, as light-based systems are already used in communication networks (both classical and quantum), an optical CV neural network could be wired up directly to communication channels, allowing us to avoid the costly interconversion of classical and quantum information.
We have proposed variants for several well-known classical neural networks, specifically fully connected, convolutional, recurrent, and residual networks. We envision that in future work specialized neural networks will also be inspired purely from the quantum side. We have numerically analyzed the performance of quantum neural network models and demonstrated that they show promise in the tasks we considered. In several of these examples, we employed joint architectures, where classical and quantum networks are used together. This is another promising direction for future exploration, in particular given the current technological lead of classical computers and the expectation that near-term quantum hardware will be limited in size. The quantum part of the model can be specialized to process classically difficult parts of a larger computation to which it is naturally suited. In the longer term, as larger-scale quantum computers are built, the quantum component could take a larger role in hybrid models. Finally, it would be a fruitful research direction to explore more deeply the role that fundamental quantum physics concepts, such as symmetry, interference, entanglement, and the uncertainty principle, play in quantum neural networks.

ACKNOWLEDGMENTS
We thank Krishna Kumar Sabapathy, Haoyu Qi, Timjan Kalajdzievski, and Josh Izaac for helpful discussions. SL was supported by the ARO under the Blue Sky program.

Appendix A: Linear interferometers
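In this section, we derive Eq. (20) for the effect of a passive interferometer on the eigenstates $|x\rangle$. A simple expression for an eigenstate of the $\hat{x}$ quadrature with eigenvalue $x$ can be found in Appendix 4 of Ref. [99]:

$$|x\rangle = \pi^{-1/4} \exp\left(-\tfrac{1}{2}x^2 + \sqrt{2}\,x\,\hat{a}^\dagger - \tfrac{1}{2}\hat{a}^{\dagger 2}\right)|0\rangle, \tag{A1}$$

where $\hat{a}$ is the bosonic annihilation operator, and $|0\rangle$ is the single mode vacuum state. The last expression is independent of any prefactors used to define the quadrature operator $\hat{x}$ in terms of $\hat{a}$ and $\hat{a}^\dagger$.
This can be easily generalized to $N$ modes:

$$|\mathbf{x}\rangle = \bigotimes_{i=1}^{N}|x_i\rangle = \pi^{-N/4} \exp\left(-\tfrac{1}{2}\mathbf{x}^T\mathbf{x} + \sqrt{2}\,\mathbf{x}^T\hat{\mathbf{a}}^\dagger - \tfrac{1}{2}\hat{\mathbf{a}}^{\dagger T}\hat{\mathbf{a}}^\dagger\right)|0\rangle, \tag{A2}$$

where now

$$\mathbf{x} = (x_1, \ldots, x_N)^T, \tag{A3}$$

$$\hat{\mathbf{a}}^\dagger = (\hat{a}_1^\dagger, \ldots, \hat{a}_N^\dagger)^T, \tag{A4}$$

and $|0\rangle$ is the multimode vacuum state. Now consider a (passive) linear optical transformation $\hat{U}$:

$$\hat{a}_i^\dagger \to \hat{U}\hat{a}_i^\dagger\hat{U}^\dagger = \sum_{j} U_{ij}\,\hat{a}_j^\dagger. \tag{A5}$$

In general, $U$ is an arbitrary unitary matrix, $U U^\dagger = \mathbb{1}_N$.
We will however restrict $U$ to have real entries and thus to be orthogonal. In this case, $U^\dagger = U^T$ and hence

$$U U^T = \mathbb{1}_N. \tag{A6}$$

We can now examine how the multimode state $|\mathbf{x}\rangle$ transforms under such a linear interferometer $U$:

$$\hat{U}|\mathbf{x}\rangle = \pi^{-N/4}\, \hat{U} \exp\left(-\tfrac{1}{2}\mathbf{x}^T\mathbf{x} + \sqrt{2}\,\mathbf{x}^T\hat{\mathbf{a}}^\dagger - \tfrac{1}{2}\hat{\mathbf{a}}^{\dagger T}\hat{\mathbf{a}}^\dagger\right)\hat{U}^\dagger|0\rangle, \tag{A7}$$

where we have used that a passive transformation leaves the vacuum invariant, $\hat{U}^\dagger|0\rangle = |0\rangle$. We can use the transformation in Eq. (A5) to write

$$\hat{U}|\mathbf{x}\rangle = \pi^{-N/4} \exp\left(-\tfrac{1}{2}\mathbf{x}^T\mathbf{x} + \sqrt{2}\,\mathbf{x}^T U\hat{\mathbf{a}}^\dagger - \tfrac{1}{2}\hat{\mathbf{a}}^{\dagger T}U^T U\hat{\mathbf{a}}^\dagger\right)|0\rangle. \tag{A8}$$

Now we use that $U^T U = U U^T = \mathbb{1}_N$ to write the last expression as

$$\hat{U}|\mathbf{x}\rangle = \pi^{-N/4} \exp\left(-\tfrac{1}{2}\mathbf{x}^T U U^T\mathbf{x} + \sqrt{2}\,\mathbf{x}^T U\hat{\mathbf{a}}^\dagger - \tfrac{1}{2}\hat{\mathbf{a}}^{\dagger T}\hat{\mathbf{a}}^\dagger\right)|0\rangle. \tag{A9}$$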
Let us define the vector $\mathbf{y} = U^T\mathbf{x}$ and, to match the notation of Eq. (20), the orthogonal matrix $C = U^T$, in terms of which we find

$$\hat{U}|\mathbf{x}\rangle = |\mathbf{y}\rangle = \bigotimes_{i=1}^{N}\Big|\sum_{j} C_{ij}x_j\Big\rangle = |C\mathbf{x}\rangle. \tag{A10}$$

Note that the output state is also a product state. This simple product transformation is a corollary of the elegant results of Ref. [100]: "Given a nonclassical pure-product-state input to an $N$-port linear-optical network, the output is almost always mode entangled; the only exception is a product of squeezed states, all with the same squeezing strength, input to a network that does not mix the squeezed and antisqueezed quadratures." In our context the $x$ eigenstates are nothing but infinitely squeezed states, and the fact that our passive linear optical transformation is orthogonal immediately implies that squeezed and antisqueezed quadratures are not mixed.

[...] picture, the symplectic transformation $M_H$ generated by $H$ is obtained via the rule [49] $M_H = \exp(\Omega H)$, where $\Omega$ is the symplectic form from Eq. (8).
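The rule $M_H = \exp(\Omega H)$ can be checked numerically on a small example. The following is a minimal sketch, not code from the paper: it assumes the quadrature ordering $(x_1, \ldots, x_N, p_1, \ldots, p_N)$ with $\Omega$ built from the blocks $[[0, \mathbb{1}], [-\mathbb{1}, 0]]$, uses an arbitrary (hypothetical) symmetric $4 \times 4$ matrix for $H$, and approximates the matrix exponential with a truncated Taylor series. All helper names are illustrative.

```python
# Sketch: verify that M = exp(Omega H) is symplectic, i.e. M^T Omega M = Omega,
# for a real symmetric H. Pure standard-library Python; all names are illustrative.

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(A, terms=40):
    """Matrix exponential via a truncated Taylor series (adequate for small norms)."""
    result = identity(len(A))
    term = identity(len(A))
    for k in range(1, terms):
        # term becomes A^k / k!
        term = [[v / k for v in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result

def symplectic_form(n_modes):
    """Omega = [[0, 1], [-1, 0]] in the (x_1..x_N, p_1..p_N) ordering (assumed)."""
    n = n_modes
    omega = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        omega[i][n + i] = 1.0
        omega[n + i][i] = -1.0
    return omega

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical two-mode example: any real symmetric H would do here.
H = [[0.3, 0.1, 0.0, 0.2],
     [0.1, 0.5, -0.1, 0.0],
     [0.0, -0.1, 0.4, 0.05],
     [0.2, 0.0, 0.05, 0.6]]

Omega = symplectic_form(2)
M = expm(matmul(Omega, H))

# Symplectic condition: the residual should vanish up to rounding error.
residual = max_abs_diff(matmul(matmul(transpose(M), Omega), M), Omega)
print("max |M^T Omega M - Omega| =", residual)
```

The check works because $\Omega H$ satisfies $(\Omega H)^T \Omega + \Omega (\Omega H) = 0$ whenever $H$ is symmetric, so its exponential preserves $\Omega$. For larger matrices a library routine for the matrix exponential would be a better choice than the Taylor series used here.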
We now fix $H$ to be translationally invariant, i.e., $H$ does not change under the transformation

$$H \to (T \oplus T)^T\, H\, (T \oplus T),$$

where we have introduced the shift operator $T$ which maps $\hat{x}_i \to \hat{x}_{i+1}$ and $\hat{p}_i \to \hat{p}_{i+1}$. We assume periodic boundary conditions on the modes, $\hat{x}_N \to \hat{x}_1$ and $\hat{p}_N \to \hat{p}_1$, which allows us to represent translation as an $N \times N$ orthogonal matrix, namely a cyclic permutation matrix. The translation-invariance condition on $H$ translates to the statement that

$$[M_H, T \oplus T] = 0.$$

Writing this matrix in block form,

$$M_H = \begin{pmatrix} M_{xx} & M_{xp} \\ M_{px} & M_{pp} \end{pmatrix},$$

we conclude that we must also have

$$[M_{uv}, T] = 0 \tag{B9}$$

for each $u, v \in \{x, p\}$. Expressing this in the equivalent form

$$T^T M_{uv}\, T = M_{uv},$$

we see that the following condition must hold on the entries of each $M_{uv}$:

$$[M_{uv}]_{ij} = [M_{uv}]_{(i+1)(j+1)},$$

with indices taken modulo $N$. In other words, when the generating Hamiltonian is translationally invariant, each block of the corresponding symplectic matrix is a Toeplitz matrix, which implements a one-dimensional convolution.
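The link between commuting with the shift matrix, constant diagonals, and convolution can be illustrated with a small numerical sketch. The example below is an illustration under stated assumptions rather than the paper's construction: the kernel values, the helper names, and the orientation of the shift matrix ($(Tv)_i = v_{i+1}$ with wrap-around) are all hypothetical choices. With periodic boundaries the Toeplitz block is in fact circulant, and its action on a vector is a circular sliding-kernel sum (the cross-correlation convention commonly called convolution in machine learning).

```python
# Sketch: a matrix with entries M[i][j] = kernel[(j - i) mod n] commutes with the
# cyclic shift T, has constant (wrapped) diagonals, and acts as a circular
# sliding-kernel sum. Pure standard-library Python; all names are illustrative.

def shift_matrix(n):
    """Cyclic shift with (T v)_i = v_{(i+1) mod n}; the orientation is an assumption."""
    return [[1.0 if j == (i + 1) % n else 0.0 for j in range(n)] for i in range(n)]

def circulant(kernel):
    """Matrix whose (i, j) entry depends only on (j - i) mod n."""
    n = len(kernel)
    return [[kernel[(j - i) % n] for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def sliding_kernel(kernel, v):
    """out_i = sum_k kernel[k] * v[(i + k) mod n]: the same kernel at every site."""
    n = len(v)
    return [sum(kernel[k] * v[(i + k) % n] for k in range(n)) for i in range(n)]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical kernel and input vector for N = 4 modes.
kernel = [0.5, 0.25, 0.0, -0.25]
v = [1.0, 2.0, 3.0, 4.0]
n = len(kernel)

T = shift_matrix(n)
M = circulant(kernel)

# [M, T] = 0, the analogue of Eq. (B9) for a single block.
commutes = close(matmul(M, T), matmul(T, M))

# Entry condition M_ij = M_(i+1)(j+1), indices modulo N.
constant_diagonals = all(M[i][j] == M[(i + 1) % n][(j + 1) % n]
                         for i in range(n) for j in range(n))

# Applying M is the same as sliding the kernel over the input.
conv_matches = all(abs(a - b) < 1e-12
                   for a, b in zip(matvec(M, v), sliding_kernel(kernel, v)))

print(commutes, constant_diagonals, conv_matches)
```

The same weights are applied at every position, which is the weight sharing that characterizes a convolutional layer.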