Efficient high-dimensional entanglement imaging with a compressive sensing, double-pixel camera

We implement a double-pixel, compressive sensing camera to efficiently characterize, at high resolution, the spatially entangled fields produced by spontaneous parametric downconversion. This technique leverages sparsity in spatial correlations between entangled photons to improve acquisition times over raster-scanning by a scaling factor up to n^2/log(n) for n-dimensional images. We image at resolutions up to 1024 dimensions per detector and demonstrate a channel capacity of 8.4 bits per photon. By comparing the classical mutual information in conjugate bases, we violate an entropic Einstein-Podolsky-Rosen separability criterion for all measured resolutions. More broadly, our result indicates compressive sensing can be especially effective for higher-order measurements on correlated systems.


INTRODUCTION
Spatially entangled biphotons, such as those generated by spontaneous parametric downconversion (SPDC), exhibit strong Einstein-Podolsky-Rosen (EPR) type correlations [1] in the transverse position and transverse momentum degrees of freedom [2]. Because these variables are continuous, the entanglement can be very highdimensional with a typical Schmidt number greatly exceeding 1000 [3]. This provides high information density which can be leveraged to increase channel capacity and security for quantum key distribution [4][5][6] and dense coding [7,8]. Other applications include ghost imaging [9,10], quantum computing [11], and quantum teleportation [12].
Experimentally characterizing the SPDC state is unfortunately difficult due to weak sources and low resolution detectors. Spatial entanglement is traditionally imaged by jointly raster-scanning photon-counting avalanche photodiodes (APDs) to measure spatial correlations. This scales extremely poorly with increasing detector resolution. With a biphoton flux of 4, 000 coincident detections per second, it would take 55 days to jointly scan a 24 × 24 pixel region for a signal-to-noise ratio (SNR) of 10. For 32 × 32 pixels, it would take 310 days (see Eq. 9).
Other approaches have been tried with mixed success. Intensified CCD cameras can measure the Schmidt number [13], but do not detect single-photon correlations rendering them ineffective for most quantum applications. Arrays of photon counting APDs could replace CCDs, but they are currently low resolution, noisy and resource intensive; especially since each pixel pair must be individually correlated [14][15][16]. A recent, promising result averages intensity correlations over many images from a single-photon sensitive electron-multiplying CCD, reporting 2500 modes [17]. This technique is limited to a 30 ms exposure time (APDs are sub-ns) and is noisier than using APDs because it does not isolate individual coincident detections.
In ref [18], Dixon et. al. reduce the number of measurements required for a raster-scan by only measuring in an area of interest where correlations are expected, reporting a channel capacity of 7 bits per photon. While not a true full-field measurement, they highlight a critical feature of the SPDC field. In both position and momentum representations, the distribution of correlations between pairs of detector pixels is very sparse despite dense (not sparse) single-particle distributions. Applying ideas from the field of compressive sensing, we exploit prior knowledge of this sparsity to beat the "curse of dimensionality" [19] and efficiently characterize the full biphoton field without raster scanning.
In this article, we implement a compressive sensing, photon-counting double-pixel camera that efficiently images single-photon SPDC correlations in the near-and far-field at resolutions up to 32 × 32 = 1024 dimensions per detector. At 32×32 resolution, the measurement time is reduced from 310 days for raster scanning to around 8 hours. We perform an entropic characterization showing channel capacities of up to 8.4 bits per photon, equivalent to 337 independent, identically distributed modes. Sums of channel capacities in conjugate bases violate a EPR steering bound [20] by up to 6.6 bits.

THEORY Compressive Sensing
Compressive sensing is a technique that employs optimization to measure a sparsely represented Ndimensional signal from M < N incoherent measurements [21][22][23][24]. The approach is so-named because the arXiv:1212.5530v1 [quant-ph] 21 Dec 2012 signal is effectively compressed during measurement. Though sparsity is assumed, it is not known prior to measurement which elements contain appreciable amplitude. Compressive sensing must determine both which elements are significant and find their values.
To detect a sparsely represented N -dimensional signal vector X, we measure a series of M < N values Y by multiplying X by an M × N sensing matrix A such that where Γ is a noise vector. Because M < N , this system is undetermined; a given Y does not specify a unique X. The correct X is recovered by minimizing a regularized least squares objective function where for example ||Ω|| 2 2 is the 2 norm of Ω and τ is a scaling constant. The function g(X) is a regularization promoting sparsity. Common g(X) include X's 1 norm, assuming the signal is sparse, and X's total variation, assuming the signal's gradient is sparse [25]. A must be incoherent with the basis of interest, with the surprising and non-intuitive result that a random, binary sensing matrix works well. Given sufficiently large M , the recovered X approaches the exact signal with high probability [26]. For a k−sparse signal, the required M scales as M ∝ k log(N/k).
Incoherent, random sampling is particularly beneficial for low-light measurements as each measurement receives on average half the total photon flux Φ/2, as apposed to Φ/n for a raster scan. Compressive sensing is now beginning to be used for quantum applications such as state tomography [27]. Shabani et. al, for example, perform a tomography on a two qubit photonic gate for polarization entangled photons [28]. CS has also been used to with spatially correlated light for ghost imaging [29,30]. It is important to note that for ghost imaging, CS is not required to recover the full two-particle probability distribution as in entanglement characterization.
The quintessential compressed sensing example is the single-pixel camera [31,32]. An object is imaged onto a Digital Micromirror device (DMD), a 2D binary array of individually-addressable mirrors that reflect light either to a single detector or a dump. Rows of the sensing matrix A consist of random, binary patterns placed sequentially on the DMD. For an N -dimensional image, minimizing Eq. 2 recovers images using as few as M = 0.02N measurements.

Compressive Sensing for Measuring Correlations
The single-pixel camera concept naturally adapts to imaging correlations by adding a second detector. Consider placing separate DMDs in the near-field or far-field of the SPDC signal and idler modes, where "on" pixels are redirected to photon counting modules. The signal of interest is where p(u, v) represents the probability of a coincident detection between the u th pixel on the signal DMD and v th pixel on the idler DMD. The functions f ( x s , x i ) and f ( k s , k i ) are approximate position and momentum wavefunctions for the biphoton Subscripts s and i refer to signal and idler photons respectively, σ p and σ c are the pump and correlation widths, and N is a normalizing constant. X of Eq. 2 is simply a one-dimensional reshaping of p x or p k . Like the single-pixel camera, a series of random patterns are placed on the DMDs to form rows of A. For each pair of patterns, correlations between the signal and idler photons form the measurement vector Y . Minimization of Eq. 2 recovers p(u, v).
While a fully random A is preferred, the DMDs only act on their respective signal or idler subspace, which prevents arbitrary A. Rows of A are therefore outer products of rows of single-particle sensing sensing matrices a and b where rows of a represent random patterns placed on the signal DMD and rows of b represent random patterns placed on the idler DMD. To make signal and idler photons distinguishable, a and b are not the same. The validity of Kronecker-type sensing matrices has been established and is of current interest in the CS community as attention shifts to higher dimensional signals [33,34]. The measurement vector Y is obtained by counting coincident detections for the series of DMD configurations given by A.
A variety of reconstruction algorithms exist for Eq. 2, with their computational complexity dominated by repeatedly calculating AX and A T Y [35]. This is especially unwieldy for correlation measurements as the size of A is M × n 2 for n−pixel DMDs. Using properties for Kronecker products [36], these can be more efficiently computed by where sq and vec reshape a vector to a square matrix and vice-versa; diag forms a vector from the diagonal elements of a square matrix; and od forms a square matrix placing the operand vector on its diagonal.

Comparison to Raster Scanning
The compressive approach finds the joint probability distribution orders of magnitude faster than raster scanning through two key improvements. The first is simply the reduction in the number of measurements. To jointly raster scan an n-pixel space requires n 2 measurements. For a compressive measurement, sparsity is approximately n with dimensionality n 2 , so only M ∝ n log(n) measurements are required. In practice, we found excellent results when M was only three percent of n 2 .
The second advantage of compressive measurements is that they more efficiently use available flux. For the raster scan, the total flux is distributed over at best n pairs of pixels in the case of perfect correlations. Conversely, the average flux per incoherent compressive measurement is independent of n, with each measurement receiving on average 1/4 the total flux. To maintain constant SNR (photons/measurement) with increasing n, total measurement time therefore scales as n 3 for raster scanning. Given a photon flux of Φ photons per second, the measurement time for a desired SNR is where t meas is the time per measurement. For incoherent, compressive measurements, acquisition time scales as n log(n). The compressive improvement therefore scales as n 2 / log(n). For n = 32 × 32 = 1024, this is of order 10 5 .
This scaling factor somewhat optimistically assumes the reconstruction process yields an accurate result despite a noisy signal. Unfortunately, propagation of uncertainty through the reconstruction process remains a difficult problem, especially for non-ideal, real world systems [37]. There has been much recent theoretical work on the topic for Gaussian [38][39][40] and Poissonian noise [41,42]. These results tend to require ideal sensing matrices or more complicated formulations to give provable performance bounds. As such, their findings are difficult to directly, quantitatively apply to experiment. However, they do reveal pertinent features that indicate CS can perform extremely well in the presence of noise.
A well known characteristic of CS is a rapid phase change from poor to good quality reconstructions [43]. This phase change is often discussed as a function of increasing m, with the boundary m ∝ k log(n/k). A similar phase transition occurs as a function of the noise level; in our case, this is the number of photons per measurement. For some cases, these two are even linked [38]. A practical compressive measurement simply requires large enough m and photon flux Φ to be in the space of good reconstructions. Fortunately, simply obtaining a recognizable reconstruction generally indicates the measurement conditions exceed this threshold.
Unlike a direct measurement, the information obtained by a series of y compressive measurements is contained in their deviation from the average valueȳ. In the presence of noise, these deviations must exceed the noise level. Assuming Poissonian shot noise, good reconstructions require std(y) ≥ β √ȳ , where std(y) is the standard deviation of the measurement vector and β is a positive constant greater than one.
The particular algorithm chosen to solve Eq. 2 also plays a role in the reconstruction"s accuracy. These often have provable performance on ideal signals, but degrade when confronted with noisy or otherwise non-ideal conditions. In these circumstances, they have various   strengths, including speed, accuracy, and sensitivity to user selected parameters such as τ in Eq. 2. For more information on common reconstruction algorithms, see Refs [24,35,44,45]. In practice, the best way to determine accuracy for a particular signal, sensing matrix, and reconstruction approach remains repeated simulations or experiments. For our system, we reduce a n = 32 × 32 measurement from a 310 day raster scan (SNR of 10) to an 8 hour compressive acquisition, a thousand-fold improvement.

EXPERIMENT
The experimental apparatus is given in Fig. 1. Light from a 2.8 mW, 325 nm HeCd laser was directed to a 1 mm long BiBO crystal oriented for type I, collinear SPDC. The generated daughter photons passed through a 650/13 nm narrowband filter before separating into signal and idler modes at a 50/50 beamsplitter. To measure position-position correlations, lenses f 1 = 125 mm and f 2 = 500 mm imaged the crystal onto signal and idler mode DMDs. For momentum-momentum correlations, f 1 was removed and the DMDs were placed in the focal plane of f 2 = 88.3 mm. DMD "on" pixels reflected light to large area, single photon counting modules (SPCM) connected to a correlating circuit.
To measure p(u, v), a series of M random patterns were placed on the DMDs to form the sensing matrix A. For each set of patterns, joint detections were counted for acquisition times t aq for a total measurement time t = M t aq to make up the measurement vector y. The joint distribution p(u, v) was reconstructed using a Gradient Projection solver for Eq. 2 with 1 regularization, commonly referred to as Basis Pursuit Denoising [44].
We measured at dimensions of N = 256 2 , N = 576 2 , and N = 1024 2 corresponding to DMD resolutions of 16 × 16, 24 × 24, and 32 × 32 pixels. The associated measurement numbers M were 2500, 10, 000, and 30, 000 so that M is only about 0.03N . Acquisition times were 1 second for position measurements and 1.5 seconds for momentum measurements to average 1000 coincident de- tections per DMD configuration in all cases. Additionally, we performed representative simulations at 16 × 16 and 24 × 24 resolutions.

Joint Probability Distribution
A simulation for measuring position-position correlations at 16 × 16 DMD resolution is given in Fig. 2. The object in 2(a) is the correlation function of Eq. 4. The simulation used m = 2500 measurements and a photon flux of Φ = 5000 photons/measurement multiplied by the ideal p(u, v), conditions representative of the 1 second experimental acquisitions. Note that this is the total signal strength before interacting with the sensing matrix; the mean value of the measurement vector is Φ/4 = 1250 detected photons. Values of the measurement vector were Poissonian distributed to simulate the effect of shot noise. Fig. 4(b) gives the reconstructed correlation function p(u, v) between signal and idler DMD pixels. The sharply defined diagonal line shows the expected positive correlations between the two DMDs. DMD pixels are listed in column-major order. The mean squared error (MSE) for the reconstruction was 5 × 10 −8 . The two-dimensional signal marginal distribution is inset, which provides an image of the signal beam. Figs. 2(c) and 2(d) sum the result along the anti-diagonal to show the correlation width σ c . Qualitatively, the reconstruction closely resembles the original object, faltering only near the edges of the distribution where the signal falls beneath a noise floor.
The reconstruction recovers σ c < 1 pixel with negligible error.
To demonstrate the reconstruction accuracy, simulations were performed for increasing photon flux Φ with DMD resolution 16 × 16 and m = 2500. The MSE versus Φ is given in Fig. 3. Reconstructions were normalized to the incident flux Φ for comparison to the ideal signal. The result shows the rapid phase change from poor to excellent reconstructions with a MSE converging to 5 × 10 −8 beyond the phase change.
The MSE can be used to roughly estimate the signalto-noise ratio for a particular measurement of an average, non-zero element. Assuming perfect pixel correlations and uniform marginal distributions, the energy in the signal is distributed over 1/n elements. The signal-tonoise ratio is then 1/n √ MSE. For n = 256 pixels and MSE = 5 × 10 −8 , this yields an approximate SNR of 17. For comparison, using Eq. 9, a raster scan would require about four days to achieve a SNR of only 10. The simulated CS acquisition time was 42 minutes for 2500, 1 second measurements.
Sample experimental reconstructions for positionposition and momentum-momentum correlations at 16 × 16 pixel DMD resolution are given in Fig.4. As in the simulations, the position-position result shows a well defined diagonal line indicating positive pixel correlations. Conversely, the momentum-momentum result shows an anti-diagonal line showing the expected anti-correlations. Figs. 4(c) and 4(d) sum the results along the antidiagonal (position-position) and diagonal (momentummomentum) to reveal an effective correlation width σ ce of a single pixel. Our detection scheme is therefore as accurate as possible at this resolution and our channel capacity remains detector limited.

Mutual Information in the Channel
Once p(u, v) is recovered, the channel capacity is given by the classical mutual information shared between signal and idler DMD pixels; where for example, is the signal particle's marginal probability distribution [18]. This entropic analysis is solely measurement based and does not require reconstructing a wavefunction or density matrix, a challenging task even for lowdimensional systems [46][47][48]. To estimate the uncertainty in the mutual information from shot noise and the reconstruction process, we performed 100 simulations at n = 256 pixel resolution and 31 simulations at n = 576 pixel resolution. Simulations were not performed at n = 1024 pixel resolution due to available computer time. In addition to the results from the raw reconstruction, thresholding was performed to provide noise reduction, where all values in the recovered p(u, v) below a percentage of the maximum value are forced to zero. The simulated mutual information versus thresholding percentage is given in Fig. 5 for the n = 256 pixel simulations exemplified by Fig. 2. Errorbars enclose one standard deviation from repeated simulations.
As the threshold increases from zero, the mutual information rises as a weak, uncorrelated noise floor is removed. An optimal threshold is quickly reached, beyond which the threshold removes more signal than noise, reducing the mutual information. Note that the reconstructed mutual information is systematically lower than the actual mutual information in the ideal object. This is The n = 256 far field experimental result is included for comparison to the simulation. The experiment closely matches the simulation both for no thresholding and beyond its optimal threshold, but is smaller in the intermediate region. This is likely due to experimental errors not included in the simulation. These include slight pixel misalignment between signal and idler DMDs, optical aberrations, detector dark noise, stray light, power fluctuations in the laser, and temperature stability of the nonlinear crystal. Fig. 5 indicates that these experimental difficulties appear to increase the uncorrelated noise floor rather than significantly affect the correlated part of the reconstruction.
Although thresholding is a simple post-processing technique, it is applicable to how the entangled pixels might be used for communication. If a pair of entangled pixels has a correlated amplitude near or below the background noise, it will be difficult to use that particular pair for communication. A communication scheme would likely only consider pixel pairs above a certain threshold to be useful. This is related to the technique in photonic quantum information of subtracting background noise from a measured signal. In CS, it is also common to perform post-processing or secondary optimization after maximiz-ing sparsity, such as the debiasing routine used in Ref. [44].
The experimental channel capacity versus DMD resolution for both position-position and momentummomentum is given in Fig. 6 for several levels of thresholding. The optimal threshold is that which maximizes the mutual information. At 256 and 576 pixel resolutions, optimal thresholds of 20% and 30% were used for position-position and momentum-momentum distributions respectively. At 1024 pixel resolution, noise was more significant, so the optimal thresholds increased to 30% and 40%. Error bars on n = 256 and n = 576 pixels measurements represent the expected effect of shot noise and reconstruction uncertainty derived from simulation. These have been conservatively set to include two standard deviations from the simulated result.
The joint probability distribution was also fit to the double-Gaussian wavefunction (Eq. 5) to find effective widths σ ce and σ pe . When σ p σ c , the mutual information between particles for Eq. 5 is the logarithm of the Federov ratio [49] log where the ratio is squared for two dimensions. While this technically applies to the continuous wavefunction, and the true σ c is smaller than a DMD pixel, Eq. 12 still applies to the discritized measurement so long as the effective σ ce σ pe . Fitting yielded the largest channel capacities with a maximum of 8.4 bits for momentum-momentum correlations at 1024 pixel resolution, equivalent to 337 independent, identically distributed, entangled modes.
Given that fitting more accurately characterizes the system and gives a larger mutual information, it is reasonable to question the usefulness the direct mutual information computation. However, the two approaches suit different purposes. Fitting is useful if one is particularly interested in the state itself. However, if one intends to use correlated pixels for some other purpose, such as communication, the direct calculation is more appropriate. This is because the correlated pixels on the low intensity, tail of the distribution will be difficult to use in practice even if their amplitude can be inferred by fitting.
The solid curve of Fig. 6 gives the maximum possible mutual information between two, n-pixel detectors. Assuming perfect diagonal or anti-diagonal correlations and uniform marginals, this is simply log(n). Because we have Gaussian marginals, we do not expect to reach this bound, even with σ ce ≈ 1 pixel. By magnifying and us-ing only the central part of the field, we could approach this upper limit.

Witnessing Entanglement
Despite not reconstructing a full density matrix, it is still possible to demonstrate non-classical behavior by comparing position-position and momentum-momentum correlation measurements directly. This has traditionally involved fitting the measurements to Eq. 5 and analyzing products or sums of conditional variances [50][51][52].
We recently presented a more inclusive, entropic steering inequality for witnessing continuous variable entanglement with discrete measurements [20], where the sum of the classical mutual information between positionposition and momentum-momentum correlations is classically bounded. For our system, all classically correlated measurements must satisfy I xs,xi + I ks,ki ≤ 2 log nd x d k πe , where d x and d k are the respective widths of DMD pixels in the position and momentum basis. Note that nd x d k is simply the bandwidth product for the DMD area, and is independent of n if total area does not change.
The sum of the classical mutual information in conjugate bases for each detector resolution is given in Fig. 7. The solid blue line provides right hand side of Eq. 13, which must be exceeded to witness EPR steering. Error bars for the n = 256 and n = 576 cases are derived from simulation and include two standard deviations. In all cases, we show EPR steering both with optimal thresholding and fitting to the double Gaussian wavefunction (Eq. 5). Even at 5% thresholding, there is a violation for 16 × 16 dimensions. Recall that simulations (Fig. 3) systematically under-represented the object mutual information relative to measurement uncertainty, so measurement error is highly unlikely to have over-estimated this sum. For the fitted 32 × 32 dimensional result, we violate the classical bound by 6.6 bits.

CONCLUSION
In this article, we present a compressive sensing, double-pixel camera for characterizing the SPCD biphoton state with photon-counting detectors. This technique is very efficient, improving acquisition times over rasterscanning by n 2 / log(n) for n−pixel detectors. We image SPDC correlations at up to 1024 dimensions per detector and measure a detector-limited mutual information of up to 8.4 bits. We also violate an entropic EPR steering bound, indicating that these correlations are nonclassical. More broadly, our results suggest that compressive sensing can be extremely effective for analyzing cor-relations within large dimensional signals (eg. intensityintensity correlations). Potential applications range from verifying security in spectral correlations for energy-time QKD [53] to imaging through scattering media [54]. This work was supported by DARPA InPho grant W911NF-10-1-0404.