Generative adversarial network for super-resolution imaging through a fiber

A multimode fiber represents the ultimate limit in miniaturization of imaging endoscopes. Here we propose a fiber imaging approach employing compressive sensing with a data-driven machine learning framework. We implement a generative adversarial network for image reconstruction without relying on a sample sparsity constraint. The proposed method outperforms the conventional compressive imaging algorithms in terms of image quality and noise robustness. We experimentally demonstrate speckle-based imaging below the diffraction limit at a sub-Nyquist speed through a multimode fiber.


Introduction
Optical fibers are broadly used in many imaging applications. Multimode fibers (MMFs) combined with advanced wavefront shaping [1] enable new ways to transmit information through a hair-thin probe [2,3]. Currently, the most popular approaches for imaging through an MMF exploit transmission-matrix measurements and holographic light shaping [4][5][6]. Computational methods, on the other hand, provide an elegant alternative that requires no active control over light propagation. Compressive imaging through an MMF improves both spatial resolution and imaging speed [7][8][9]. However, practical application of compressive imaging has been restricted by the strong assumption of sample sparsity and the need to adapt the approach to different experimental conditions [10].
Recent years have witnessed the rise of deep learning as a powerful tool for computational imaging [11]. In particular, the generative adversarial network (GAN) has produced a series of breakthroughs. A GAN is composed of two neural networks that compete with each other during the training process. A properly trained GAN can generate new data belonging to a certain class of objects. Bora et al. proposed solving the compressive sensing problem with a GAN and demonstrated sub-Nyquist imaging without any constraints on sample sparsity [12]. In this approach, the image is produced by iteratively minimizing the discrepancy between GAN-generated images and the measured data. A GAN can also be used to produce a clean reconstruction from a noisy l2-norm solution [13,14]. Over the past years, this GAN approach to compressive imaging has developed rapidly [15][16][17].
Recently, deep neural networks have been successfully used to classify and reconstruct images distorted by propagation through an MMF [18][19][20][21][22]. However, in contrast to the GAN, which is an unsupervised machine learning framework and does not require labeled data, these machine learning approaches to MMF imaging require training a network on pairs of input images and their corresponding speckle patterns at the fiber output. This complicates the pre-calibration of the system and requires repeating the training procedure whenever the fiber configuration changes [23]. Moreover, their application has so far been restricted to coherent image transmission with a resolution limited by the diffraction of light.
Here we address the problem of speckle-based sub-diffraction compressive imaging through an MMF. We propose a GAN-based framework for imaging through an MMF that does not rely on a sparsity constraint and can be easily generalized to any speckle-based measurement system. To the best of our knowledge, this work is the first experimental application of machine learning to speckle-based compressive imaging. We show that a proper choice of architecture and loss function allows images to be recovered below the diffraction limit at sub-Nyquist imaging speed via an MMF. We experimentally demonstrate the superiority of our approach over standard l1-norm optimization algorithms in terms of spatial resolution, noise tolerance and sensitivity to the number of measurements.

Speckle-based computational imaging
We focus on super-resolution microscopy with speckle illumination and a single-pixel detection scheme, also known as super-resolution ghost imaging [8]. The schematic imaging process is shown in Fig. 1(a). The sample is illuminated by a series of speckle patterns generated in an MMF or a scattering medium. The transmitted signal is recorded by a bucket (single-pixel) detector. The relationship between the intensity distribution of the speckle pattern and the bucket detector signal can be represented as a linear equation, and multiple measurements can be formulated as an under-determined system of linear equations:

y = Ax,

where each flattened speckle pattern (n × 1) corresponds to one row of the measurement matrix A (m × n). The response to m different speckle patterns is measured by a bucket detector to form the measurement vector y (m × 1). Finally, x is the unknown 1D vector (n × 1), where n = N² is the total number of pixels and N × N is the size of the 2D digitized speckle pattern. In this work, we consider the case where m ≪ n. We reconstruct the vector x by two computational approaches: a compressive sensing algorithm and a pre-trained GAN. The reconstructed vector x (n × 1) is resized to represent a 2D sample image (N × N). We compare the results with diffraction-limited images to evaluate the super-resolution and sub-Nyquist capabilities of the computational approaches. The reconstruction quality is quantified by the Pearson correlation coefficient r between the reconstructed image and the ground truth image. The correlation r ranges from −1 to 1, where 1 represents a perfect reconstruction, −1 a perfect negative reconstruction, and 0 indicates that the reconstruction and the ground truth are independent.
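The measurement model and the correlation metric above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a random matrix A as a stand-in for the experimentally recorded speckle patterns; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: an N x N image flattened to n pixels, m measurements (m << n).
N, m = 28, 100
n = N * N

# Stand-in measurement matrix: each row is one flattened speckle intensity pattern.
A = rng.random((m, n))

# Flattened ground-truth image x and the bucket-detector signal y = A x.
x = rng.random(n)
y = A @ x                      # measurement vector, shape (m,)

def pearson_r(a, b):
    """Pearson correlation coefficient r between two flattened images."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]
```

A perfect reconstruction gives r = 1, e.g. `pearson_r(x, x.copy())`.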
Our MMF-based single-pixel imaging setup is presented in Fig. 1(a). A detailed description of the setup and imaging methods is given in the Methods section.

Compressive imaging with GAN
The process of reconstruction with the GAN consists of two main steps, as presented in Fig. 2: (1) a training step, in which the network learns to generate sample images corresponding to a certain class of objects, and (2) an imaging step, in which we search for the image that fits the experimentally measured data.
The training step is completely independent of the imaging step, and a properly trained network can be used for different experimental systems without any change (see the Methods section for details). A GAN consists of a generator and a discriminator, which are trained simultaneously. The generator transforms a low-dimensional input latent representation z into a high-resolution image, whereas the discriminator distinguishes between images that are 'real' (from the training set) and 'fake' (newly generated). These two networks learn from and compete with each other during training, improving their respective performance. When the GAN is well trained, the generator can produce new images belonging to the same class as the training set from an arbitrary input z.
For the image reconstruction, we define a loss function L(z), which is calculated from the flattened generated image G(z), the measured signal y and the measurement matrix A:

L(z) = ||y − AG(z)||₂².

Following Bora et al. [12], we minimize this loss function, which quantifies the discrepancy between the generated image and the experimental data. We randomly generate a starting value of the input latent representation z = ẑ and search for the minimum by gradient descent:

ẑ ← ẑ − α∇L(ẑ),

where α is the learning rate. Although the loss function is highly non-convex, we found that hundreds of iterations combined with several random initializations of the input [12,24] give a good final reconstruction.
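The latent-space descent above can be illustrated with a toy example. As a sketch only, a fixed linear map G(z) = Wz stands in for the trained deep generator; W, A and all dimensions are assumptions for illustration, and the learning rate is set from the Lipschitz bound of the quadratic loss so the descent provably converges.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a trained generator: a fixed linear map G(z) = W @ z.
k, n, m = 20, 100, 40
W = rng.standard_normal((n, k))      # illustrative "generator" weights
A = rng.standard_normal((m, n))      # illustrative measurement matrix

# Synthetic measurement produced by some unknown latent code z_true.
z_true = rng.standard_normal(k)
y = A @ (W @ z_true)

def loss(z):
    """L(z) = ||y - A G(z)||_2^2, the discrepancy with the measured data."""
    r = y - A @ (W @ z)
    return r @ r

def grad(z):
    """Analytic gradient of the quadratic loss with respect to z."""
    return -2.0 * W.T @ (A.T @ (y - A @ (W @ z)))

# Learning rate alpha from the Lipschitz bound of the gradient.
alpha = 0.5 / np.linalg.norm(A @ W, 2) ** 2

z0 = rng.standard_normal(k)          # random initialization of z-hat
z = z0.copy()
for _ in range(2000):
    z -= alpha * grad(z)             # z-hat <- z-hat - alpha * grad L(z-hat)
```

With the real deep generator the loss is non-convex, which is why the paper combines many iterations with several random initializations of z.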

Simulations
We first perform simulations to study the performance of the GAN imaging framework in the ideal, noiseless case. We use the Basis Pursuit (BP) algorithm for comparison and introduce the normalized spatial frequency ν, which is related to the diffraction limit of the optical system. Fig. 1(b) shows examples of generated speckle patterns for different values of ν. Both the BP algorithm and the normalized spatial frequency ν are explained in detail in the Methods section. Fig. 3(a-d) presents the results of noiseless simulations performed on a handwritten digit "3" (randomly selected from the testing dataset) with m = 100 and ν = 0.2. We compare the ground truth, the diffraction-limited result and the images reconstructed with the BP algorithm and the GAN. The correlation coefficients between the reconstructed images and the ground truth are 0.98 for the GAN, 0.63 for BP and 0.82 for the diffraction-limited approach.
We studied the performance of the reconstruction algorithms for each combination of the normalized spatial frequencies ν = [0.1, 0.2, 0.3, 0.5, 0.7] and the numbers of measurements m = [10, 40, 70, 100, 200, 300, 400, 500, 750]. Each correlation coefficient between the reconstruction result and the ground truth is averaged over one digit of each kind (0−9) randomly selected from the testing dataset. The results are shown in Fig. 3(e-g). To clearly show the performance difference in the high-r region, we use two color scales. The gray scale corresponds to the r range between 0 and 0.9, while the blue-green color scale shows the detailed range between 0.9 and 1. The Nyquist limit, i.e. the minimum number of measurements required for conventional imaging, is shown by the red dashed line. In the case of diffraction-limited imaging, Fig. 3(g), the area below the Nyquist limit is not determined. The GAN achieves a high-correlation reconstruction (r ≥ 0.9) with a much lower number of measurements and a lower normalized cutoff frequency than BP, as shown in Fig. 3(e, f). The correlation coefficient for the GAN quickly saturates to r = 0.98 at ν = 0.3 and m = 80, while the BP algorithm requires ν = 0.45 and m = 450 to reach similar performance. However, the non-convex nature of the loss function L(z), used to find the optimal solution in the GAN approach, makes it hard to reach a perfect reconstruction (r = 1). For ν ≤ 0.45, both the GAN and BP demonstrate super-resolution imaging, as their image quality is better than in the diffraction-limited case. At the same time, both the GAN and BP overcome the Nyquist criterion. For instance, at ν = 0.7 we can enhance the imaging speed by roughly 2 and 11 times with BP and the GAN, respectively.
To evaluate the transfer-learning reconstruction performance and the flexibility of the GAN-based imaging approach, we used a different class of images. The results for the handwritten letters "e" and "x", whose shapes are not close to those of the digits, are presented in Fig. 4. The objects are thus not of the same class as the training data, although both are grayscale and handwritten. The imaging conditions are the same as in Fig. 3(a-d), with ν = 0.2 and m = 100. The GAN approach shows excellent performance in these cases, demonstrating the flexibility of GAN-based imaging.
We also investigate the performance of the proposed GAN approach under realistic conditions by adding noise to both the measurement matrix A and the measured signal y, as described in Section 4.2. For a fair comparison, we replace BP with the Basis Pursuit Denoising (BPDN) algorithm, which is optimized for noisy measurements by tuning the error tolerance factor δ. The GAN, in contrast, does not require any additional parameter tuning. We repeat the reconstruction for different samples, values of m and ν, and noise levels. The results, averaged over one digit of each kind (0−9), are shown in Fig. 5.
The image quality of the BPDN algorithm deteriorates quickly with an increasing amount of noise, as can be seen in Fig. 5(a-c). The maximum achievable reconstruction quality of BPDN decreases to r_max = 0.8 at 20% noise. At the same time, the GAN provides a good reconstruction (with r_max up to 0.98) even at the highest noise level, maintaining both the super-resolution and sub-Nyquist regimes.

Experimental validation
In order to test our GAN reconstruction framework in practice, we performed an MMF-based compressive imaging experiment on a real sample. Handwritten digits from the testing subset of the MNIST database were prepared on a microscope slide using mask-less UV photolithography (365 nm) and lift-off of a sputtered reflective 200 nm thick aluminum film. To generate the ground truth, the separately measured bright-field image of the handwritten digit "4" (randomly selected from the testing dataset) engraved in the aluminum film is cropped and converted to binary format, as presented in Fig. 6(a). A detailed description of the experimental setup is provided in Section 4.1. Given the diffraction limit of our experimental setup, the effective normalized spatial frequency corresponds to ν ≈ 0.2. The diffraction-limited image, obtained by applying a low-pass filter with ν = 0.2 to the ground truth, is presented in Fig. 6(b). We perform m = [10, 40, 70, 100, 200, 300, 400, 750] random measurements by scanning the input facet of the MMF and randomly choosing m speckle patterns and the m corresponding detected intensities for processing. The recorded speckle patterns are cropped and resized to match the size of the dataset images; the processed speckle images form the measurement matrix A. Figure 6(c-e) shows the imaging results for m = 200 random measurements and the reconstructions performed by the BP algorithm (c), the BPDN algorithm (d), and the proposed GAN (e). The GAN framework provides significantly better image quality than diffraction-limited microscopy and traditional compressive sensing algorithms: the GAN reconstruction bears a much closer resemblance to the ground truth and has a noise-free background.
To characterize the imaging performance, the correlation coefficients between the reconstructed images and the ground truth have been calculated and are presented in Fig. 6(f) for the different numbers of measurements m. For each m, we repeat the experiment and the reconstruction procedure five times and average the correlation coefficient over all realizations, calculating the mean and the standard deviation. For BPDN, the error tolerance factor δ is tuned to achieve the best quality of reconstruction for each number of measurements. As expected, for all the algorithms the quality of reconstruction increases with the number of measurements. The GAN approach shows better performance than the BP and BPDN algorithms for m < 750, in full agreement with the simulation results. The BPDN algorithm performs better than BP, which can be explained by the fact that real measurements always contain noise that is not taken into account in the BP formulation, Eq. 4. BPDN demonstrates a slightly better quality of reconstruction than the GAN for m = 750, because of the non-convex nature of the GAN loss function.

Discussion and Conclusion
In this study, we experimentally demonstrated compressive fiber imaging with a deep convolutional GAN framework. State-of-the-art machine learning methods can reconstruct images distorted by an MMF only in the case of coherent imaging [19][20][21][22][23], whereas the proposed GAN approach has no such restriction and can be used for popular incoherent methods of fiber imaging such as (auto-)fluorescence microscopy. Moreover, the GAN framework does not require repeating a training procedure with pairs of input and output images for different fiber configurations.
With the proposed GAN approach, we demonstrated fiber-based compressive imaging with enhanced spatial resolution and showed that the imaging quality outperforms diffraction-limited imaging. In contrast to the conventional minimization algorithms typically used to reconstruct compressed data with sub-diffraction resolution, our approach does not require the samples to be sparse. The sparsity constraint is quite general and can, in principle, be applied to many natural images. However, finding the sparsity domain for a given sample is not always straightforward. In addition, the imaging quality of sparsity-based methods deteriorates significantly with increasing noise and for less sparse samples [25].
We demonstrated, both theoretically and experimentally, that the proposed GAN-based computational framework achieves a better quality of reconstruction than traditional compressed sensing minimization algorithms. The advantage is most pronounced for a small number of measurements, while for a large number of measurements the GAN and the traditional BPDN perform similarly. The GAN also proved tolerant to a large amount of noise and was capable of reconstructing images of handwritten letters. Hence a well-trained GAN can also be applied to reconstruct images from other testing datasets, provided the samples have similar features.
The GAN requires a properly tuned network architecture, and its training demands substantial computational power. The computational complexity can be reduced by either reducing the size of the training set or increasing the size of the latent space; however, both procedures may degrade the performance of the generator. In our study the latent space size is 100, which is slightly less than an order of magnitude smaller than the total number of pixels n = 784. Simulations with a latent space size of 50 gave similarly promising results.
No additional training is needed to transfer from one experimental setup to another, as the GAN is trained on images free of experimental imperfections. The approach can be easily generalized to any speckle-based single-pixel imaging setup without re-training the GAN. As a result, the proposed GAN approach paves the way towards a broad range of applications with sub-diffraction imaging quality that is not limited by the sparsity of the sample.

Experimental setup
The experimental setup is illustrated in Fig. 1(a). The light from a laser source (532 nm, Cobolt Samba, continuous wave) passes through a half-waveplate and a polarizing beamsplitter cube to control the incident power. The laser beam is reflected by a pair of galvo mirrors that are projected onto the entrance aperture of the objective (NA = 0.75, Olympus) by relay lenses. The objective focuses the beam on the input facet of the MMF (NA = 0.22, diameter d = 50 µm). The number of modes in the MMF for a single polarisation is M_modes = 1050. The galvo mirrors move the focus along the input facet, which excites different sets of propagating modes and produces different output speckle patterns I_i. The 54×-magnified image of the speckle pattern is projected onto the sample and the camera (pixel size = 2.4 µm, Basler acA 3088-57 um) by an objective (NA = 0.65, Olympus), a tube lens (f = 200 mm) and a beamsplitter. The total intensity transmitted through the sample is measured by an avalanche photodiode (Thorlabs APD410A2) and forms the measurement vector y. The camera and the galvo mirrors are triggered by a data acquisition board (NI-PCI 6353) and controlled by custom software.

Simulation of speckle-based imaging
As a first step, we perform simulations to evaluate the performance of the reconstruction algorithms for different numbers of measurements, spatial frequencies and noise levels. We generate a random measurement matrix A by creating m random fully developed speckle fields (of size n = 28 × 28 pixels) sampled from a circular Gaussian distribution [26]. The intensity distributions of these fields form the rows of the simulated measurement matrix. To study diffraction-limit effects, we apply a low-pass filter with cutoff frequency ν to the random field distribution, where ν is normalized to the maximum spatial frequency in the image. As ν decreases, the speckles get larger and less spatial information is obtained from each measurement.
As a sample, we use handwritten digits from the standard MNIST database [27], which contains 70000 images (60000 in the training dataset and 10000 in the testing dataset) with an image size of 28 × 28, hence n = 784. The average sparsity of the complete dataset is 0.19.
In the noiseless case, the signal from the simulated bucket detector is simply given by y = Ax, where x is the flattened ground truth image of a handwritten digit. Different levels of noise (5%, 10% and 20%) are added to both the measurement matrix and the measured signal. The noise follows a Gaussian distribution with zero mean and a standard deviation set by the noise level relative to the standard deviation of the corresponding measurement matrix or measured signal.
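The simulated measurement matrix described above can be sketched as follows. This is a minimal NumPy illustration, assuming a circular-Gaussian random field that is low-pass filtered in the Fourier domain before its intensity is taken; function names and the exact filter shape are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def speckle_row(N=28, nu=0.2):
    """One simulated speckle intensity pattern, flattened to a row of A.

    A circular-Gaussian complex random field is low-pass filtered at the
    normalized cutoff frequency nu, then its intensity |field|^2 is taken.
    """
    field = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    # Low-pass filter in the Fourier domain: keep |f| <= nu * f_max.
    f = np.fft.fftfreq(N)
    fx, fy = np.meshgrid(f, f)
    mask = np.sqrt(fx**2 + fy**2) <= nu * np.abs(f).max()
    field = np.fft.ifft2(np.fft.fft2(field) * mask)
    return (np.abs(field) ** 2).ravel()

# Measurement matrix A (m x n): m random speckle intensity patterns.
m, N = 100, 28
A = np.stack([speckle_row(N) for _ in range(m)])
```

Decreasing `nu` shrinks the pass-band, producing larger speckle grains and hence less spatial information per row, as described in the text.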

GAN training
For our deep convolutional generative adversarial network [28], we modified the design and the code from TensorFlow [29]. The Keras Sequential API is used to define both the generator and the discriminator architectures. The input latent representation z is a (100 × 1) vector whose entries follow the standard normal distribution. To transform z into a (28 × 28) image, the generator applies several transposed convolution (deconvolution) layers with LeakyReLU activations; the output layer uses the hyperbolic tangent (tanh) activation. The discriminator is a convolutional image classifier, which outputs positive values for real images and negative values for fake ones. Fig. 2(a) shows the detailed training process of the deep convolutional GAN. The training dataset, which contains 60,000 handwritten digits, is shuffled and split into batches of 256 to reduce computational complexity. The generator produces 256 fake images from z. The discriminator maps these 256 fake images and 256 real images to either positive (real output) or negative (fake output) values. The fake output is used to calculate the loss function of the generator, while both the fake and the real outputs are used to calculate the loss function of the discriminator. The generator and discriminator are updated by gradient descent on the calculated loss functions, implemented with the Adam optimizer [30] at a learning rate of 0.0001. Training over the whole training set is repeated 1000 times.

Fig. 2(b) shows the detailed reconstruction algorithm. The random input vector z (100 × 1) is passed through the generator, which produces the corresponding image G(z) (28 × 28). To implement the gradient descent method, we use TensorFlow's GradientTape for automatic differentiation and the Adam optimizer [30] with a learning rate of 0.1 to update the input latent representation z.
We use 2000 iteration steps in the gradient descent optimization and repeat the optimization 10 times, each with a new randomly generated starting point z. We select the final reconstruction result G(ẑ) by choosing the ẑ with the lowest value of the loss function L(ẑ) among the 10 runs.
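The restart-and-select strategy above can be sketched as follows. As an illustration only, a simple analytic non-convex function stands in for the GAN loss L(z) (the real loss involves the generator and the measured data), and plain gradient descent stands in for the Adam optimizer.

```python
import numpy as np

rng = np.random.default_rng(3)

def L(z):
    """Toy non-convex loss standing in for the GAN loss L(z)."""
    return np.sum(np.sin(3 * z) + z**2)

def optimize(z0, alpha=0.05, steps=200):
    """Plain gradient descent from one random starting point z0."""
    z = z0.copy()
    for _ in range(steps):
        grad = 3 * np.cos(3 * z) + 2 * z   # analytic gradient of L
        z -= alpha * grad
    return z

# Ten random restarts; keep the z-hat with the lowest final loss.
candidates = [optimize(rng.standard_normal(100)) for _ in range(10)]
z_best = min(candidates, key=L)
```

Because each restart can land in a different local minimum of the non-convex loss, keeping the best of several runs mitigates the sensitivity to initialization noted in the text.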

Basis Pursuit
The basis pursuit (BP) algorithm solves the compressive sensing problem as

min_x ||x||₁ subject to y = Ax,

where ||x||₁ = Σᵢ |xᵢ| is the l1 norm of the vector x, i.e. the sum of the absolute values of its entries.
In the presence of noise, the BP algorithm is replaced by the BP denoising (BPDN) algorithm:

min_x ||x||₁ subject to ||y − Ax||₂² ≤ δ,

where δ is the error tolerance factor, which depends on the amount and nature of noise in the measurement system, the number of measurements m and the number of non-zero elements (sparsity) of the vector x. Here ||·||₂ is the Euclidean norm. BPDN reduces to BP when δ = 0.
We implement the BP algorithm in this study with the spgl1 package [31,32].
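The l1-minimization in the study is carried out with the spgl1 package; as a self-contained sketch of the same idea, the iterative shrinkage-thresholding algorithm (ISTA) below solves the closely related penalized form 0.5||y − Ax||₂² + λ||x||₁ in plain NumPy. The solver, problem sizes and λ here are illustrative assumptions, not the paper's spgl1 setup.

```python
import numpy as np

rng = np.random.default_rng(4)

def soft(v, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.01, iters=2000):
    """Minimize 0.5*||y - Ax||_2^2 + lam*||x||_1 by iterative
    shrinkage-thresholding (an illustrative proxy for BPDN)."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the Lipschitz bound
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x + t * (A.T @ (y - A @ x)), t * lam)
    return x

# Toy sparse problem: m = 30 measurements of a 3-sparse vector of length n = 50.
m, n, k = 30, 50, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x_true
x_hat = ista(A, y)
```

As the text notes for BPDN, the quality of such l1 reconstructions hinges on the sparsity of x and on tuning the noise-related parameter (δ for BPDN, λ here).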