Label-free far-field subwavelength acoustic imaging by deep learning

Seeing and recognizing an object whose size is much smaller than the illumination wavelength is a challenging task for an observer placed in the far field, due to the diffraction limit. Recent advances in near and far field microscopy have offered several ways to overcome this limitation; however, they often use invasive markers and require intricate equipment with complicated image post-processing. On the other hand, a simple marker-free solution for high-resolution imaging may be found by exploiting resonant metamaterial lenses that can convert the subwavelength image information contained in the near-field of the object to propagating field components that can reach the far field. Unfortunately, resonant metalenses are inevitably sensitive to absorption losses, which has so far largely hindered their practical applications. Here, we solve this vexing problem and show that this limitation can be turned into an advantage when metalenses are combined with deep learning techniques. We demonstrate that combining deep learning with lossy metalenses allows recognizing and imaging largely subwavelength features directly from the far field. Our acoustic learning experiment shows that, despite being thirty times smaller than the wavelength of sound, the fine details of images can be successfully reconstructed and recognized in the far field, which is crucially enabled by the presence of absorption. We envision applications in acoustic image analysis, feature detection, object classification, or as a novel noninvasive acoustic sensing tool in biomedical applications.


INTRODUCTION
The performance of microscopy applications is usually hindered by a fundamental rule that is difficult to break: the diffraction limit [1]. According to this principle, the ultimate far-field resolution of direct wave imaging devices is intrinsically constrained by the wavelength of operation. In past decades, tremendous advances have been made in both near field [2,3] and far field [4,5] microscopy to bend this rule and allow for imaging with subwavelength resolution, either by using near-field sensors or by introducing blinking markers and taking multiple far-field images. Such techniques are often associated with sophisticated and expensive optical setups, either relying on a time-consuming near field scan or on taking multiple far-field images of samples labeled with fluorescent molecules, typically followed by extensive image post-processing [6,7]. Besides, many biomedical applications require label-free solutions (without any fluorescent tagging) to perform remote, nondestructive, and noninvasive investigations of objects.
This limit stems from the fact that the evanescent waves scattered from the subwavelength details of an object cannot propagate to the far field, unlike the waves scattered from the larger image features, which inevitably limits the resolution of conventional imaging techniques. Thus, to recover the sub-diffraction details of the image, one needs to recover the evanescent field components, for instance, by working directly in the near field and exploiting negative refraction [8]. Recently, exciting alternative approaches have been proposed to convert the evanescent field components into propagating waves by using metamaterials [8][9][10][11][12][13]. For instance, hyperbolic dispersion can be used to gradually convert evanescent components into a wave that can propagate in the surrounding medium and reach the far field [9,14,15]. Another interesting label-free approach to beat the diffraction limit combines a locally resonant metamaterial lens (metalens) and time-reversal techniques [11,16]. The use of superoscillations [17][18][19][20] is an alternative route based on tailoring the interference of several coherent sources to focus the probe field directly into a subwavelength spot. However, these label-free approaches face difficult challenges: while metalenses are prone to losses due to their resonant nature, superoscillations are surrounded by large side-bands and typically lead to very low signal-to-noise ratios [21].
More recently, advances in machine learning have provided scientists from different research fields with a unique tool to solve complex problems: deep learning [22,23]. A deep neural network (DNN) composed of multiple processing layers with non-linear modules is capable of discovering and learning the intricate structure hidden in complex data by self-adjusting the internal parameters of each of its layers. By composing a sufficient number of such layers, DNNs can learn very complex functions without human intervention, allowing for many applications in different domains of science, such as engineering, biology, medicine, and quantum physics [24][25][26][27]. Recent examples of deep learning successes include medical image analysis [28], speech recognition [29], image classification [25], inverse imaging problems [30][31][32], and a wide range of complex analytical problems [33,34]. Moreover, in search of more efficient deep learning schemes, several hybrid schemes were proposed that integrate physical layers into DNNs [35][36][37][38]. Inspired by such tremendous success, several deep learning approaches for microscopy applications were proposed [32,39,40]. In these approaches, however, DNNs are mainly used to enhance the quality of images obtained with traditional methods [32,40,41], exploiting, for instance, generative adversarial networks [42,43].
In this work, we propose a combination of modern deep learning techniques and metamaterial approaches to overcome the above-mentioned limitations of non-invasive subwavelength imaging and open a new path for novel applications in label-free imaging technologies. Remarkably, we show that, in stark contrast with conventional methods, the presence of absorption losses in the metalens is crucial to enable efficient learning. By putting a purposely lossy locally-resonant metalens in the proximity of subwavelength input images and training the DNNs to reconstruct and classify them directly, we can recover details as small as λ/30 and reach a far-field experimental classification accuracy of ≈ 80%. This strategy, which is here experimentally demonstrated with airborne audible sound, may be translated to electromagnetic waves [44].

A.
Concept of far field subwavelength imaging. The scheme for far-field subwavelength acoustic imaging studied in this work is illustrated in Fig. 1. We first consider a subwavelength acoustic source, shaped like the digit "five." In this reference case, the signals captured by a microphone array placed in the far field do not contain any information about the subwavelength details of the source, due to the diffraction limit. In other words, regardless of the signal processing strategy used, it is not possible to image the source. Less complicated tasks, for instance, guessing digits drawn by the source with reasonable probability (a standard classification problem), are also not possible. Instead [Fig. 1(b)], to allow the information about the subwavelength details to reach the far field, we insert a lossy locally resonant metalens [13,45] composed of a cluster of subwavelength Helmholtz resonators, whose resonant modes can couple to the evanescent waves and radiate into the far field [11]. Then, the amplitude and phase of the far field sampled by our microphone array are fed into DNNs, which are trained on many handwritten digits to be able to reconstruct and classify subwavelength images [Fig. 1(c)]. In our work, two different types of DNNs are used: a "U-net" type CNN (UCNN) [46] for the image reconstruction and a multilayer parallel CNN (PCNN) for the image classification. For demonstrative purposes, the input images are taken from the MNIST database of handwritten digits (70000 images with 20 × 20 pixels resolution) and downsampled to 8 × 8 pixels, as shown in Fig. 2(a). The spatial frequency domain information reveals that the square source images, with an overall dimension of ≈ 0.1 λ0, contain features with spatial frequencies up to 30/λ0, i.e. subwavelength details of size down to λ0/30, where λ0 is the wavelength in the surrounding medium at the operating frequency f0. Our goal is to recover these high spatial frequencies using deep learning.
To demonstrate this possibility in the most general way and underline the key physical ingredients for an efficient learning process, we start with a semi-analytical 2D model of the problem based on a coupled dipole method [47], which allows us to generate the necessary data for training on subwavelength image recognition.

A.
Semi-analytical 2D data generation based on the coupled dipole method. In this model, the subwavelength images are drawn by driven two-dimensional dipoles in an eight by eight square array of total size ≈ 0.1 λ0 and pitch ≈ 0.01 λ0 [see Fig. 2]. The response function of the dipoles is described by a Lorentzian polarizability with resonance frequency fr, and the dipoles are coupled to each other by the 2D Green's function (see Appendix for the details of the coupled dipole model). We enforce monopolar radiation of the array by choosing the resonance frequency such that fr ≫ f0, and first consider the case of the digit alone, in the absence of a metalens. To capture the spatial diversity of the source, we probe the field at four points separated by a 40° angle, placed either in the near (0.25 λ0) or far (300 λ0) field region, and at four equidistant frequencies in the range between 0.8 f0 and 1.1 f0, thus generating a database with 70000 samples of measured amplitudes and phases of the field (we use 60000 training samples and 10000 test samples). The results for image reconstruction and classification using the trained DNNs are summarized in Fig. 2(b). In both near and far fields, the output images [second column inset of Fig. 2(b)] hardly represent the digit "five," reflecting the fact that the UCNN struggles to reconstruct the test samples. This fact is confirmed in the spatial frequency domain, where the UCNN reconstructs the low spatial frequencies relatively well, while the high spatial frequencies are incorrectly guessed as being much smaller, explaining the blurred shapes of the digits. Remarkably, unlike the case of image reconstruction, the classification performance of the PCNN is relatively high [the last column of Fig. 2(b)], with 67.5% and 57.5% for the near and far field regions, respectively. Such accuracy is significantly higher than random guessing (10% accuracy).
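The measurement layout described above (four probe points, four frequencies, amplitude and phase at each) amounts to 16 complex values, or 32 real numbers, per sample. The following is a minimal sketch of how one such sample might be flattened into a network input. The function name `fields_to_features` and the ordering (all amplitudes, then all phases) are assumptions made for illustration; the text does not specify the actual input format.

```python
import cmath

def fields_to_features(fields):
    """Flatten complex field samples into [amplitudes..., phases...].

    `fields[p][q]` is the complex field at probe p and frequency q.
    The layout (all amplitudes first, then all phases) is an assumption.
    """
    flat = [value for probe in fields for value in probe]
    amplitudes = [abs(v) for v in flat]
    phases = [cmath.phase(v) for v in flat]  # radians, in (-pi, pi]
    return amplitudes + phases

# Toy data: 4 probes x 4 frequencies of synthetic complex field values.
example = [[cmath.rect(1.0 + p, 0.1 * q) for q in range(4)] for p in range(4)]
features = fields_to_features(example)
```

With this layout, each of the 70000 samples would be a vector of 32 real numbers paired with its 8 × 8 target image.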
The relevant information about the nature of the digit that allows for partial classification is already encoded in the lowest spatial frequency components of the image, for instance in the location of its edges (consistent with the robust classification of specific digits, for example "0" and "1" shown in supplementary Fig. S3 [48]) or of its center of gravity. Therefore, with some analogy to the stimulated emission depletion technique [5], the DNN can also be used to resolve two very close sources, which, in principle, will be only limited by the accuracy of measuring the phase wavefronts.
To further improve both the reconstruction quality and the classification accuracy, we now add a metalens, in the form of randomly distributed undriven dipoles surrounding the source [see left inset of Fig. 2(c)]. Unlike the dipoles used to draw the source digits, which are driven and operated well below their resonance frequency fr, the dipoles composing the metalens are undriven but resonant at f0, allowing for strong multiple scattering in the lens. From this figure, one can observe that the source digit excites modes in the resonant metalens, turning the monopole source digit into a multipolar source with more open radiation channels. Therefore, each symbol encodes information about its subwavelength features into the way it excites different modes in the metalens, and this information is subsequently radiated into the far field. We, therefore, expect that the reconstruction quality and classification accuracy will increase with the number of modes that can be excited in the metalens at a given frequency. This means that, surprisingly, the introduction of losses can improve the learning efficiency by broadening the width of the metalens resonances, increasing the number of modes (see Fig. S2 in Supplemental Material [48]) that can be accessed at a given frequency (effective mode density). Such unusual dependence is in stark contrast with conventional imaging methods based on metalenses, which are typically hindered by losses.
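The broadening argument can be illustrated with a toy calculation. The sketch below assumes each metalens mode has a Lorentzian line shape and counts how many modes respond appreciably at the operating frequency. The function `effective_mode_count`, the evenly spaced mode spectrum, and the linewidth values are all hypothetical, and this is not the density-of-states calculation of the Supplemental Material.

```python
def effective_mode_count(mode_freqs, f0, linewidth):
    """Sum of the Lorentzian weights of all modes evaluated at frequency f0.

    A mode contributes 1 when exactly on resonance and 1/2 when detuned by
    half a linewidth (linewidth = full width at half maximum).
    """
    half = linewidth / 2.0
    return sum(half**2 / ((f0 - fn) ** 2 + half**2) for fn in mode_freqs)

# Hypothetical spectrum: 21 modes spread evenly over 0.8 f0 .. 1.2 f0.
f0 = 1.0
modes = [0.8 + 0.02 * n for n in range(21)]

low_loss = effective_mode_count(modes, f0, linewidth=0.01)   # narrow resonances
high_loss = effective_mode_count(modes, f0, linewidth=0.5)   # broad resonances
```

In this toy spectrum, the narrow-linewidth case is dominated by the single mode sitting at f0, whereas with broad resonances most of the 21 modes contribute at f0, which is the sense in which losses raise the effective mode density.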
To validate this intuition, we start by using a lossless metalens with a relatively low mode density, with only N = 29 dipoles. We see in the top row of Fig. 2(c) that, while the classification accuracy of the PCNN increased from 57.5% to 74%, the reconstructed images of the digits are still of low quality (see Fig. S3 in Supplemental Material [48] for digits other than "five"). The inability of the UCNN to accurately recover the high spatial frequency components of the subwavelength images is due to the insufficient density of states supported by the metalens, which cannot encode enough information about the subwavelength details of the image. This density of states can be improved either by increasing the number of resonators that form the metalens or by introducing absorption losses to the resonators (see Supplemental Material [48] for quantitative calculations of the density of states for different numbers of resonators and amounts of added losses). In Fig. 2(c) (bottom row), we report the results obtained with N = 302 lossy resonators (with a collision frequency ≈ f0/2). We see that not only has the overall classification accuracy risen to 84%, but also that the reconstruction of the subwavelength images is now very accurate. Remarkably, while in most metalens imaging scenarios the presence of losses inevitably degrades the imaging performance of these resonant systems, here the effect is opposite, since the neural network can learn from the larger amount of information hidden in an increased number of lossy modes.
Such a difference may constitute a key advantage for learning methods using metalenses in the development of future applications, as losses are an integral part of any realistic wave device. The next section provides an experimental demonstration of these findings using airborne audible sound.
The learning data is generated by drawing the source digit on the loudspeaker array and measuring the complex acoustic pressure at four locations and four arbitrarily chosen frequencies between f = 220 Hz and f = 260 Hz (λ = 1559 mm and λ = 1319 mm, respectively). Therefore, the overall size of the acoustic source is around 0.2 λ, and the digits contain features down to 30 times below the diffraction limit. Our experimental results for the subwavelength imaging in the presence and absence of the metalens are summarized in Fig. 4. First, we tested the possibility of image reconstruction in the near field (≈ 0.1 λ) without the metalens, which, as expected in this 3D scenario, provides a proper restoration of sub-diffraction details of subwavelength images, as shown in Fig. 4(a). The trained UCNN reconstructs most of the test digits with excellent visual fidelity, although somewhat blurring small details (due to a partial loss of spatial frequencies > 15/λ, see Fig. S4), resulting in good classification accuracy (86.5%). In the far field region and in the absence of the metalens, the UCNN is no longer capable of resolving details < 90 mm, resulting in the reliable reconstruction of only a few digits, such as "0" and "1" (see

C.
Transfer learning for subwavelength imaging. In the previous section, we demonstrated that our DNNs could restore the initial subwavelength image from the recorded amplitude-phase distributions in the far field. Here, we go one step further and demonstrate their ability to re-learn quickly on a new database, which can be much smaller than the original one. Such flexibility in the learning process is known as transfer learning: we create a new database consisting of 600 training and 200 test samples (≈ 1% of the initial MNIST database) of four letters "E", "F", "L", "P" and retrain our UCNN (previously trained on the MNIST database) on this new, significantly smaller dataset. Then, we ask the neural network to classify and reconstruct unknown letters drawn from the test dataset. The experimentally reconstructed letters are shown in Fig. 5 (see Fig. S6 for more examples).
The excellent visual fidelity (with correlation coefficients ≥ 0.94 between the input and reconstructed letters) demonstrates the high adaptability of the DNN approach, which becomes more efficient at learning new data types, without being limited by the diversity of the input databases.
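The fidelity metric quoted above can be computed as sketched below, assuming that the correlation coefficient is the standard Pearson coefficient evaluated over all pixels (the text does not state which definition is used). The function name `correlation_coefficient` and the toy 4 × 4 images are hypothetical.

```python
import math

def correlation_coefficient(image_a, image_b):
    """Pearson correlation between two equally sized 2D images (lists of rows)."""
    a = [px for row in image_a for px in row]
    b = [px for row in image_b for px in row]
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Toy 4 x 4 letter "L" and a slightly blurred copy standing in for a reconstruction.
target = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
blurred = [[0.9, 0.1, 0, 0], [0.9, 0.1, 0, 0], [0.9, 0.1, 0, 0], [0.9, 0.9, 0.8, 0.7]]
r = correlation_coefficient(target, blurred)
```

A value close to 1 indicates that the reconstruction follows the input pixel pattern up to an overall offset and scale, which is why this kind of coefficient is a convenient summary of visual fidelity.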

III. Discussion
We have experimentally demonstrated that a combination of a resonant metalens and DNNs enables the reconstruction and recognition of subwavelength acoustic images from the far field with an accuracy of ≈ 80%, which is enabled by the presence of substantial absorption losses in the metalens. This label-free method beats the diffraction limit and reconstructs source images containing details smaller than λ/30, which are crucial for accurate classification of these images, whose total size is 0.2 λ. We believe that even smaller acoustic objects can be successfully recognized by working with acoustic resonators with lower resonance frequencies, such as ones based on membranes, which can form the basis of a new form of acoustic metasurfaces dedicated to learning systems capable of non-invasive, label-free subwavelength imaging. A potentially interesting future direction may be to explore whether a similar approach can be used to detect the presence, position, or shapes of small particles in multiple-scattering media, with potential impact in bioengineering applications.

B. Coupled dipole method.
To demonstrate the principle of far field subwavelength imaging, we first perform a numerical analysis using a semi-analytical model based on two-dimensional coupled dipoles [49,50]. Such a 2D model contains all the essential physical ingredients to simulate wave propagation and scattering in locally resonant media. In the model, each dipole is described by its polarizability $\alpha$, which follows a Lorentzian model consistent with the optical theorem, namely $\alpha^{-1} = \omega_r^2 - \omega^2 + \dots$, with the dipoles coupled through the 2D Green's function, which involves the Hankel function $H_0^{(2)}(k|\vec{r} - \vec{r}\,'|)$. Each dipole can also be locally excited using an external source field. In order to simulate an image source, each pixel of the image is modeled as a dipole excited by a nonzero external source field whose amplitude is proportional to the intensity of that pixel.
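For orientation, a generic form of the coupled dipole equations is sketched below. The notation ($p_i$ for the dipole amplitudes, $s_i$ for the external source field at dipole $i$, $\psi$ for the probed field) and the Green's function prefactor, which assumes an $e^{+i\omega t}$ time convention, are assumptions and may differ from the convention actually used in the model.

```latex
% Self-consistent dipole amplitudes: each dipole responds to its own drive
% plus the field scattered by all the other dipoles.
p_i = \alpha_i \Big[ s_i + \sum_{j \neq i} G(\vec{r}_i - \vec{r}_j)\, p_j \Big],
\qquad
G(\vec{r}) = -\frac{i}{4} H_0^{(2)}\big(k |\vec{r}|\big)

% Equivalent linear system, with A = diag(alpha_i) and G_ij = G(r_i - r_j), G_ii = 0:
\big( \mathbf{I} - \mathbf{A}\,\mathbf{G} \big)\, \mathbf{p} = \mathbf{A}\, \mathbf{s}

% Field recorded at a probe position r:
\psi(\vec{r}) = \sum_j G(\vec{r} - \vec{r}_j)\, p_j
```

In this picture, the driven pixels of the image have $s_i$ proportional to the pixel intensity, while the undriven metalens dipoles have $s_i = 0$ and are excited only through the coupling term.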

D. Extracting S-parameters.
To characterize the resonant properties of the Helmholtz resonators, we performed a simple two-port scattering experiment. We placed a Helmholtz resonator inside a tube supporting a single plane wave-like propagating mode, forming a 2-port scattering network. In this picture, each port represents the wave propagating towards (away from) the resonator on both sides of the waveguide [51]. Thus, we can define the relation between the propagating waves using the S-matrix:
$$\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix},$$
where $a_i$ ($b_i$) are the complex amplitudes of the waves propagating towards (away from) the resonator at ports $i = 1, 2$. The amplitudes $a_i$ ($b_i$) are extracted from complex pressure measurements performed at four different points along the tube (two on each side of the resonator). By performing two independent measurements, exciting the same structure from both sides, we get enough information to find all four S-matrix coefficients.
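The extraction procedure described above can be sketched as follows. This is a minimal illustration rather than the processing actually used in the experiment. It assumes an $e^{+i\omega t}$ time convention, a lossless plane wave in the tube, the resonator at x = 0, and the labels a (incoming) and b (outgoing) for the wave amplitudes. All function names, microphone positions, and the synthetic S-matrix are hypothetical.

```python
import cmath

def decompose(p1, p2, x1, x2, k):
    """Split pressures at two positions into forward/backward plane waves.

    Assumes p(x) = fwd * exp(-1j*k*x) + bwd * exp(+1j*k*x), with amplitudes
    referenced to x = 0. Ill-conditioned when k * (x2 - x1) is near a
    multiple of pi.
    """
    det = 2j * cmath.sin(k * (x2 - x1))
    fwd = (p1 * cmath.exp(1j * k * x2) - p2 * cmath.exp(1j * k * x1)) / det
    bwd = (p2 * cmath.exp(-1j * k * x1) - p1 * cmath.exp(-1j * k * x2)) / det
    return fwd, bwd

def port_amplitudes(pressures, positions, k):
    """Return incoming (a1, a2) and outgoing (b1, b2) waves for one measurement.

    positions = (x1, x2, x3, x4) with x1, x2 < 0 (port 1) and x3, x4 > 0 (port 2).
    """
    fwd_l, bwd_l = decompose(pressures[0], pressures[1], positions[0], positions[1], k)
    fwd_r, bwd_r = decompose(pressures[2], pressures[3], positions[2], positions[3], k)
    return (fwd_l, bwd_r), (bwd_l, fwd_r)

def extract_s_matrix(meas_left, meas_right, positions, k):
    """Solve b = S a using two independent excitations (from left, from right)."""
    (a11, a21), (b11, b21) = port_amplitudes(meas_left, positions, k)
    (a12, a22), (b12, b22) = port_amplitudes(meas_right, positions, k)
    det = a11 * a22 - a12 * a21
    # S = B @ inv(A), written out explicitly for 2 x 2 matrices.
    s11 = (b11 * a22 - b12 * a21) / det
    s12 = (b12 * a11 - b11 * a12) / det
    s21 = (b21 * a22 - b22 * a21) / det
    s22 = (b22 * a11 - b21 * a12) / det
    return [[s11, s12], [s21, s22]]

def synthesize(s, a1, a2, positions, k):
    """Forward model, used only to check the extraction on synthetic data."""
    b1 = s[0][0] * a1 + s[0][1] * a2
    b2 = s[1][0] * a1 + s[1][1] * a2
    left = [a1 * cmath.exp(-1j * k * x) + b1 * cmath.exp(1j * k * x) for x in positions[:2]]
    right = [b2 * cmath.exp(-1j * k * x) + a2 * cmath.exp(1j * k * x) for x in positions[2:]]
    return left + right

# Synthetic check at 240 Hz, assuming a sound speed of 343 m/s.
k = 2 * cmath.pi * 240.0 / 343.0
positions = (-0.30, -0.20, 0.20, 0.30)  # microphone positions in metres (assumed)
s_true = [[0.3 - 0.4j, 0.6 + 0.1j], [0.6 + 0.1j, 0.3 - 0.4j]]
meas_left = synthesize(s_true, 1.0, 0.2, positions, k)   # driven mainly from port 1
meas_right = synthesize(s_true, 0.1, 1.0, positions, k)  # driven mainly from port 2
s_est = extract_s_matrix(meas_left, meas_right, positions, k)
```

Using two excitations makes the extraction independent of the tube terminations: partial reflections at the far ends simply appear as nonzero incoming amplitudes at the undriven port, and the 2 × 2 solve accounts for them.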
FIG. 1. Deep neural network approach for far-field recognition of subwavelength images. (a) A subwavelength source radiates into an infinite number of plane waves in all directions. The waves with high amplitude wavevector components that contain information about the subwavelength features are concentrated in the near field of the object due to their exponential decay, therefore resulting in the loss of subwavelength features in the far field. (b) A metamaterial lens inserted in the near field of the object can couple to the evanescent field components and re-radiate the waves with the information encoded into the far field patterns. (c) The UCNN (U-net convolutional neural networks) learns the correlation between the far field amplitude and phase patterns and the subwavelength images. This DNN is composed of several blocks containing a convolutional encoding front end and deconvolutional decoding back end with skip connections (see Appendix for additional details) and followed by a PCNN that classifies reconstructed images into ten categories of handwritten digits (0-9).