Efficient random number generation techniques for CMOS SPAD array based devices

This work presents new techniques to produce true random bits by exploiting single photon time of arrival. Two FPGA-based QRNG devices are presented: Randy which uses one discrete SPAD and LinoSPAD which uses a CMOS SPAD array, along with a time-to-digital converter (TDC). Post-processing procedures are explained in order to extract randomness taking care of SPAD and TDC non-idealities. These procedures are based on the application of Peres [Y. Peres, Ann. Statist.20, 590 (1992)] and Zhou-Bruk [H. Zhou and J. Bruck, arXiv:1209.0726 (2012)] extraction algorithms. Achieved generation rates are 1.8 Mbit/s for Randy device and 310 Mbit/s for LinoSPAD device. Randy QRNG also features a real time procedure which was used for the realization of fundamental tests of physics.


I. INTRODUCTION
In very recent years, there has been a widespread interest in random number generators based on physical processes of quantum nature. In fact, these devices, the so-called quantum random number generators (QRNGs), represent the ultimate way to obtain reliable randomness, free from the typical non-random issue that affects algorithmic random number generators. Typically, QRNGs exploit the quantum properties of the optical radiation field in many "recipes". The different architectures exploit different properties; setups using entangled systems and violating Bell inequalities feature the highest unpredictability in the so-called device-independent (DI) framework [3][4][5][6][7]. Systems that trust the measurement apparatus but not the states of the quantum system or vice versa, the so-called semi-device-independent generators, work under less strict assumptions [8]: for this reason, they feature, in principle, less unpredictability than DI-QRNG but are more feasible to implement and reach even larger generation rates. The last category includes those generators that work in a framework of complete trust in both the quantum system and the measurement apparatus. This means that the absence of side information, exploitable by an adversary, is assumed. The most famous example of this kind of generator is the welcher weg QRNG that produces randomness according to which path a photon takes after interacting with a beam-splitter [9,10]. In this work we consider the subcategory of trusted QRNG that were developed in order to limit the number of single photon detectors. Random number generation with just one detector is indeed possible by exploiting time as an additional degree of freedom and by leveraging on the statistical features of the photon detection distribution. This could be done with the following procedure: time sampling of the photon detector with a sampling rate almost equal to the photon rate; application of dedicated generation protocols; application of dedicated unbiasing algorithms. In this work we will show that it is possible to generate true random numbers without the application of dedicated generation protocols. Using a higher sampling rate we applied unbiasing algorithms directly to the samples stream achieving higher generation rate of true random numbers. We used this technique on two different systems: the first one, Randy, uses a standard clock to sample the signal of one single photon detector; the second one, LinoSPAD, uses both a standard clock and a time-to-digital converter (TDC) to sample the signals from a matrix of 256 single photon detectors.
The paper is structured as follows: in Sec. II the most common ways to generate random numbers from time will be reviewed. In Sec. III we will introduce our method and we will compare it with the previous ones. The results achieved by applying it to the numbers obtained with a single detector connected to an FPGA will be presented. In Sec. V we will extend this method to a matrix of 256 detectors with time tagging capabilities. In Sec. VI the conclusions will be presented.

II. EXISTING PARADIGMS OF SINGLE DETECTOR QRNGS
We now introduce two elaborated generation protocols, reported in literature, that can extract randomness from a photons distribution.
As a first example, let us consider the generation protocol introduced by Stipčević et al. in [11], denominated here "Diff-QRNG". The Diff-QRNG comprises: a light source attenuated to single photon level illuminating a single photon detector; a clock that counts the time between the detection events. A binary random variable X j = {0, 1} is obtained by comparing the length of time intervals between three consecutive detections. It is therefore convenient to define the discrete random arXiv:1910.05232v1 [quant-ph] 11 Oct 2019 variable T j , associated to the detection instants, such that T j = t j 0 , t j 1 , t j 2 . Hence, X j takes its value x j according to the following "rule": given ∆T 1 = t 1 − t 0 and ∆T 2 = t 2 − t 1 , The rule is iterated so that for the next bit b j+1 the new time interval starts with the end of the previous one, i.e., t j 2 = t j+1 0 . The physical principle that guarantees the identical (Pr [x j = 0] = Pr [x j = 1]) and independent (Pr [x j |x j−1 ] = Pr [x j |x j−1 , x j−2 , . . . , x 1 ]= 1 /2) distribution of the bits follows from the memoryless property that characterizes the exponential distribution of the interarrival times ∆T , namely Pr [∆T 1 > ∆T 2 ] = Pr [∆T 2 > ∆T 1 ] = 1 /2. As a consequence, a random string X = {X 1 , X 2 , . . . , X n } with n → ∞, is characterized by full Shannon entropy, i.e. H(X) = 1 bit. This implies that the average number of i.i.d. bits that are generated per unit time, is equal to r i.i.d. = r phot /2, where r phot is the number of photo-detections per unit time [12].
As a second example, let us consider the generation protocol introduced by Fürst et al. in [13], denominated here "OdEven-QRNG". As in the previous example, in the OdEven-QRNG a light source attenuated to single photon level illuminates a photomultiplier but in this case a counter enumerates the number of photons detected within a fixed time interval, τ , corresponding to the period of a sampling signal. Defining n j τ as the number of detections within the interval τ j , a random binary variable X j assumes its value x j , with the following rule: • if n j τ mod 2 = 0 then x j = 0 • if n j τ mod 2 = 1 then x j = 1, in other words, the bit value is determined according whether an even or odd number of detections is registered in the time interval τ . For a Poisson distribution we can write the probability of having an even or odd number of detections: where λ is the mean number of photons per second and λτ = n τ is the mean number of photons per time interval τ . To avoid a bias in the output, n τ has to be sufficiently large so that Pr[x j = 0] Pr[x j = 1]. Therefore, since H(X) is a function of n τ , the generation rate is given by R = H(X)/τ .

III. THE NOVEL APPROACH
Our contribution takes into consideration the simplest and most efficient way to generate random numbers by using the temporal degree of freedom without the need to devise complex "rules". In its essence, the process of random number generation with a single-photon detector, for instance a single-photon avalanche diode (SPAD), can be considered as a process with two signals: a squaredwave signal S det (t) generated by the single-photon detector that is sampled by a periodic signal S clk (t), which is generated by a clock with a period τ clk . Without impinging photons S det (t) has a typical value L. Given a photo-detection at the time instant t i , the S det (t) toggles its state from S det (t)(t < t i ) = L to S det (t)(t i ≤ t < τ U ) = U . τ U is the specific fixed time interval in which a SPAD keeps the state U before returning to L and typically τ U < τ dead , being τ dead the SPAD dead-time. This means that S det (t) toggles back to the L state before the SPAD being able to detect another photon. The binary random variable X j takes its value x j at the instant jτ clk according to the following rule: with the index j ∈ {0, 1, 2, . . . }. The above rule is equivalent to having x j = 1 whenever the clock detects a rising edge of S det (t) and x j = 0 otherwise. It is worth to be noticed that all existing paradigms of QRNG based on photon time of arrival can be seen as a (non-optimal) post-processing algorithm of the sequence X = (x 1 , x 2 , · · · , x n , · · · ). We note that the maximum content of randomness that can be extracted by the above physical process is fully included in the X sequence, and in particular it is given by the Shannon entropy (in the large n limits): where p 0 and p 1 are the probability of obtaining X = 0 and X = 1 respectively. It is clear that the maximum generation rate is given by r max = τ −1 clk . Given a mean photon number per second λ = r phot , for a Poisson distribution the probability of no detections within the τ clk interval is equal to Pr[∅] = e −r phot τ clk . To achieve the rate r max it is necessary to avoid a bias in the output string X, by tuning the photon detection rate in order to obtain at least one detection within a sampling period, namely 1 − Pr[∅] = 1/2 from which we obtain r phot = ln 2/τ clk . It is therefore clear that the faster the clock rate is, the larger the detection rate should be to keep low the bias in 0. However, r phot cannot be set arbitrarily large as it is strongly limited by the detector dead time. Given a fixed r phot , our idea is to increase the generation rate r by deliberately increasing the sampling rate and producing a highly biased string with plenty of 0's and very few 1's. Then, we re-balance the bias through an optimal post-processing algorithm, introduced by Peres [1]. The Peres algorithm is a revision of the famous von Neumann algorithm [14] and allows to make a biased string uniform while distilling the maximum entropy from that. We give a brief description of it. Given a sequence of samples s w , the Peres procedure is defined as where Ψ N (s) is the von Neumann algorithm applied to the string, Ψ(Ψ U (s)) is the Peres algorithm applied to a new sub-string Ψ U (s) and Ψ(Ψ V (s)) is the Peres algorithm applied to another new sub-string Ψ V (s). Ψ U (s) is created by an XOR operation on samples pair while Ψ V (s) is created by taking the latest samples of every 00 and 11 pairs. For a full description of the procedure refer to [1]. Clearly, the i.i.d. property comes from the Poisson distribution itself.
The extraction rate after Peres algorithm, under asymptotic assumptions, is given by As shown in Fig. 2, by fixing r phot , the extraction rate increases with the sampling rate despite the value of H(X) (the maximum entropy is reached with a sampling rate equal to r phot /ln2 = 288.539 kHz). These plots confirm the following: with a fixed r phot , the generation rate is more influenced by the actual input length than by the bias value. Therefore, the optimal choice is to have To summarize, any QRNG based on the photon time of arrival can be modelled by a binary signal S det (t) sampled by a clock S clk (t) that gives an output sequence X. Standard generation protocols, such as Diff-QRNG or OdEven-QRNG, can be applied to X. Nevertheless, these protocols are far from be efficient. We here propose an optimal post-processing able to extract the maximum available entropy from the string X based on the Peres algorithm. In the next section we will show how to apply our method to physical generators, while taking care of the non-idealities of the detectors.

IV. SINGLE DETECTOR IMPLEMENTATION
We first implemented our method by using a single SPAD shined by an attenuated laser light. We designed a system that is capable to produce true random numbers using different generation protocols as well as Peres algorithm. The system was developed from an FPGA/CPU device [15]. The FPGA allows us a full control over the generation process and a great flexibility in order to easily switch from one protocol to another. We called this design RANDY. A schematic view of the setup is shown in Fig. 1. We set the light source intensity to keep the photon count rate around 200 kcounts/s, due to the SPAD non-linear behaviour on higher rates. As a first implementation we used the default 100 MHz system clock to sample the SPAD signals. The advantages are quite clear since an FPGA allows a full description of the time evolu-tion of the system: every operation can be described as a multiple of the fundamental time unit τ clk defined by the system clock. Therefore, the behaviour of the system is fully deterministic apart from the non-deterministic side due to the true randomness of the photon time of arrival.
The FPGA saves a logical 0 for every clock cycle whenever the SPAD signal is low, a logical 1 otherwise. The raw string is then post-processed on a dedicated CPU. Due to the huge difference between the clock rate and the photon rate (100 MHz over 200 kcount/s), the raw strings are heavily biased in zero with a percentage of 99.8%. Ideally, the role of Peres algorithm is to reduce the bias to zero. Nevertheless, Peres algorithm cannot work around any correlation. On the contrary, it could emphasize it. Therefore, any correlation has to be removed before the application of Peres algorithm. In our case, correlation comes from afterpulses and dead time of the detectors and we describe how to remove them in the following.
We evaluated the distribution of the time interval between consecutive events. For ideal Poissonian events, this distribution is expected to be a decreasing exponential. As shown in Fig. 3, the experimental distribution differs from the ideal distribution by two effects. The first is the dead time, whose value depends on the specific SPAD used [16] and represents the minimum time distance between two rising edge of S det (t). τ dead 30 ns corresponding to 3 clock cycles.

Time Differences between consecutive events
The second effect is the afterpulse. Afterpulses are spurious events that occur randomly within a fixed time interval from a real detection. As a result, they produce a peak which undermines the exponential trend of the events difference distribution. We evaluated a 18 clock cycles (180 ns) cross-point between the only-true-events region and the afterpulses-region.
Dead-time and afterpulses introduce correlations in the output string. Clearly, after a value x j = 1, some of the few following bits in X are not independent. Since we estimated 18 clock cycles as the required time to be in the true-event region, to remove the correlation we may simply remove 18 values of X following any value x j = 1.
As a result, this procedure eliminated a portion of valid random events which we evaluate to be around 14%. However, the resulting distribution of the difference between consecutive 1's follows the expected decreasing exponential (see Fig. 3 bottom). Moreover, the process eliminates the correlation initially present in the string X, as demonstrated by Fig. 4.
Once dead time and afterpulse are compensated, we apply Peres algorithm to the bit string. With an actual photon count rate of 172 kcounts/s and an equivalent sampling frequency of 97 MHz (due to afterpulses and dead time removal), a real time implementation of the Peres algorithm would have yielded a final true random bit rate of 1.8 Mbit/s according to the rates shown in Fig.  2. As shown in Fig. 4 the correlation over the output string is within the limits of statistical acceptance.
Moreover, the two generation protocols Diff-QRNG and OdEven-QRNG described in [11,13] were also implemented on the same FPGA-system in order to compare their performances with our method. The high time resolution (r phot τ −1 clk ) allows enough precision to identify time differences and to have a good estimation on the detected photons within a fixed intervals. Indeed, the system was successfully used to produce timed random numbers in the work of Vedovato et al. [17]. Of course, the only uncertainty came from the photons time of arrival but it was handled by setting the photon counting rate to a proper value [18]. On the other hand, these protocols have low rate performances: a photon count rate of 200 kcount/s produce a rate of true random bits of 100 kbit/s for the Diff-QRNG protocol and of 20 kbit/s for the OdEven-QRNG [19].

V. MULTIPLEXING THE GENERATION RATE
The main limitation concerning the use of QRNG based on SPADs, i.e. discrete variable QRNG, is the limited generation rate achievable. Typically, SPADs feature maximum count rates of a few Mcps (counts per second). QRNGs based on continuous variable protocols are therefore preferred when it comes to obtain rates in the order of Gbps [20,21]. However, the recent advancements in miniaturization techniques, and especially the creation of deep-submicron CMOS SPADs, have led to arrays and matrices with hundreds or even thousands of SPADs [22,23]. Hence, each SPAD can be considered as a pixel of an extremely sensitive light sensor. The application to the case of random number generation is then straightforward: given that every pixel works independently, it is possible to multiplex the random signals and then fill the rate gap with CV-QRNGs [24,25]. The typical approach is to generalize the paradigm of the welcher weg QRNG -a beam splitter and two photon paths -to a generator where photons can take N possible paths, with N being the total pixel number of the sensor [26]. Nevertheless, as in Randy, the temporal degree of freedom can be exploited as well allowing an easier calibration. Therefore, the sensor is illuminated with a uniform light intensity in order that each SPAD has the same probability to click within a given time interval, which corresponds to the exposure time for a frame. Random numbers are indeed produced by periodically sampling each pixel and applying dedicated generation protocols. In this work we consider a sensor of recent introduction, LinoSPAD [27], from the AQUA lab at Delft University & EPFL, which features 256 pixels arranged in four linear arrays of 64 pixels each. The peculiarity of LinoSPAD is a time tagging functionality, which associates a temporal coordinate to every detection, thus potentially enabling a further increase in the generation rate. In the following, we will apply the techniques described in the previous Sections for the single detector case to LinoSPAD.
LinoSPAD features 64 FPGA-based time-to-digital converters (TDCs) [28,29] that tag the detections of each SPAD in a given bank. Each TDC is implemented by a delay line with 35 carry elements of 4 bits and it is sampled with a frequency f clock = 400 MHz. Every time interval τ clock = 2.5 ns the TDC emits an output code b ∈ {0, 139}. The TDC has therefore a sub-resolution of τ sub = τ clock /140 17.86 ps and this represents the fundamental time resolution of the system. The following considerations take into account that the actual number of valid pixels is 64 and not 256 due to the limitations to 64 TDCs. The measurements of the photon time of arrival are taken with respect to a "reference" time signal, whose period determines the integration time of a frame. The buffer of the system can add output codes to a maximum of 2 28 bins. Hence, the longest measurable time interval between the reference clock signal and a photon detection cannot be larger than τ sub · 2 28 4.8 ms.
Since the device registers a maximum of 512 tags per pixel during the integration time, every frame is composed at most by 64 x 512 tags. Similarly to the paradigm adopted in the previous Sections, random bits can be extracted directly from the bare physics of the process: a string τ frame /τ sub bits long is associated to each frame and the 1's, equal in number to the number of tags, are located according to the tag values. Again, the limit of this approach is that the strings are consistently biased towards zero. This bias is the result of two concurring causes: the first one is the dead time of the SPADs, which being of τ dead 40 ns, implies that every bit 1 is necessarily followed by τ dead /τ sub 2240 0's. However, the 0's due to the dead time can be removed, as it was previously done. The second cause is the limited buffer size: each string produced in a frame will always feature at most 512 1's. Indeed, an extraction approach by means of the Peres debiasing procedure could be suitable even for this framework. As in Randy, we studied the interarrival detections time, whose distribution is reported in Fig. 5, in order to detect the artefacts induced by the physical limits of the device.
The histogram starts approximately at 40 ns due to the dead time and a peak starting at 40 ns and extending up to 200 ns indicates the presence of afterpulses. A noticeable feature is an unusual peaks pattern. This pattern was due to a non-linear behaviour of the TDC (further details in subsection V B). Hence, to manage these non-idealities we decided to separate the tag resolution between coarse resolution (Clock sampling) and fine resolution (TDC) implementing two different postprocessing procedures. This distinction allows an easy post-processing procedure since it separates the treatment of non-idealities, i.e. coarse for detectors ones and fine for electronics ones (see subsection V A and V B). The analysis was done on data collected with an acquisition of 8·10 3 frames where every frame has an integration time τ frame = 320 · 10 −6 s. The photon rate was tuned to obtain approximately 400 counts per frame. Given a buffer size of 512 detections, this value was chosen in order to keep low the probability of saturation and frame loosing. Results show a final achieved generation bit rate equal to 310 Mbit/s.

A. COARSE resolution
The coarse resolution is defined by the 400 MHz system clock. The tag information is related to the number of clock cycles in which an event is detected. Therefore, it describes the temporal distance between an event and the zero reference with a resolution of 2.5 ns. We remove dead time and afterpulses of the SPADs with the same technique described in section III. However, the SPAD arrays suffer from pixel cross-correlation which causes a pixel to output an event when its neighbour receives a photon, i.e. it produces a fake event as in the afterpulse. In order to remove it and to be sure that no cross correlation exists, we discard approximately one third of the pixels of a bank losing all the events from those pixels. This procedure reduces the actual number of pixel from 64 to 22. Once the cross correlation is removed as well as the dead time and the afterpulse, we apply Peres algorithm to every selected pixel independently. As discussed in section III and according to Fig. 2, higher rates can be achieved by preferring longer heavily biased bit strings over shorter slightly biased bit strings. Therefore, we treat every selected pixel as an autonomous QRNG. As in the case of single detector implementation, we evaluated the serial correlation on a sampled bit string and it re-enters within the limit of acceptance after removing dead time, afterpulse and pixel cross-correlation. By summing the bit rate of the selected pixels we obtain a total bit rate of R coarse 87 Mbit/s. The value is av-eraged since the event rate varies from pixel to pixel as shown in Fig. 6. The mean extraction rate per pixel is equal to R coarse /64 = R coarse,p 1.36 Mbit/s. Considering an ideal SPAD array with no correlation, the hypothetical extraction rate per pixel would be equal to R coarse /22 = R * coarse,p 3.95 Mbit/s.

B. FINE resolution
The fine resolution is defined by the TDC and has a time resolution of approximately 17.86 ps. The TDC outputs a number between 0 and 139 which identifies a precise moment within a clock cycle in which there was an event. Hence, the access to the TDC value is done every clock cycle. As in the coarse resolution, we decide to treat every pixel independently. The peaks of Fig. 5 are separated exactly by 2.5 ns, which can be explained by the presence of a dead time in accessing the TDC. This behaviour brings to the tag distribution shown in Fig. 7 which represents an entire 2.5 ns clock cycle. The figure shows that there are no events in the lower bins and also a significant bias on several values. This behaviour is due to the TDC implementation on an FPGA technology which introduces non-linearity caused by different propagation delays over hardware blocks [30]. It is worth to notice that, while the distribution is not uniform, its entropy is equal to H exp = − implying that there are no events from 140 to 255, which worsens the situation. Clearly, Peres algorithm cannot be applied to this string. In order to remove the bias and the correlation on the modulo 140 distribution, it is possible to use another post-processing method, the algorithm proposed by Zhou and Bruck [2]. The latter is the generalization of the Peres algorithm for biased distributions over a finite number of integers. Now we briefly describe the Zhou-Bruck algorithm (for a full description of the method refer to [2]). Let's consider a random variable X with n possible outcomes with a biased probability distribution. Let's define b = log 2 n as the number of bits required for a binary description of the outcomes (in our case with n = 140 we have b = 8). If the outcomes are labelled as 0, 1, · · · , n − 1 and converted to binary, a string of b bits corresponds to each outcome. For a given sequence (x 1 , · · · , x M ), we can convert any x k to the corresponding binary string and consider only the first bit of each string: the resulting sequence is biased but with no correlation. Let's now consider the second bit of each string. We may create two sequences corresponding to the two possible values of the first bit: the first sequence collects the second bits related to a 0 first bit and vice versa for the second one. Again, these two sequences have bias but no correlation. The idea can be iterated: the i th bits can be grouped into 2 i − 1 sequences according to the values of the i − 1 bits. After this manipulation, one gets N different sequences where N = 2 b . Since these N sequences have bias but no correlation, the Peres algorithm can be applied separately to each of them (if the sequence is not empty). We implemented the Zhou-Bruck method on the tags modulo 140 of each pixel. As a matter of fact, the Zhou-Bruck method is quite efficient: for example, for the pixel 256, from each tag we get an average of 6.3 unbiased bits: this value is close to the maximum H exp 6.8 that can be extracted with a perfect efficient algorithm. The obtained bits were evaluated in term of bias (Fig. 8) and binary entropy extraction efficiency (Fig. 9) as well as serial correlation which re-enters within the limit of acceptance.  The final rate of the fine resolution is equal to R fine = 223 Mbit/s for the whole pixels array. In order to evaluate the rate for a single pixel it must be taken in account that there are only 64 TDCs which switch among the four pixel banks. Therefore, the actual number of pixels (in therms of rate) truly is 64 and the final rate per pixel is R fine /64 = R fine,p 3.48 Mbit/s per pixel. Thus, by considering the fine and coarse resolution, the average generation rate per pixel is given by R tot,p = R fine,p + R coarse,p 4.84 Mbit/s while the total generation rate for LinoSPAD device is given by R tot = R fine + R coarse 310 Mbit/s.

VI. CONCLUSION
In this paper we showed improved techniques to produce true random numbers by processing single photon events. An innovative CMOS SPAD array device called LinoSPAD was used to implement a high rate RNG. It integrates a temporal tagging system with a detector matrix and thus it allows to use the temporal degree of freedom in addition to the much more common spatial one. Starting from a single detector system (Randy), we defined an efficient procedure to fully exploit the temporal degree of freedom. With respect to existing paradigms which are based on complex rules and are far from being information efficient, our procedure extracts the most of the system entropy, achieving the maximum bit rate allowed by the system. This procedure is based on the use of a high frequency sampling clock (with respect to the photon rate) and on the use of the Peres unbiasing algorithm. Therefore, the generation rate is only limited by the physical device performances and not by the technique itself. Applying this technique to LinoSPAD required a further step to deal with detectors matrix nonidealities (pixel cross correlation) and TDC ones. Hence, a dedicated post-processing procedure, which included the usage of the Zhou-Bruk algorithm, was developed in order to work around such non-idealities. The summary comparison bar diagram of Fig. 10 clearly shows the remarkable differences among different generation procedures. Our Peres-based procedure clearly reaches a higher bit rate with respect to a protocol-based one. Furthermore, moving from the discrete SPADs framework (Randy) to the CMOS SPAD array one and increasing the sampling frequency as well as adding a time-to-digital converter (LinoSPAD) improves the generation rate even more. Final results show a bitrate per SPAD/pixel equal to: • R Randy,spad (100 MHz) = 1.8 Mbit/s • R LinoSPAD,spad (400 MHz + TDC) = 4.84 Mbit/s and for LinoSPAD a total bitrate of R LinoSPAD = R LinoSPAD,spad × 64 = 310 Mbit/s. Moreover, applying these techniques to other physical devices with better performances will further increase the generation rate. Future steps will also consider a realtime implementation of both procedures since the preprocessing, the Peres and Zhou-Bruk algorithms could be effectively implemented via FPGA.