Adaptive phase estimation through a genetic algorithm

Quantum metrology is one of the most relevant applications of quantum information theory to quantum technologies. In this field, quantum probes are exploited to overcome classical bounds in the estimation of unknown parameters. Phase estimation, where the unknown parameter is a phase shift between two modes of a quantum system, is a fundamental problem in this context. In practical and realistic applications, methods must be devised that optimally estimate an unknown phase shift using a limited number of probes. Here we introduce and experimentally demonstrate a machine learning-based approach for the adaptive estimation of a phase shift in a Mach-Zehnder interferometer, tailored for optimal performance with limited resources. The employed technique is a genetic algorithm, used offline to devise the optimal feedback phases employed during the estimation. The results show that the true value of the phase can be retrieved using few photons, and that the sensitivity bounds are reached in this small-probe regime. We finally investigate the robustness of the protocol with respect to common experimental errors, showing that the protocol can be adapted to a noisy scenario. Such an approach promises to be a useful tool for more complex and general tasks where the optimization of feedback parameters is required.


I. INTRODUCTION
A large number of physical problems can be mapped onto a phase estimation task, in which an unknown relative phase shift has to be measured [1][2][3]. Notable examples include the detection of gravitational waves [4], atomic clocks [5], measurements on biological systems [6], force measurements [7], lithography [8,9], imaging [10,11], and spectroscopy and frequency measurements [12,13]. In this context, the fundamental bounds on the achievable sensitivity are set by the laws of quantum mechanics [14][15][16][17]. More specifically, the goal of quantum metrology is to exploit quantum probes to enhance the achievable sensitivity with respect to classical strategies. Photons are one of the most important physical systems employed for phase estimation [17,18].
In many realistic scenarios, the number of probes that can be exploited in the estimation process is limited. Examples include highly sensitive biological samples that can be damaged by high photon fluxes [6,19,20], fragile atomic or molecular systems [21][22][23][24][25], and communication scenarios where few photons are employed [26]. In the single-parameter case, theorems guarantee that the fundamental bounds achievable with a given probe can be reached. However, this is guaranteed only in the asymptotic regime, where a large number of copies of the probe state are employed [27]. In the limited-data scenario [28,29], there is no standard protocol to reach the ultimate bounds, and different recipes can be adopted. Among the different approaches (even protective measurements can be exploited [30]), adaptive schemes provide a powerful tool to speed up the convergence of the estimation [17,31]. Adaptive protocols employ additional control parameters during the estimation process, which are in general tuned according to the knowledge acquired on the system [32,33]. The adaptive strategies devised so far can be grouped into two general classes, namely, online and offline techniques. The former class comprises protocols that calculate the optimal feedback during the experiment, at each step of the algorithm, according to some heuristic. Notable examples in this category are adaptive Bayesian techniques [32][33][34][35][36][37]. Conversely, in offline schemes the rules to tune the feedback parameters are calculated in advance, before the estimation process. Offline protocols are crucial in several practical scenarios. For instance, they are necessary when the computational power available during the estimation process is limited, or when feedback controls are used for fast processes and the time available for an online calculation is short.
However, the space of all possible feedback actions to be calculated in an offline approach can be huge, and optimizing a function of many parameters over such a space is computationally expensive. Machine learning provides an effective way to handle the complexity of this optimization.
Machine learning techniques [38,39] have been identified as a powerful tool to enhance quantum information tasks [40,41], including the calibration of quantum sensors [42], state reconstruction and tomography [43][44][45][46], the design and optimization of experiments and measurements [47][48][49][50][51], even the learning of concepts and models [52,53], and the enhancement of quantum metrology tasks [37,[54][55][56][57][58][59][60][61][62][63][64]. In the context of phase estimation, different machine learning techniques have been used to calculate feedback actions in the offline approach. Here the adaptive control parameters can be realized by additional phase shifts inside the interferometer, so the goal of an offline approach is to precalculate the possible feedback phases according to some heuristic. Two machine learning techniques in particular have been applied for this purpose: particle swarm optimization (PSO) [54,55], whose effectiveness has been demonstrated in a single-photon experiment [37], and differential evolution (DE) [56,57]. These are evolutionary algorithms [65,66] inspired by biological dynamics, which solve optimization problems by trial and error and are therefore well suited to searching for global maxima. When applied to phase estimation, the solutions of such algorithms are lists of optimal feedback phases to be employed during the process. These lists live in a high-dimensional space and are optimal in the sense that they maximize a chosen figure of merit, called fitness, related to the precision of the estimation. The fitness function is chosen depending on the specific problem at hand. Another machine learning technique inspired by natural processes is reinforcement learning, which relies only on the acquired data, without an explicit model of the problem.
Like evolutionary algorithms, reinforcement learning can optimize functions, but it does so by rewarding positive actions rather than by maximizing a fitness function. Evolutionary algorithms can improve reinforcement learning and vice versa [67]. Reinforcement learning has been shown to devise quantum-error-correction strategies through feedback-based measurements [68], while coherent control of qubits can be used for decision making by learning agents [69]. In general, some machine learning techniques are more suitable than others, depending on the task. It is therefore crucial to find and explore different approaches able to enhance phase estimation processes.
Here we theoretically introduce and experimentally demonstrate a technique based on a genetic algorithm (GA) [70]. GAs are dynamic algorithms inspired by natural selection: they start from a set of candidate solutions that evolve in time toward high-quality ones. Genetic evolutionary strategies have been applied to different quantum information tasks [47,51,71]. In our case, the aim is to find optimal feedback strategies that reach the fundamental limits in phase estimation experiments. More specifically, the algorithm uses a survival-of-the-fittest strategy to evolve a population of candidate solutions, that is, lists of feedback phases. The genetic operators of selection, crossover, and mutation are applied to progressively find better solutions in the search space. Finally, once a halting condition is met, the best solution is, with high probability, the optimal one maximizing the sensitivity of the estimation. We then employ the obtained feedback lists to experimentally perform phase estimation on a photonic platform with a sequence of single-photon states. With this approach, the phase estimation experiments reach the ultimate limit in precision, which in our experiment is the standard quantum limit (SQL) [14,27]. By showing the performance of the estimation process for different phase shifts, we demonstrate the effectiveness of a GA as an offline protocol for quantum phase estimation. Furthermore, we show by means of simulations that this approach is robust to common sources of noise.

A. Adaptive phase estimation
Photonic phase estimation employs light probes in an interferometric scheme to estimate an unknown phase shift between two optical modes. A paradigmatic scheme for this task is the Mach-Zehnder interferometer (MZI): two input modes interfere at a first optical element and then, after acquiring a relative phase shift φ, interfere again at a second optical element. A MZI can be encoded in the photon path, where photons interfere at beam splitters, as shown in Fig. 1(a). The same structure can be obtained in other degrees of freedom, such as polarization, where the modes are mixed by half-wave plates [Fig. 1(b)] [17]. The goal of the process is to estimate the unknown phase shift φ by measuring the probe states after propagation through the interferometer. When the probes are single photons, the phase-dependent output probabilities for the two possible measurement results x = 0, 1 are cos²(φ/2) and sin²(φ/2), respectively. Information on the parameter can be extracted through the dependence of the output probabilities on the unknown phase. The amount of available information is quantified by the Fisher information,

$$F(\phi) = \sum_{x} P(x|\phi)\left(\frac{\partial \log P(x|\phi)}{\partial \phi}\right)^{2},$$

where the likelihood function P(x|φ) is the probability of obtaining the measurement result x given a value φ of the phase. The Fisher information bounds the variance achievable by any unbiased estimator through the Cramér-Rao bound (CRB) [27]:

$$\Delta\phi^{2} \geq \frac{1}{N F(\phi)},$$

where N is the number of identical independent probes. For a MZI seeded by single photons, the Fisher information is constant, F(φ) = 1, for any phase φ, and the CRB reads Δφ² ≥ 1/N. This is the standard quantum limit (SQL), which represents the maximum precision achievable with classical probe states. In the limit of a large number of measurements, estimators such as maximum likelihood or Bayesian ones saturate the SQL [27].
However, this no longer holds when the number of measurements, and hence the amount of data, is limited [72]. In this regime, the Fisher information may not represent the ultimate achievable bound, and nontrivial approaches have to be adopted to optimize the convergence of the estimation process to the ultimate limits. In particular, even though the Fisher information does not depend on the unknown phase, with limited data the convergence of the estimation process (in terms of the number of resources N necessary to saturate the bound) can be faster around certain phases.
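The constant Fisher information of the single-photon MZI can be checked numerically. The Python sketch below evaluates the definition of F(φ) with central finite differences for the ideal likelihood P(0|φ) = cos²(φ/2), P(1|φ) = sin²(φ/2). The step size `h` is an arbitrary choice, and φ = 0 and φ = π are avoided because one outcome probability vanishes there.

```python
import math


def likelihood(x, phi):
    """Ideal single-photon MZI likelihood P(x|phi) for outcomes x = 0, 1."""
    return math.cos(phi / 2) ** 2 if x == 0 else math.sin(phi / 2) ** 2


def fisher_information(phi, h=1e-6):
    """F(phi) = sum_x P(x|phi) (d log P(x|phi) / d phi)^2.

    Uses the identity P (d log P)^2 = (dP)^2 / P, with dP from central
    differences. Undefined at phi = 0 or pi (one outcome has P = 0).
    """
    total = 0.0
    for x in (0, 1):
        dp = (likelihood(x, phi + h) - likelihood(x, phi - h)) / (2 * h)
        total += dp ** 2 / likelihood(x, phi)
    return total


def cramer_rao_variance(phi, n_probes):
    """Cramer-Rao bound on the variance: 1 / (N F(phi))."""
    return 1.0 / (n_probes * fisher_information(phi))


for phi in (0.3, 1.0, math.pi / 2, 2.5):
    print(f"phi={phi:.3f}  F={fisher_information(phi):.6f}  "
          f"CRB(N=100)={cramer_rao_variance(phi, 100):.6f}")
```

For every tested phase, F(φ) comes out equal to 1 within numerical error, so the CRB reduces to the SQL, 1/N.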
Hence, adaptive protocols provide one of the most powerful approaches to this problem [32,33]. In an adaptive protocol for phase estimation, an additional controllable and known phase shift Φ can be introduced in the interferometer. The value of Φ can be changed depending on the previous measurement results, so as to keep the total phase shift inside the interferometer near the optimal point during the estimation process.

Fig. 1. (a) The interferometer is composed of two cascaded beam splitters (BS), and relative phase shifts are inserted between the two paths. Single photons are injected into one input of the interferometer in order to estimate the unknown phase shift φ. At each step k of the adaptive protocol, the control phase shift $\Phi_k$ is calculated by a processing unit according to a heuristic that exploits the dichotomous measurement result $x_{k-1} \in \{0, 1\}$ obtained from the detectors (D) at step k − 1. (b) Experimental setup, corresponding to a MZI in the polarization degree of freedom, able to implement adaptive phase estimation. A spontaneous parametric down-conversion source generates pairs of photons: one photon (signal) of each pair enters the interferometer, while the other acts as a trigger. After a polarizing beam splitter (PBS) and a first half-wave plate (HWP) rotated by 22.5°, the signal photon is prepared in a diagonal polarization state and experiences the unknown phase shift φ between the H and V polarizations, inserted by the first liquid crystal LC1. The control phase shift $\Phi_k$ at step k is applied by a second liquid crystal LC2, driven by a processing unit that applies the GA-optimized feedback according to the previous measurement result $x_{k-1}$. The measurement stage is composed of a final HWP rotated by 22.5°, a PBS, and single-photon detectors (APD) at the interferometer outputs. The result x is generated by the coincidence between the signal photon and the trigger photon.
Consider N single photons injected, one by one, into one input port of a MZI. The feedback phase $\Phi_k$ at step k is chosen according to some heuristic and to all previous measurement results $\{x_1, x_2, \dots, x_{k-1}\}$. In offline protocols, the rules to change the feedback phase are calculated in advance, before the experiment. The list of all feedback actions is called a policy. Different machine learning techniques have been exploited to calculate such policies [54][55][56][57]. Here, we introduce a novel technique that exploits a genetic algorithm as an offline approach to calculate the policies for phase estimation.

B. Genetic algorithm
Genetic algorithms are a class of evolutionary computation approaches inspired by Darwin's theory of natural selection [70]. GAs can tackle many search-based optimization problems. The elements of the search space are termed individuals and represent the possible solutions of the optimization. The aim of the algorithm is to find the individual that optimizes a certain figure of merit called fitness. Starting from a population, that is, a group of individuals, the GA evolves it in the search space, so that the evolution of the population corresponds to moving in the search space. The main principle of the algorithm is biological evolution based on the survival of the fittest individuals. GAs are suitable for problems with a large search space and require no initial information about the nature of the solutions, which is a common scenario for many real-world problems. The balance between exploration and exploitation in the algorithm's decision making also helps it avoid local extrema and move toward a globally optimal solution in the search space. GAs are used in a large variety of problems, such as image processing, artificial intelligence in robotics, computer games, the optimization of parameters of other machine learning algorithms (for example, the weights of neural networks), and a variety of engineering problems [73,74].
Each candidate solution of the optimization problem has a defined structure (chromosome) composed of genes. In some problems, the solutions are represented with a binary encoding of the genes, as arrays of 0s and 1s, but other encodings are also possible. For instance, the genes can be the elements of a vector of real values. The goodness of each solution is quantified by the so-called fitness score, calculated through a fitness function determined by the objective function of the problem, which is at the basis of the optimization. During the evolution process, some of the solutions are selected for reproduction and recombination, according to particular techniques, to move toward new solutions (offspring) in the search space. The offspring that form the next generation (or iteration) then undergo mutation, yielding a new generation of individuals. In particular, the genes of the offspring depend on the properties inherited from the previous generation through crossover and on random mutations. Individuals with higher fitness scores have a larger probability of being selected for mating, which allows the production of new, fitter individuals. This method ensures the survival of better solutions in the iterative evolutionary process, until the termination criterion is reached or the search saturates at some extremum, either global or local.
Conceptual model for the evolution employed by a genetic algorithm. Each human figure represents an individual, or chromosome, whose set of properties, known as genes, is shown on the board it holds. The algorithm starts with the initialization of a group of random individuals, known as the population. The population then enters a cycle in which the genetic operators are applied, namely, fitness calculation, selection, crossover, mutation, and infection. The cycle ends when a halting condition is met and returns the best individual as output. The fitness operation assigns a fitness score to all individuals and sorts them accordingly. The selection operator selects one individual from the population each time it is used. The crossover operation uses the selection operation to pair up two individuals and produce offspring from them. This reproduction is repeated until the population reaches its full size. The offspring then mutate randomly through the mutation operation, and the infection operation, with some probability, replaces one of the individuals with a randomly created new individual. Throughout the process, the individual with the highest fitness after sorting, shown as the king, is immune to mutation and infection and is not replaced by the offspring. The king may or may not change at each cycle.
In this work, we exploit a modified version of the genetic algorithm (see Fig. 2), suited to optimization in the continuous search space of real vectors representing the policies to be employed in the phase estimation process. We describe the steps of the GA in detail (for pseudocode, see algorithm 1) and report the employed parameters in Table I.

Population initialization and fitness calculation
The first step of the protocol is the initialization of a population: a set of candidate policies, that is, lists of feedback phase shifts corresponding to the chromosomes of the algorithm, is randomly generated. This initialization can also take into account any prior information on where the optimal solutions are expected to lie in the search space. In our case, we restrict the possible unknown phases to the range [0, π]. In a phase estimation experiment using N single photons, the chromosome associated with each individual is a vector $\boldsymbol{\Delta} \in \mathbb{R}^N$ of N real values, which corresponds to the policy to be applied during the experiment. In particular, when optimizing the policy for N probes, the first two chromosomes of the population are taken from the best policy for N − 1 probes, with each gene shifted by a Gaussian perturbation whose standard deviation decreases with N following the SQL scaling. The last (Nth) gene of these two chromosomes, which has no counterpart in the (N − 1)-probe policy, is set to 0. The rest of the population is initialized with completely random chromosomes. This kind of initialization ensures that the information contained in previously optimized policies is exploited in the subsequent searches.
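The seeding rule above can be sketched in Python as follows. This is a minimal illustration under explicit assumptions, not the authors' code: the perturbation width `sigma = 1/sqrt(N)` is our reading of "decreasing following the SQL scaling", the random genes are drawn uniformly in [0, π], and the name `init_population` is hypothetical. The population size of 12 is taken from the text.

```python
import math
import random


def init_population(n_probes, prev_best=None, pop_size=12, sigma=None, rng=None):
    """Hypothetical sketch of the initial population for N = n_probes photons.

    Two chromosomes are seeded from `prev_best` (best policy for N - 1 probes),
    each gene shifted by Gaussian noise, with the new N-th gene set to 0.
    The remaining chromosomes are random. The gene range [0, pi] and
    sigma = 1/sqrt(N) are assumptions, not values from the paper.
    """
    if rng is None:
        rng = random.Random()
    if sigma is None:
        sigma = 1.0 / math.sqrt(n_probes)  # assumed SQL-like scaling
    population = []
    if prev_best is not None:
        assert len(prev_best) == n_probes - 1
        for _ in range(2):
            seeded = [g + rng.gauss(0.0, sigma) for g in prev_best]
            seeded.append(0.0)  # N-th gene has no counterpart in prev_best
            population.append(seeded)
    while len(population) < pop_size:
        population.append([rng.uniform(0.0, math.pi) for _ in range(n_probes)])
    return population


pop = init_population(5, prev_best=[0.5, 0.3, 0.2, 0.1], rng=random.Random(1))
print(len(pop), [round(g, 3) for g in pop[0]])
```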
Each candidate solution is then assigned a fitness score, which quantifies the sensitivity of the policy in estimating the unknown phase. Computing the fitness therefore requires simulating the estimation of many different unknown phases. After each simulated measurement, the feedback phase $\Phi_k$ to be applied at the kth step is updated according to the logarithmic-search heuristic [54,55]:

$$\Phi_k = \Phi_{k-1} - (-1)^{x_{k-1}}\,\Delta_k,$$

where $\Delta_k$ is the kth entry of the policy $\boldsymbol{\Delta}$ and $x_{k-1} \in \{0, 1\}$ is the dichotomous outcome of the measurement at the (k − 1)th step. In this approach, the estimator $\phi_{\mathrm{est}}$ of the unknown phase φ is given, up to a constant phase, by the last value $\Phi_N$ of the feedback phase, updated after the last measurement result [54]. The fitness S of a policy $\boldsymbol{\Delta}$ is

$$S(\boldsymbol{\Delta}) = \left| \int_{-\pi}^{\pi} p(\theta\,|\,\boldsymbol{\Delta})\, e^{i\theta}\, d\theta \right|,$$

where $\theta = \phi_{\mathrm{est}} - \phi$ is the error on the estimated phase and $p(\theta\,|\,\boldsymbol{\Delta})$ is the probability of the error θ when the policy $\boldsymbol{\Delta}$ is used. This quantity is computed by averaging over 10⁵ values of the unknown phase, drawn uniformly in the interval [0, π]. The figure of merit to be minimized in the phase estimation problem is the Holevo variance $V_H$, which is related to the fitness function by [75]

$$V_H = S^{-2} - 1.$$
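The Monte Carlo fitness evaluation can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation. We assume the log-search update in the form Φ_k = Φ_{k−1} − (−1)^{x_{k−1}} Δ_k starting from Φ = 0, and we take the "constant phase" offset of the estimator to be π/2, matching the working point φ − Φ = π/2. We also use 20 000 sampled phases instead of 10⁵ to keep the sketch fast. The halving policy in the usage lines is a hypothetical test input, not a GA-optimized policy.

```python
import cmath
import math
import random


def simulate_run(phi, policy, rng):
    """One simulated adaptive estimation of `phi` with the given policy.

    Assumed update: Phi <- Phi - (-1)**x * delta after each outcome x,
    starting from Phi = 0 (the sign convention is an assumption).
    """
    fb = 0.0  # current feedback phase Phi
    for delta in policy:
        # Ideal single-photon MZI: P(x=0) = cos^2((phi - Phi) / 2)
        p0 = math.cos((phi - fb) / 2) ** 2
        x = 0 if rng.random() < p0 else 1
        fb -= (-1) ** x * delta
    # Estimator = last feedback phase plus a constant; pi/2 is assumed here
    return fb + math.pi / 2


def sharpness(policy, n_phases=20_000, seed=0):
    """Monte Carlo estimate of S = |<exp(i*theta)>|, theta = phi_est - phi,
    with the unknown phase phi drawn uniformly in [0, pi]."""
    rng = random.Random(seed)
    acc = 0j
    for _ in range(n_phases):
        phi = rng.uniform(0.0, math.pi)
        acc += cmath.exp(1j * (simulate_run(phi, policy, rng) - phi))
    return abs(acc / n_phases)


def holevo_variance(s):
    """V_H = S^-2 - 1."""
    return s ** -2 - 1.0


N = 10
halving = [math.pi / 2 ** (k + 2) for k in range(N)]  # hypothetical policy
s = sharpness(halving)
print(f"S = {s:.3f}, V_H = {holevo_variance(s):.3f}, SQL = {1 / N:.3f}")
```

A GA would call `sharpness` on every chromosome in the population and rank the chromosomes by it. As a sanity check, a policy of all zeros never moves the feedback phase, so its sharpness approaches 2/π.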

Genetic operators
The initial population is improved through an iterative process of genetic operations applied to the individual solutions of the population. The fitness scores assigned to the individuals determine the best element of the population and also the halting criterion of the optimization process. Three genetic operators, namely, selection, crossover, and mutation, are applied to the population during the iterations. In particular, we employ elitism: individuals with a very high fitness are immune to crossover and mutation. This method ensures that the best solutions of the previous generation survive into the new generation, creating a better mating pool for the next iteration and preserving the quality of the best candidate solutions. Our algorithm uses a population size of 12 individuals, with a single elite solution immune to changes during each generation.
Selection. In each iteration, an appropriate number of pairs of individuals, the parents, are selected to reproduce and form the new generation. The parents are selected through a method in which solutions with higher fitness have a better chance of being extracted for mating. A selection technique could simply pick the two individuals with the highest fitness, but this would discard the genetic diversity on which evolution is based. It would also bias the search toward a particular region, where it could get stuck in a local optimum. The selection technique used here is tournament selection [76] (see algorithm 2), which consists of running numerous tournaments among the individuals of a randomly chosen subset of the population. The winner of each tournament is determined by the fitness value and is selected for mating. A large number of tournaments ensures that almost every individual in the randomly chosen subsets is selected at least once, so that weak and strong individuals can coexist in a given subset. This selection technique also maintains the diversity of the genomes during crossover by mixing the good genes of strong parents with those of weaker parents. The fittest thus survive, while a very small proportion of weaker individuals is still selected. We use a tournament size of five solutions, that is, the size of the subset drawn from the total population of 12 candidate solutions. The selection returns the best individual among the five randomly chosen ones. The selection is then repeated several times to extract the pairs of individuals used to generate new child chromosomes through crossover.
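A minimal Python sketch of the tournament step follows. It uses the subset size of five and the population of 12 quoted in the text. It assumes a precomputed `fitness` list in which higher is better (for example, the sharpness S), and it draws contenders without replacement via `random.Random.sample`, which is our choice. The name `tournament_select` is hypothetical.

```python
import random


def tournament_select(population, fitness, k=5, rng=None):
    """Tournament selection: draw k distinct individuals at random and
    return the one with the highest fitness (e.g. the sharpness S)."""
    if rng is None:
        rng = random.Random()
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]


# Usage with a toy population of 12 one-gene chromosomes
rng = random.Random(0)
pop = [[float(i)] for i in range(12)]
fit = [i / 11 for i in range(12)]
parents = [tournament_select(pop, fit, rng=rng) for _ in range(2)]
print(parents)
```

Since the winner is the best of five distinct contenders, the four weakest individuals can never win a tournament. Every other individual, including weak ones, keeps a nonzero chance of being picked.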
Crossover. Analogously to crossover in biological reproduction, each newly generated offspring shares genetic information with its parents. We select two parents by running the selection process twice, and these parents then generate one offspring solution. The crossover process used in our problem is uniform crossover, in which each element (gene) of the new chromosome is randomly chosen from one of the two parents with equal probability. This spreads the genetic information evenly among the genes of the offspring, ensuring equal contributions from both parents, and also ensures exploitation, that is, the preservation of better solutions. The mating process is repeated, selecting other pairs of parents and crossing them over to produce further offspring, until a new generation of the required size (12 individuals in our case) is produced. The elite chromosomes are immune to crossover, that is, they are the only solutions not replaced by the newly generated child solutions. However, elite chromosomes can take part in the mating process as parents.
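Uniform crossover with elitism can be sketched in Python as below. The helper names `uniform_crossover` and `breed` are hypothetical. `select` stands for any fitness-biased selection, such as a tournament, and the usage lines substitute a plain random choice only to keep the block self-contained. Following the text, the single elite is copied unchanged into the new generation but can still be drawn as a parent.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Each gene of the child comes from parent_a or parent_b with prob. 1/2."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def breed(population, fitness, select, rng, n_elite=1):
    """Build the next generation: keep the elite, fill the rest with children.

    `select(population, fitness, rng)` is any fitness-biased selection
    (e.g. a tournament). Elites are copied unchanged but may still be parents.
    """
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    next_gen = [list(population[i]) for i in ranked[:n_elite]]
    while len(next_gen) < len(population):
        mother = select(population, fitness, rng)
        father = select(population, fitness, rng)
        next_gen.append(uniform_crossover(mother, father, rng))
    return next_gen


# Usage: toy population, random-choice stand-in for tournament selection
rng = random.Random(3)
pop = [[float(i)] * 4 for i in range(12)]
fit = [i / 11 for i in range(12)]
new_pop = breed(pop, fit, lambda p, f, r: r.choice(p), rng)
print(len(new_pop), new_pop[0])
```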
Mutation and halting. The newly generated child solutions are then subjected to the mutation operator, which alters the genetic information of an individual and thus modifies the solution. In our algorithm, chromosomes with higher fitness values $S(\boldsymbol{\Delta})$ are more immune to mutation. In particular, the mutation probability of each gene of a chromosome is equal to $0.55[1 - S(\boldsymbol{\Delta})]$. This rule protects the fitter individuals from mutation and exposes the weaker ones to it, increasing genetic diversity while preventing good solutions from being altered. The factor 0.55 sets the mutation rate and was chosen by trial and error to improve the exploration of the search space when the algorithm saturates near local optima. The ith gene is mutated by replacing its value with a value drawn from a Gaussian distribution with mean equal to the original gene and variance equal to 1/i, where i is the position of the gene in the chromosome vector. This mutation variance follows the intuition that, as the number of probes increases, the change in the feedback phase decreases approximately according to the SQL scaling. Indeed, as the number of probes increases, the variation around the corresponding gene needed to find the optimal solution is expected to be smaller. During mutation, we also introduce an infected individual, that is, a randomly created chromosome, in place of one of the two worst individuals. When the number N of photons is less than 25, infection occurs with probability 0.25, whereas for N ≥ 25 the probability of infection is N/100. Infection thus ensures a proper random exploration of the search space, maintaining genetic diversity in the mating pool of the following generations.
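The mutation and infection rules can be sketched in Python as follows. The rate 0.55, the per-gene variance 1/i, and the infection probabilities are taken from the text. The gene range [0, π] for infected chromosomes and the function names are assumptions. The formula N/100 exceeds 1 for N > 100, outside the range N ≤ 80 studied here. In a full GA, the elite chromosome would simply be skipped when calling `mutate`, and it is never among the two worst individuals targeted by `maybe_infect`.

```python
import math
import random

MUTATION_RATE = 0.55  # value quoted in the text


def mutate(chromosome, s, rng):
    """Mutate each gene with probability 0.55 * (1 - S); the i-th gene
    (1-based) is redrawn from a Gaussian with mean = gene and variance 1/i."""
    p_mut = MUTATION_RATE * (1.0 - s)
    mutated = []
    for i, gene in enumerate(chromosome, start=1):
        if rng.random() < p_mut:
            gene = rng.gauss(gene, math.sqrt(1.0 / i))  # std = sqrt(variance)
        mutated.append(gene)
    return mutated


def infection_probability(n_probes):
    """0.25 for N < 25 and N/100 for N >= 25 (continuous at N = 25)."""
    return 0.25 if n_probes < 25 else n_probes / 100


def maybe_infect(population, fitness, n_probes, rng, low=0.0, high=math.pi):
    """With probability infection_probability(N), replace one of the two
    worst individuals by a random chromosome (gene range is an assumption)."""
    if rng.random() < infection_probability(n_probes):
        worst_two = sorted(range(len(population)), key=lambda i: fitness[i])[:2]
        victim = rng.choice(worst_two)
        population[victim] = [rng.uniform(low, high) for _ in range(n_probes)]
    return population


rng = random.Random(0)
print(mutate([0.8, 0.4, 0.2, 0.1], s=0.3, rng=rng))
```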
The new generation produced by these genetic operators typically has a higher average fitness. The whole cycle of selection, crossover, mutation, and infection is repeated until a halting criterion is fulfilled. Either the fitness reaches a threshold value corresponding to a Holevo variance approximately equal to the SQL for the respective value of N, or the number of generations exceeds a fixed limit. In the latter case, the Holevo variance of the best policy can remain far from the SQL, meaning that the algorithm fails to reach a value close to the bound.

C. Numerical simulations of algorithm performances
In this section, we perform numerical simulations to study the optimal policies generated by our GA for phase estimation. We consider numbers of probes N ranging from 1 to 80. In Fig. 3(a), we report the average $V_H$ obtained by estimating 10⁵ uniformly distributed unknown phases, showing that the SQL is attained already for small values of N. Furthermore, the inset considers two widely separated values of N independently and shows, in both cases, good convergence for every unknown phase. Figure 3(b) shows, for two different values of the unknown phase, the results of estimations with the policy optimized independently for each N. The performance of the policies is studied in terms of the scaling of the Holevo variance $V_H$ as a function of N. These results show the high efficiency of the algorithm even when a small number of resources is used. In particular, the scaling of $V_H$ shows the high quality of the optimal solutions found by the GA, with which the estimation process reaches the true values of the unknown phases. The values of $V_H$ below the SQL for N < 5 arise because the algorithm is optimized for phases lying in the interval [0, π], so this prior restriction contributes information beyond that provided by the few measured photons. Conversely, for larger N, the SQL retains its role as a suitable limit on the estimation precision. In the optical system under analysis, the algorithm finds that the optimal feedbacks are those for which the relative phase shift φ − Φ between the two modes of the MZI is π/2. This corresponds to the point where the modulus of the derivative of the likelihood function is maximal.
These numerical results demonstrate the effectiveness of the offline policies found by GA optimization, which optimally estimate unknown phases with sensitivities that reach the SQL after only a few measurements.
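The statement that the π/2 working point maximizes the slope of the likelihood can be checked with a short Python scan. For P(0|θ) = cos²(θ/2), with θ = φ − Φ, the derivative is −sin(θ)/2, so its modulus peaks at θ = π/2. The grid resolution is an arbitrary choice. For the ideal likelihood the Fisher information itself is flat, so this check concerns only the slope statement in the text.

```python
import math


def likelihood_slope(theta):
    """|dP(0|theta)/dtheta| for P(0|theta) = cos^2(theta/2); equals |sin(theta)|/2."""
    return abs(math.sin(theta)) / 2


# Scan the total phase theta = phi - Phi over [0, pi] on a fine grid
grid = [k * math.pi / 1000 for k in range(1001)]
steepest = max(grid, key=likelihood_slope)
print(f"steepest point: {steepest:.6f} rad (pi/2 = {math.pi / 2:.6f})")
```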

III. EXPERIMENTAL RESULTS
We experimentally tested the GA approach by estimating unknown relative phase shifts inside a MZI injected with single photons. The employed apparatus [Fig. 1(b)] is a MZI in the polarization degree of freedom, where the optical phase to be estimated is the relative phase between the vertical (V) and horizontal (H) polarizations of the photon. Photons are generated by injecting a continuous-wave pump beam at λ = 404 nm into a periodically poled potassium titanyl phosphate (PPKTP) crystal. Through spontaneous parametric down-conversion (SPDC) inside the crystal, pairs of degenerate photons at λ = 808 nm are generated. One photon of each pair is sent directly to the trigger avalanche photodiode (APD). The other photon is employed for the phase estimation process after its polarization state is prepared through a polarizing beam splitter (PBS). The interferometer is composed of two half-wave plates (HWPs) rotated by 22.5° and two adjacent liquid crystals (LCs) placed between the HWPs. The first LC controls the unknown phase shift φ and the second one applies the feedback phase Φ. After the second HWP, a final PBS separates the H and V polarizations into two spatial modes that are measured by two APDs. The whole process is automatically controlled by dedicated software. In particular, all three APDs are connected to an electronic system that reads the single-photon counts and provides digital timestamps to the computer. From these timestamps, twofold coincidences between the trigger and one of the two measurement detectors are recorded using a coincidence window of 3 ns. The first coincidence recorded within a fixed time window of 0.5 s generates the single event used in the estimation process. After each recorded event, the processing unit retrieves the feedback phase to be applied from the precalculated list and drives the corresponding LC accordingly. An additional interval of 0.3 s is inserted between two consecutive events to account for the switching time of the LCs.
In this way, all steps of the experiments, including phase tuning, photon detection and application of the GA policies, are controlled by the processing unit.
The apparatus described above was employed to estimate different phases. Experimental results are shown in Fig. 4(a) for different phases between 0 and π. Each point at fixed N is an average of 100 estimates obtained with the optimal policy for that N. The results show that the estimates reach the true values of the phases after only ∼25 photons. In an ideal MZI, the Fisher information does not depend on the phase $\phi_{\mathrm{tot}} = \phi - \Phi$. Experimental imperfections, however, may make the bound phase-dependent [see Fig. 4(b)], as observed when nonadaptive strategies are employed. This behavior can be predicted by including the effect of noise in the likelihood function of the system. More specifically, our apparatus is characterized by a nonunitary visibility of the polarization fringe pattern. This effect can be accounted for by correcting the likelihood function with a parameter p ∈ [0, 1], related to the visibility by V = 1 − p, which leads to the following output probabilities:

$$P_0 = (1-p)\cos^2\!\left(\frac{\phi-\Phi}{2}\right) + \frac{p}{2}, \qquad P_1 = (1-p)\sin^2\!\left(\frac{\phi-\Phi}{2}\right) + \frac{p}{2},$$

where $P_0$ ($P_1$) is the probability of finding the photon with polarization H (V). By measuring the output probabilities, we characterized both the phase shifts associated with the voltages applied to the LCs and the experimental likelihood function, obtaining $p_{\mathrm{exp}} = (7.93 \pm 0.16) \times 10^{-3}$. The noisy likelihood has a different Fisher information $F_{\mathrm{exp}}(\phi)$, whose inverse $F^{-1}_{\mathrm{exp}}(\phi)$ sets different precision limits for the experimental apparatus, as shown in Fig. 4(b). The difference from the ideal noiseless case becomes significant at the edges of the [0, π] interval. Since the sensitivity depends on the value of the unknown phase, different phases would be estimated with different precision if nonadaptive techniques were employed. Conversely, adaptive strategies yield a phase-independent estimation error.
In the adaptive protocol, the feedback phase is adjusted throughout the estimation so as to exploit the most informative points of the likelihood. In this way, we observe a phase-independent behavior, as shown in Fig. 4(b), where the sensitivities achieve the optimal CRB $\min_\phi F_{\rm exp}^{-1}(\phi)/N \simeq 1.016/N$ obtained from the experimental probabilities [see, for instance, the case N = 80 in Fig. 4(b)]. As a result, the estimation error, shown in Fig. 4(c), quickly approaches the SQL as a function of N. In our analysis we consider as figures of merit not only the Holevo variance $V_H$ averaged over the M measured phases (blue dots), but also the circular mean square error (MSE) (cyan dots).

In Fig. 4(d), we show how the algorithm works in terms of the applied policies. After each photon is sent and measured, the feedback phase is updated according to the outcome, following Eq. (1); the feedback phase shifts are the optimal ones provided by the GA. Since each step has two possible outcomes, the estimation branches at every step. Among all $2^N$ possible branches of each estimation, only those observed during the experiment are shown, with an intensity proportional to the number of times a given path is followed. In general, the change in the feedback phase, that is, the policy, decreases with the step number, reflecting the increasing precision of the estimate. Finally, the comparison between the unknown phase and the resulting estimate is reported, together with its circular variance over all 100 independent runs.
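The branching update of Fig. 4(d) can be mimicked with a short simulation. Eq. (1) and the GA-optimized policies are not reproduced here; instead, the sketch assumes a simple outcome-dependent rule $\Phi_m = \Phi_{m-1} + (-1)^{x_m}\Delta_m$, a likelihood with a built-in π/2 offset, $P(x=0) = [1 + (1-p)\sin(\phi - \Phi)]/2$, so that the working point sits on the steepest slope of the fringe, and a hypothetical harmonic policy $\Delta_m = 1/m$ standing in for a GA-trained one. The final feedback phase is taken as the estimate, and the Holevo variance $V_H = |\langle e^{i\hat\phi}\rangle|^{-2} - 1$ is computed over repeated runs. Because the policy is fixed, the $2^N$ outcome strings map to at most $2^N$ final feedback phases, which are the branches drawn in Fig. 4(d).

```python
import numpy as np

def run_adaptive_estimate(phi, policy, p=0.0, phi0=np.pi / 2, rng=None):
    """Simulate one adaptive run with a fixed, precomputed policy (hypothetical model).

    Assumptions: P(x=0) = [1 + (1 - p) sin(phi - Phi)] / 2, update
    Phi_m = Phi_{m-1} + (-1)**x_m * Delta_m, and the final Phi is the estimate.
    """
    rng = np.random.default_rng() if rng is None else rng
    feedback = phi0
    for delta in policy:
        p0 = 0.5 * (1.0 + (1.0 - p) * np.sin(phi - feedback))
        x = 0 if rng.random() < p0 else 1        # simulated single-photon detection
        feedback += delta if x == 0 else -delta  # outcome-dependent branch
    return feedback

def holevo_variance(estimates):
    """Holevo variance |<exp(i * phi_est)>|^-2 - 1 over repeated runs."""
    sharpness = np.abs(np.mean(np.exp(1j * np.asarray(estimates))))
    return sharpness**-2 - 1.0

rng = np.random.default_rng(0)
N = 80
policy = 1.0 / np.arange(1, N + 1)  # hypothetical decreasing policy Delta_m = 1/m
phi_true = 1.2
ests = [run_adaptive_estimate(phi_true, policy, rng=rng) for _ in range(200)]
print(f"V_H = {holevo_variance(ests):.4f}, SQL 1/N = {1.0 / N:.4f}")
```

Since the policy is computed in advance, the only online work is one addition per detected photon, which is what makes an offline optimization of the feedback attractive.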

IV. ROBUSTNESS TO NOISE
In this section, we perform numerical simulations to study the robustness of the policies generated by our genetic algorithm against different sources of noise. In particular, we consider depolarizing and phase noise, two of the most common noise models in interferometric setups. Depolarizing noise is caused by dark counts or by the limited visibility of the interferometer. It is introduced in the simulations via an additional parameter p, the probability of a random click. The simulated data are thus drawn from a noisy likelihood of the form $P_{\rm noisy}(x) = (1-p)P(x) + p/2$, where x = 0, 1 are the measurement outcomes and P(x) is the probability in the noiseless case. This kind of noise was used to describe the experimental results of the previous section. We now analyze numerically the robustness of the policies generated by our GA approach. More specifically, the policies are calculated assuming a noiseless experiment (p = 0) and then applied to a noisy estimation process (p ≠ 0), with noise levels p = 0.05, 0.15, 0.25, and 0.5. This analysis quantifies how well policies generated from an ideal model perform in noisy conditions. The results of the simulations are reported in Fig. 5(a). These data show that the policies are robust against noise with p ≲ 0.25, even though they are trained in ideal conditions. Hence this technique can also be employed in systems with moderate, unknown levels of depolarizing noise without losing the ability to reach the ultimate limit set by the CRB. For a depolarizing noisy interferometer probed with single photons, the CRB at the optimal working point is

$$\mathrm{CRB}_p = \frac{1}{N(1-p)^2}.$$

For larger p, the policies fail to reach the sensitivity bounds.
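The depolarizing model can be read as a mixture: with probability p the detector fires at random, and otherwise the ideal interferometer statistics apply. The sketch samples outcomes in this way, checks the empirical frequency against $(1-p)P(0) + p/2$, and tabulates $N \cdot \mathrm{CRB}_p = 1/(1-p)^2$ for the noise levels of Fig. 5. The ideal probability P(0) = 0.9 is an arbitrary illustrative value, and the function names are hypothetical.

```python
import numpy as np

def sample_outcome(p_ideal0, p, rng):
    """Draw x in {0, 1}: with probability p a random click, otherwise the ideal MZI."""
    if rng.random() < p:
        return int(rng.integers(2))  # uniformly random click
    return 0 if rng.random() < p_ideal0 else 1

def crb_factor(p):
    """N * CRB at the optimal working point of a depolarized MZI: 1 / (1 - p)**2."""
    return 1.0 / (1.0 - p) ** 2

rng = np.random.default_rng(7)
p, p_ideal0 = 0.25, 0.9
freq0 = np.mean([sample_outcome(p_ideal0, p, rng) == 0 for _ in range(100_000)])
print(f"empirical P0 = {freq0:.3f}, (1-p)P(0) + p/2 = {(1 - p) * p_ideal0 + p / 2:.3f}")
for p_dep in (0.0, 0.05, 0.15, 0.25, 0.5):
    print(f"p = {p_dep:.2f}:  N * CRB = {crb_factor(p_dep):.3f}")
```

At p = 0.5 the bound is four times the noiseless one, while at p = 0.25 it is about 1.78 times larger, which puts the robustness range p ≲ 0.25 in perspective.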
As a second step, we consider the scenario in which the noise parameter is calibrated before the experiment. In this case, the GA approach can be adapted to generate policies optimized for this scenario by including the actual noise level in the computation. Hence, we generated policies using knowledge of the depolarizing noise parameter. The estimation of unknown phase shifts with policies trained in the presence of noise is analyzed in Fig. 5(b).
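A toy version of noise-aware training is sketched below. It is not the GA used in this work: the encoding (one step size per photon), truncation selection with elitism, uniform crossover, Gaussian mutation, population sizes, and the fitness (minus the Holevo variance of the estimation error over simulated runs at a given p) are all illustrative assumptions. The simulator uses the same hypothetical likelihood with a π/2 offset and update rule $\Phi_m = \Phi_{m-1} + (-1)^{x_m}\Delta_m$ as a stand-in for Eq. (1). In this setting, training for a calibrated noise level only requires passing that p to the fitness function.

```python
import numpy as np

def fitness(policy, p, rng, runs=200):
    """Minus the Holevo variance of the estimation error (hypothetical fitness)."""
    phi = rng.uniform(0.0, np.pi, runs)   # unknown phases, uniform in [0, pi]
    feedback = np.full(runs, np.pi / 2)   # assumed initial feedback phase
    for delta in policy:
        p0 = 0.5 * (1.0 + (1.0 - p) * np.sin(phi - feedback))  # noisy likelihood
        outcome0 = rng.random(runs) < p0  # True where x = 0
        feedback += np.where(outcome0, delta, -delta)
    sharpness = np.abs(np.mean(np.exp(1j * (feedback - phi))))
    return -(sharpness**-2 - 1.0)

def evolve_policy(n_photons, p, pop_size=30, generations=40, seed=0):
    """Toy GA: each individual is a vector of feedback step sizes Delta_1..Delta_N."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, np.pi / 2, size=(pop_size, n_photons))
    for _ in range(generations):
        scores = np.array([fitness(ind, p, rng) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(n_photons) < 0.5  # uniform crossover
            child = np.where(mask, a, b) + rng.normal(0.0, 0.05, n_photons)  # mutation
            children.append(np.clip(child, 0.0, np.pi))
        pop = np.vstack([parents, np.array(children)])
    final = np.array([fitness(ind, p, rng, runs=2000) for ind in pop])
    return pop[np.argmax(final)], -final.max()

# Train one policy for an ideal model and one for a calibrated p = 0.25.
for p in (0.0, 0.25):
    policy, vh = evolve_policy(n_photons=10, p=p)
    print(f"p = {p}: V_H ~ {vh:.3f}, steps = {np.round(policy, 2)}")
```

Re-evaluating the surviving parents with fresh random runs at every generation keeps lucky fitness estimates from dominating, at the cost of a noisier search; this is one reason such a stochastic optimizer may stop short of the bound within a fixed number of generations.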
As expected, the estimation performance improves compared with that achieved with the policies trained for p = 0. In particular, the estimations using noise-aware policies approach the optimal CRB associated with each noise level, and the improvement grows with p. Note that, for large noise (p = 0.5), a gap from the CRB remains, which we attribute to the probabilistic nature of genetic algorithms: they may fail to converge within a given number of iterations. In conclusion, our protocol is not only robust against depolarizing noise, but can also be adapted to approach the ultimate bound in such noisy conditions.

Finally, we considered the effect of phase noise, due to random errors in setting the feedback phase. This can be caused, for instance, by random imperfections in the phase control or by phase fluctuations between the arms of the interferometer. We numerically simulated phase estimation processes under this noise by shifting the feedback phase $\Phi_k$ by an amount δ, drawn independently at each step from a normal distribution with zero mean and standard deviation κ, so that the applied phase is normally distributed around its intended value. Under these conditions, we tested the policies calculated for a noiseless scenario. The results of the simulated estimation processes for different values of κ are shown in Fig. 6. We observe that the policies generated via the GA are robust to this kind of error, even for a considerable amount of phase noise, κ ≲ 0.6.
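A rough way to connect the two noise models, offered as a heuristic and not taken from the original analysis: if the jitter δ on the feedback phase is Gaussian with standard deviation κ, drawn independently at each step and never observed, then averaging the ideal fringe over δ gives $\langle\cos(\phi_{\rm tot} - \delta)\rangle = e^{-\kappa^2/2}\cos\phi_{\rm tot}$. Under these assumptions the outcome statistics take exactly the depolarizing form with an effective $p_{\rm eff} = 1 - e^{-\kappa^2/2}$. The sketch checks this identity by Monte Carlo, assuming the ideal two-outcome likelihood $P_0 = (1 + \cos\phi_{\rm tot})/2$; the helper name is hypothetical.

```python
import numpy as np

def effective_visibility(kappa):
    """Fringe visibility after averaging over Gaussian jitter delta ~ N(0, kappa^2)."""
    return np.exp(-kappa**2 / 2.0)

rng = np.random.default_rng(3)
phi_tot = 0.4  # arbitrary intended phase difference
for kappa in (0.2, 0.4, 0.5, 0.6, 0.8):
    delta = rng.normal(0.0, kappa, 200_000)    # random error on the feedback phase
    p0_mc = np.mean(0.5 * (1.0 + np.cos(phi_tot - delta)))
    p0_th = 0.5 * (1.0 + effective_visibility(kappa) * np.cos(phi_tot))
    p_eff = 1.0 - effective_visibility(kappa)  # equivalent depolarizing parameter
    print(f"kappa = {kappa}: P0 (MC) = {p0_mc:.4f}, P0 (theory) = {p0_th:.4f}, p_eff = {p_eff:.3f}")
```

With this mapping, κ = 0.6 corresponds to $p_{\rm eff} \approx 0.16$ and κ = 0.8 to $p_{\rm eff} \approx 0.27$, which is at least consistent with the depolarizing robustness range p ≲ 0.25 reported above; it is a back-of-the-envelope argument, not a substitute for the simulations of Fig. 6.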

V. CONCLUSIONS AND PERSPECTIVES
Phase estimation is a fundamental task in several applications, ranging from biology to gravitational-wave detection. Furthermore, this problem serves as a benchmark for general estimation protocols. In this context, an important practical scenario is the estimation of phases when only a limited number of probes is available. In this regime, adaptive protocols can be employed to optimize the estimation process and to reach the ultimate limits with a small amount of resources. At the same time, in scenarios where the computational power available during the estimation is limited, the optimal feedback actions must be calculated offline, before the process. For this task, machine learning techniques, which can optimize functions over high-dimensional search spaces, are a powerful tool.

FIG. 6. Robustness to phase noise. Holevo variance $V_H$ as a function of the number N of photon probes in the presence of phase noise, for κ = 0.2, 0.4, 0.5, 0.6, and 0.8. Simulations use the policies calculated in the absence of noise, applied to noisy simulated phase estimation processes. More specifically, a random shift of the feedback phase is added at each step of the experiment, drawn from a normal distribution centered on the original value with variance κ². Results for different noise intensities are compared with the noiseless case (dark blue dots) and the SQL (red dashed line). Each data point is averaged over 100 simulated rounds of the estimation process. The distance between the achievable precision and the SQL becomes significant only for high noise values.
In this work, we presented a novel technique based on a genetic algorithm that finds optimal feedback actions for single-phase estimation, which are also robust against different sources of noise. We then demonstrated this protocol experimentally on a photonic platform, showing fast convergence of the estimation error to the ultimate limits after only a few probes. This demonstration opens the way to further applications in quantum metrology tasks with limited data. Future work will need to devise and experimentally test this class of algorithms with probes that enable quantum-enhanced performance. The photonic generation of probe states could also be improved by GAs [47,50,51], yielding accessible and noise-robust states for metrology tasks. A natural generalization of this approach is to apply GA optimization to offline protocols in multiparameter quantum metrology problems [17,77–79], with particular attention to the limited-data regime [80]. While online adaptive Bayesian techniques for multiphase estimation have been demonstrated [81], offline solutions remain to be explored, and GAs promise to be a useful tool for this task. Notably, this kind of approach can be applied to other quantum information tasks in which multiple feedback parameters must be optimized.