Machine-learning-accelerated Bose-Einstein condensation

Machine learning is emerging as a technology that can enhance physics experiment execution and data analysis. Here, we apply machine learning to accelerate the production of a Bose-Einstein condensate (BEC) of $^{87}$Rb atoms by Bayesian optimization of up to 55 control parameters. This approach enables us to prepare BECs of $2.8 \times 10^{3}$ optically trapped $^{87}$Rb atoms from a room-temperature gas in 575 ms. The algorithm achieves the fast BEC preparation by applying highly efficient Raman cooling to near quantum degeneracy, followed by a brief final evaporation. We anticipate that many other physics experiments with complex nonlinear system dynamics can be significantly enhanced by a similar machine-learning approach.

With few exceptions [17], experiments on BECs end with a destructive measurement, which requires repeated BEC preparation. Approaches to increase the BEC production rate, and the associated signal-to-noise ratio of the experiments, have generally relied heavily on hardware improvements [18-22], or used atomic species with narrower optical transitions [18,21,22] than offered by the most widely utilized alkali atoms. For alkali atoms, the tight confinement of atom-chip magnetic traps has enabled fast evaporation sequences, with a complex multilayer atom chip achieving BEC preparation times of 850 ms for $4 \times 10^{4}$ atoms [19]. Non-alkali atoms featuring narrow optical transitions can be used to reach lower temperatures in narrow-line MOTs [18,21,22]. That approach, combined with a dynamically tunable optical dipole trap, has recently been used to prepare BECs of $2 \times 10^{4}$ erbium atoms in under 700 ms [22].
In this Letter, we demonstrate a complementary approach where, in a simple experimental setup with a broad-line MOT for a standard alkali atom, machine learning is leveraged to optimize a complex nonlinear laser and evaporative cooling process to quantum degeneracy. Controlling a sequence with up to 55 interdependent experimental parameters, Bayesian optimization [11,12,23] finds parameter values which cool a gas from room temperature into the quantum degenerate regime in 575 ms, creating a BEC containing $N_{\mathrm{BEC}} = 2.8 \times 10^{3}$ atoms. To our knowledge, this is the fastest BEC creation to date. We identify some of the physical strategies discovered by the algorithm, and also investigate how the choice of cost function impacts the trade-off between final atom number and the purity of the created BEC.
Our apparatus employs only a single MOT directly loaded from a $^{87}$Rb background vapor, a crossed optical dipole trap, and two Raman cooling beams, as depicted in Fig. 1(a). No Zeeman slower, two-dimensional MOT, atom chip [19], dynamic trap shaping [21], or strobing [22,24] is necessary. Using Raman cooling in a crossed optical dipole trap (cODT), a method that can reach very high phase-space density and even condensation [25], the algorithm achieves a cooling slope of 16 orders of magnitude improvement in phase-space density (PSD) per order of magnitude in atom loss ($\gamma = 16$) up to the threshold to quantum degeneracy. This is significantly better than the $\gamma = 7$ value we could obtain with extensive manual optimization under similar conditions [25].
Atomic physics methods.- The Raman cooling implementation used in this work is similar to that of Ref. [25]. Cooling proceeds in a cODT formed by intersecting two non-interfering 1064-nm beams, one horizontal and one vertical, see Fig. 1(a). Two 795-nm beams drive the Raman cooling: the optical pumping beam and the Raman coupling beam. Raman cooling [26] provides sub-Doppler cooling by driving velocity-selective Raman transitions between hyperfine states, here the $|F = 2, m_F = -2\rangle$ and $|2, -1\rangle$ states of $^{87}$Rb [25]. The Raman transitions are non-dissipative, so entropy is removed from the atomic gas in the form of spontaneously scattered photons as atoms are optically pumped back to the dark state $|2, -2\rangle$. Light-assisted collisions, which typically prohibit laser cooling at high atomic densities, are suppressed by detuning the optical pumping light 4.33 GHz to the red of the $D_1$ $F = 2 \rightarrow F' = 2$ transition, where a local minimum of light-induced loss was observed [25].
The cooling dynamics are controlled via five actuators: (i) the horizontal $P_y$ and (ii) vertical $P_z$ trap beam powers, which set the trap depth and frequencies, (iii) the Raman coupling beam power $P_R$, which tunes the Raman rate, (iv) the power $P_p$ of the optical pumping beam, which sets the optical-pumping rate (and also the Raman rate), and (v) the magnetic field $B_z$, which adjusts the resonant velocity class for the Raman transition. The cooling procedure is divided into stages during which the controls are linearly ramped, with the endpoints of each ramp constituting the optimization parameters.
Optimization scheme.- The optimization problem can be formulated as the minimization of a cost function $C$, which maps a set of parameter values $X \in \mathbb{R}^{M}$ to a corresponding cost value $C(X) \in \mathbb{R}$, where $M$ is the number of optimization parameters. The cost $C$ quantifies the results, and is generally a priori unknown, but can be extracted from measurements. Bayesian optimization is well-suited for this type of problem as it can tolerate noise in the measured cost and typically requires testing fewer values of $X$ than other optimization methods [11-16].
Bayesian optimization begins with collecting a training dataset by experimentally measuring the cost $C_m(X_i)$ for various sets of parameter values $X_i$. The $X_i$ used to construct the training dataset are chosen by a training algorithm, which can implement another optimization algorithm or can select $X_i$ randomly. A model of the cost function, which approximates the unknown true cost function $C(X)$, is then fit to the training dataset. Although Bayesian optimization typically uses a Gaussian process for its model [23], the present work uses neural networks [12,27], which were chosen for their significantly faster fitting time for our typical number of optimization parameters. Once the model is fit, a standard numerical optimization algorithm is applied to the modeled cost function $C_p(X)$ to determine which value $X_{i+1}$ for the next iteration is predicted to yield the minimal cost, as depicted in Fig. 1(c). Optionally, this numerical optimization can be constrained to a trust region (a smaller volume of parameter space centered around the $X_i$ which yielded the best cost measured thus far). The predicted optimal value $X_{i+1}$ is then tested by experimentally measuring the corresponding cost $C_m(X_{i+1})$. The next iteration begins by retraining the model with the new result, and making a new prediction for the optimal value of $X$ with the updated model. The algorithm iterates until it reaches a termination criterion, such as a set maximum number of iterations, or a set number of consecutive iterations that fail to return better results. All optimization in this work was performed with the open-source packages M-LOOP [11,12] to implement the Bayesian optimization and Labscript [28] for experimental control. Additional implementation details are included in Appendix A.
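The measure, fit, predict, test loop described above can be illustrated with a minimal, self-contained sketch. It is not the M-LOOP implementation: the experiment is replaced by a hypothetical noisy toy cost (`measure_cost`), the neural-network model is swapped for a quadratic least-squares surrogate, and the numerical optimizer is swapped for a random candidate search. All names and settings here are illustrative assumptions.

```python
import numpy as np

def measure_cost(x, rng):
    """Stand-in for the experiment (assumption): noisy toy cost, minimum at x = 0.3."""
    return float(np.sum((x - 0.3) ** 2) + 0.005 * rng.normal())

def features(xs):
    """Quadratic features [1, x_j, x_j^2] used by the surrogate model."""
    xs = np.atleast_2d(xs)
    return np.hstack([np.ones((len(xs), 1)), xs, xs ** 2])

def fit_model(xs, costs, ridge=1e-6):
    """Ridge-regularized least-squares fit of the surrogate to all data so far."""
    a = features(xs)
    return np.linalg.solve(a.T @ a + ridge * np.eye(a.shape[1]), a.T @ costs)

def optimize(n_params=3, n_train=20, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    # Training dataset: randomly chosen parameter sets and their measured costs.
    xs = rng.uniform(0.0, 1.0, size=(n_train, n_params))
    costs = np.array([measure_cost(x, rng) for x in xs])
    for _ in range(n_iter):
        weights = fit_model(xs, costs)                      # retrain the model
        candidates = rng.uniform(0.0, 1.0, size=(2000, n_params))
        predicted = features(candidates) @ weights          # modeled cost C_p(X)
        x_next = candidates[np.argmin(predicted)]           # predicted optimum
        xs = np.vstack([xs, x_next])
        costs = np.append(costs, measure_cost(x_next, rng)) # test it
    best = np.argmin(costs)
    return xs[best], costs[best], costs[:n_train].min()
```

Calling `optimize()` returns the best parameter set found, its measured cost, and the best cost in the initial training set for comparison. A fixed iteration count is used as the termination criterion here for simplicity.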
Cost function.- Since the optimization transitions the gas from the classical into the quantum degenerate regime, the final state of the gas depends strongly on how the cost function is chosen as a combination of the two experimentally accessible parameters: atom number $N$ and temperature $T$. The classical phase-space density $\mathrm{PSD}_c$ is defined as $\mathrm{PSD}_c \equiv n_{cp} \lambda_{\mathrm{dB}}^{3}$, where $\lambda_{\mathrm{dB}}$ is the thermal de Broglie wavelength and $n_{cp}$ is the calculated peak number density neglecting bosonic statistics (see Appendix B for calculation details). The value of $\mathrm{PSD}_c$ is nearly equal to the true PSD when $\mathrm{PSD} \ll 1$, while at the threshold to condensation, $\mathrm{PSD}_c \sim 1$. Since the temperature $T$ is more difficult to determine in the quantum degenerate regime, and also requires a fit to the data with potential convergence problems, we instead measure $N$ and the peak optical depth OD in an absorption image. Generally, ensembles with larger $\mathrm{PSD}_c$ have a larger atom number $N$ and less expansion energy, which leads to a larger peak optical depth OD for a given $N$. Guided by this, we explored cost functions of the form
$$C(X) = -f(N/N_1)\,\mathrm{OD}^{3} N^{\alpha - 9/5},$$
where $f(N/N_1)$ is a smoothed Heaviside step function with $N_1$ chosen near the detection noise floor (see Appendix A). The parameter $\alpha$ in the cost function tunes the trade-off between optimizing for larger atom number or lower temperature. For a pure BEC after sufficient time-of-flight (TOF) expansion, $|C/f|$ scales as $(N_{\mathrm{BEC}})^{\alpha}$ (see Appendix C). For a thermal cloud, $|C/f|$ is proportional to $\mathrm{PSD}_c$ when $\alpha = -1/5$, although that value of $\alpha$ is unsuitable for condensation, as increasing the atom number in the BEC requires $\alpha > 0$.
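As an illustration, a cost built from the quantity OD^3 N^(α−9/5) analyzed in the appendices might be evaluated as in the sketch below. Several pieces are assumptions rather than the authors' code: the overall negative sign (chosen because the cost is minimized), the specific form of the smoothed step `smooth_step` (one function with the stated limiting behavior), the noise-floor value `n_floor`, and the number of pixels averaged in `peak_od`.

```python
import numpy as np

def smooth_step(x):
    """ASSUMED smoothed step: ~0 for x << 1, approaches 1 for x >> 1."""
    x = np.maximum(np.asarray(x, dtype=float), 1e-12)
    # Clip the exponent so np.exp cannot overflow for tiny x.
    return 2.0 / (1.0 + np.exp(np.minimum(1.0 / x, 700.0)))

def peak_od(image, n_pixels=9):
    """Average of the n_pixels largest optical depths (n_pixels is hypothetical)."""
    flat = np.asarray(image, dtype=float).ravel()
    return float(np.mean(np.partition(flat, -n_pixels)[-n_pixels:]))

def cost(od, n_atoms, alpha=0.5, n_floor=100.0):
    """Cost to minimize: -f(N/N1) * OD^3 * N^(alpha - 9/5); sign is an assumption."""
    step = smooth_step(n_atoms / n_floor)
    return -step * od ** 3 * n_atoms ** (alpha - 9.0 / 5.0)
```

With `alpha = -1/5` and a thermal cloud for which OD is proportional to N/T, the magnitude of `cost(...) / smooth_step(...)` grows linearly with N at fixed T, matching the proportionality to the classical phase-space density stated in the text.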
Optimization procedure.- The sequence begins with a separately optimized 99-ms long MOT loading and compression period. The trap beam powers are ramped to their initial Raman cooling values during the last 10 ms of the MOT compression and then the magnetic field is adjusted to its initial Raman cooling value in 1 ms, at which point the horizontal dipole trap holds typically $N = 2.7 \times 10^{5}$ atoms. We then added 100-ms stages of Raman cooling one by one and optimized them individually. After five stages, the algorithm tended to turn off the Raman cooling by turning down $P_p$ or $P_R$ or by tuning the magnetic field $B_z$ such that the Raman transition became off-resonant. We then added up to six shorter 30-ms long stages in which the optical pumping and Raman coupling beams were turned off, and the algorithm performed evaporative cooling. Due to the reduced number of parameters, we were able to optimize the evaporation stages simultaneously, which produced a BEC. Subsequently, we shortened the Raman cooling and evaporation stages with parameter values fixed until only a small and impure BEC was produced, and then ran a global reoptimization. In this global optimization stage, all 42 of the Raman cooling and evaporation parameters were reoptimized simultaneously using the previous values as the initial guess for $X$. Often a trust region set to one tenth of the allowed range for each parameter was used. This kept the optimizer focused in regions of parameter space which produced a measurable signal, as adjusting even a single parameter too far would often result in the loss of all atoms. We repeated this sequence shortening and reoptimization procedure until the algorithm failed to find parameters that could produce sufficiently pure BECs.
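A trust region of this kind can be expressed as a box around the best parameter set found so far. The sketch below is only one reading of the description: it assumes "one tenth of the allowed range" means the full width of the box along each parameter, and it clips the box to the global limits. The function name and arguments are hypothetical.

```python
import numpy as np

def trust_region_bounds(best_x, lower, upper, fraction=0.1):
    """Box of width fraction*(upper - lower) centered on best_x, clipped to limits."""
    best_x, lower, upper = (np.asarray(a, dtype=float) for a in (best_x, lower, upper))
    half_width = 0.5 * fraction * (upper - lower)
    region_low = np.maximum(lower, best_x - half_width)
    region_high = np.minimum(upper, best_x + half_width)
    return region_low, region_high
```

The optimizer's search for the next predicted optimum would then be restricted to `[region_low, region_high]` instead of the full allowed range.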
The required beam powers generally varied over several orders of magnitude, so the logarithms of the powers were used as entries in $X$, while the magnetic-field control parameter $B_z$ was kept a linear parameter. A feedforward adjustment was included in the $B_z$ control values to account for the light shift of the $|2, -1\rangle$ state by the optical pumping beam. We averaged over five repetitions of the experiment for each set of parameter values tested. The number of iterations per optimization varied but was typically $\sim 1000$ (including the initial training), and each optimization required several hours, both for the single-stage optimizations and the full-sequence optimizations. A simpler optimization procedure, which did not involve optimizations of individual stages, was also attempted. Instead, the sequence was divided into ten 100-ms stages and all 55 parameters were optimized from scratch simultaneously. That approach, combined with the shortening and reoptimizing procedure, successfully produced a similar BEC, albeit in a slightly longer time (650 ms vs 575 ms), possibly due to the optimization becoming trapped in a local optimum (see Appendix A for further discussion).
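The parameterization described above, with logarithmic entries for beam powers, a linear entry for the magnetic field, and linear ramps between stage endpoints, might be decoded into control waveforms roughly as follows. This is a sketch under stated assumptions: base-10 logarithms, equally long stages, and ramps that are linear in the physical control value. The function names are hypothetical.

```python
import numpy as np

def decode_endpoints(x_log_power, x_field):
    """Map optimizer entries to endpoints: powers from log10 values (assumed base), field as-is."""
    powers = 10.0 ** np.asarray(x_log_power, dtype=float)
    field = np.asarray(x_field, dtype=float)
    return powers, field

def linear_ramps(endpoints, stage_duration, t):
    """Piecewise-linear waveform through equally spaced stage endpoints."""
    endpoints = np.asarray(endpoints, dtype=float)
    knots = stage_duration * np.arange(len(endpoints))  # times of the stage boundaries
    return np.interp(t, knots, endpoints)
```

For example, three endpoints separated by 100-ms stages define two ramps, and `linear_ramps(powers, 0.1, t)` evaluates the power at any time `t` in seconds within the sequence.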
Results and physical interpretation.- The best discovered 575-ms long control sequence and corresponding results are depicted in Fig. 2 and Fig. 3. Notably, the algorithm discovered gray molasses [29,30] in the MOT phase, which it applies at the end of the compression sequence. This outperforms the bright molasses [31,32] that was previously used in the manually optimized compression sequence, with the gray molasses loading a similar number of atoms ten times faster. After the MOT loading stage and transfer into the cODT, five $\sim 63$-ms long stages of Raman cooling follow, and then the optical pumping and Raman beams are ramped off, followed by six $\sim 27$-ms long evaporation stages. As observed in previous work [12-14], the ramps produced by Bayesian optimization are non-monotonic and appear non-intuitive, but they outperform the routines we found by manual optimization. A reason for the non-monotonic waveforms may be that the cost function includes many local minima. The optimization can settle into any one of these local optima randomly, and produce complex but specific waveforms, as observed in Ref. [12]. Despite the non-monotonic ramps, $\mathrm{PSD}_c$ increases smoothly and exponentially during this part of the sequence (Fig. 2(e)), due in part to the finite thermalization rate.
FIG. 2. Control waveforms (a-b) and measured trap and atomic-gas properties (c-e) of the optimized sequence. Gray, blue, and orange shadings mark the MOT loading, Raman cooling, and evaporation periods, respectively. The Raman beam power has been multiplied by $10^{3}$ for better visibility. $\nu_x$, $\nu_y$, $\nu_z$, $\nu_h$ are the trap vibration frequencies in the $x$, $y$, $z$ directions and in the horizontal trap, respectively; $\nu_c$ is the atomic collision rate. $\mathrm{PSD}_c$ does not account for bosonic statistics and changes slowly while the BEC forms quickly above threshold. Calculations assume thermal equilibrium.
By shortening the sequence, we are asking the algorithm to maximize the cooling speed, which is limited by the lower of the collisional rate $\nu_c$ and the trap vibration frequencies $\nu_{x,y,z}$ [33]. When the gas is still hot, we have $\nu_c \ll \nu_{x,y,z}$, and the algorithm employs Raman cooling to increase the density and collision rate (Fig. 2(d)). However, when $\nu_c$ approaches the lowest trap vibration frequency $\nu_y$ near the time $t = 225$ ms, the algorithm starts to reduce the Raman rate, and a little later the optical pumping rate, in order to reduce light-induced collisions, which scale with $\nu_c$ rather than with the trap vibration frequency. Subsequently, for times $t > 225$ ms, the cooling proceeds near optimally, with the collision rate close to, but a little below, the trap vibration frequencies. Furthermore, as the system approaches condensation near $t = 410$ ms, the collision rate is somewhat lowered to reduce light-induced atom loss (Fig. 2(c)).
Another effect limiting the cooling speed is the loading of the atoms from the single horizontal trap, in which the sample is initially prepared, into the crossed dipole trap (see the movie in SM [34]). Initially, the vertical-beam power $P_z$ is held low to avoid creating a high-density dimple region, which would lead to excess loss during Raman cooling. Later, $P_z$ is ramped up to gather atoms from the horizontal trap beam into the overlap region of the cODT in order to increase the collision rate and speed up evaporative cooling. The relatively sudden ramping of the trap power up and then back down, visible in Fig. 2(a), likely involves an optimal-control-like process, since the trap compression and relaxation are faster than the axial period of the horizontal trap of $\sim 200$ ms.
The optimization tended to turn off the Raman cooling after five stages because the cloud temperature $T$ was below the effective recoil temperature [25], where Raman cooling, even with optimal parameters, becomes too slow while leading to trap loss and heating due to light-assisted collisions [35]. The Bayesian optimization recognized this and shut down the Raman cooling at this point, with the atomic gas close to condensation. Subsequently, at higher compression, which is primarily achieved by increasing the vertical beam power, the horizontal trap power is reduced and atoms are efficiently evaporated along the direction of gravity in the tilted potential [20] (see the movie in the SM [34]). Note also that once the atoms have been loaded into the crossed-trap region (after $t = 350$ ms), the algorithm makes all trap vibration frequencies similar, which provides the fastest overall thermalization, and hence the largest cooling speed.
The BEC is fully prepared at the end of the evaporation stages, 575 ms after the start of the MOT loading. The final cloud contains $3.7 \times 10^{3}$ total atoms and is shown in Fig. 3(b). A bimodal fit of the cloud indicates that $2.8 \times 10^{3}$ atoms (76 %) are in the BEC. Although the sequence was optimized for speed rather than efficiency, the initial cooling occurs with a logarithmic slope $\gamma = -d(\log \mathrm{PSD}_c)/d(\log N) \approx 16$, where the sign is chosen so that $\gamma$ is positive when the phase-space density rises as atoms are lost.
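The cooling slope can be estimated from a series of measured atom numbers and classical phase-space densities by a straight-line fit in log-log space. The sketch below takes γ as positive when PSD rises as N falls, consistent with its description as orders of magnitude gained in PSD per order of magnitude lost in atom number. The function name is hypothetical, and any data passed to it in an example would be synthetic rather than measured.

```python
import numpy as np

def cooling_slope(atom_numbers, psd_values):
    """gamma = -d(log PSD)/d(log N), from a linear fit of log10(PSD) vs log10(N)."""
    log_n = np.log10(np.asarray(atom_numbers, dtype=float))
    log_psd = np.log10(np.asarray(psd_values, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_psd, 1)
    return -slope
```

A fit over the whole cooling range gives an average slope; restricting the input to a sub-range gives the local efficiency there.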
Cost function impact.- The atomic gases produced by sequences optimized for different values of $\alpha$ are presented in Fig. 4, as well as the results when optimizing for total atom number ($N$). Larger values of $\alpha$ result in more atoms, but at higher temperature and lower condensate fraction, while smaller values of $\alpha$ produce purer BECs, but with fewer atoms overall. Setting $\alpha$ to 0.5 was found to make a reasonable compromise (orange curve in Fig. 4), so that value was used for the final full-sequence optimization which yielded the data presented in Fig. 2.
Outlook.- In conclusion, we have demonstrated that Raman cooling with far-detuned optical-pumping light combined with a final evaporation can rapidly produce BECs with a comparatively simple apparatus, even with a standard alkali atom which lacks narrow optical transitions. Bayesian optimization greatly eased the search for a short sequence to BEC, quickly discovering initially unintuitive yet high-performing sequences. Inspection of the parameters chosen by the algorithm reveals several physical strategies, such as keeping the collision rate close to, but below, the trap vibration frequencies to maximize the thermalization and cooling speed while minimizing density-dependent atom loss, non-adiabatic loading into the crossed-trap dimple, and creating a nearly isotropic trap for efficient evaporation. In future applications, faster condensation can likely be achieved by including dynamical tuning of trap size [21], while user intervention may be further reduced by factoring the sequence length into the assigned cost [14]. We anticipate that many other experimental procedures in atomic physics and beyond can be improved by machine learning.

Appendix A: Bayesian Optimization Implementation
In M-LOOP's implementation of Bayesian optimization, the training algorithm used to pick parameters and generate a training dataset is also run periodically even after the training dataset is complete [11,12]. In particular, once sufficient training data is acquired, three independent neural networks are trained. Each neural net is fully connected and consists of an input layer with one node for each optimization parameter, followed by five hidden layers with 64 nodes each, and then an output layer with a single node. Once the training has completed, each neural network is used to generate a set of parameter values $X$ which it predicts to be optimal, and each of those three $X$ are experimentally tested. Then another iteration of the training algorithm is performed and the $X$ it suggests are also tested. The results from all four of these measurements are included in the next training of the neural nets for the subsequent Bayesian optimization iteration. The additional iterations of the training algorithm are intended to encourage parameter space exploration and provide unbiased data [11,12].
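To make the network size concrete, the architecture described above (one input node per parameter, five hidden layers of 64 nodes, one output node) can be sketched in plain NumPy. This is not M-LOOP's code: the tanh activation, the weight initialization, and the function names are assumptions, and no training step is shown.

```python
import numpy as np

def init_network(n_params, n_hidden_layers=5, width=64, seed=0):
    """Random weights for a fully connected net: n_params -> 5 x 64 -> 1."""
    rng = np.random.default_rng(seed)
    sizes = [n_params] + [width] * n_hidden_layers + [1]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def predict_cost(network, x):
    """Forward pass giving one modeled cost per row of x."""
    h = np.atleast_2d(x)
    for weights, bias in network[:-1]:
        h = np.tanh(h @ weights + bias)  # activation function is an assumption
    weights, bias = network[-1]
    return (h @ weights + bias).ravel()  # linear output layer

def count_parameters(network):
    """Total number of trainable weights and biases."""
    return sum(w.size + b.size for w, b in network)
```

For the 42-parameter global optimization, `count_parameters(init_network(42))` gives 19457 trainable values per network under these assumptions.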
In this work, the absorption images used to measure the cost function were generally taken after 1.5 to 8 ms of time-of-flight (TOF) expansion. We averaged over five repetitions of the experiment for each set of parameter values tested, which took $\sim 10$ s accounting for experimental and analysis overhead. Taking the largest optical depth measured in any single pixel of an absorption image as OD would make it prone to noise, so OD was instead set to the average of the several pixels with the largest optical depth. To compare different sequences on an equal footing during optimizations, the trap beams were always ramped to a fixed power setting before releasing the atoms for TOF imaging. This final fixed ramp is only necessary during optimizations and is omitted from the sequence once the optimizations are complete. The smoothed Heaviside step function $f(N/N_1)$ included in the cost function ensures that the cost does not diverge at low $N$ while having little effect when $N$ is above the measurement noise floor. The form of $f(N/N_1)$ is inspired by the expression for the excited state population of a two-level system in thermal equilibrium and it is defined as [...]
For many of the optimizations in this work, particularly those with tens of parameters, the cost function landscape is "sparse" in the sense that most sets of parameter values yield poor results with a signal below the measurement noise floor. Thus the actual performance for such $X$ cannot be measured, and testing them provides little information to the model. This leads to large regions of parameter space where there is no measurable signal and the direction towards better values cannot be inferred. There are two notable consequences of this. Firstly, for such optimizations it is generally necessary to provide initial values to the optimization which give a nonzero signal. Without a good starting point, the training dataset will often only include measurements dominated by noise, making it exceedingly unlikely for the Bayesian optimization to succeed. Secondly, for such optimizations it is generally helpful to specify a trust region. This limits the extent of excursions as the optimizer explores parameter space, making it more likely to test parameter values which yield a measurable signal. However, this comes at a cost: it makes it less likely for the optimizer to jump from one local minimum to another, better minimum. We often performed the same optimization with and without a trust region in parallel. This could be done without significantly extending the duration of optimizations because the analysis for each iteration typically took longer than the time required to perform the experiment. Thus one optimization could run experiments while the other analyzed its most recent results. For optimizations with many parameters, the results with a trust region were typically as good as or better than those without. This is likely a consequence of the fact that, given the sparsity of the cost function landscape, it is unlikely for the optimizer to discover another local optimum. Thus it is better for the optimizer to focus on modeling the region of parameter space around the local optimum rather than fruitlessly searching for another local optimum.
The sparsity of the cost landscape and the necessity of providing initial parameters which produce measurable results posed a difficulty when we optimized an entire sequence from scratch at once (rather than initially adding one cooling stage at a time). We resolved this by reducing the time of flight to 1.5 ms for the first optimization. With such a short time of flight, even poor parameter values could produce clouds with a peak optical depth above the measurement noise floor. Due to the finite dynamic range of the absorption imaging, the results of this first optimization produced a cloud which saturated the measurement and thus made it impossible to accurately quantify performance for the best-performing values. The next optimizations were performed with the same sequence duration, but with the time of flight increased to 5 ms and then 8 ms. This made it possible to better discern differences between high-performing sets of parameter values, at the cost of increasing the performance required to produce a signal above the noise floor and thus increasing the sparsity of the cost function landscape. The procedure of shortening then re-optimizing the sequence was then applied, resulting in the sequence presented in Fig. 5(b), which produced a BEC in 650 ms. The parameter $\alpha$ was set to 0.5 throughout this procedure.
Although it is not strictly fair to do so due to the differing parameterizations, it is still informative to compare the control waveforms of the independently optimized 650-ms sequence to those from Fig. 4. These waveforms are presented in Fig. 5. The sequences of Fig. 4 optimized for different $\alpha$ all had fairly similar waveforms. On the other hand, the 650-ms sequence had a qualitatively different waveform. For example, it lacks the sudden rise and drop in vertical trap power towards the end of the sequence present in the other waveforms. This suggests that it has converged to a qualitatively different local optimum. On the other hand, the sequences of Fig. 4 were all optimized with a trust region and the same initial $X$. Thus those optimizations primarily performed a local search, only slightly tuning $X$ to tailor the sequence for their particular value of $\alpha$. Although there are small differences in parameterization, the fact that two different optimizations with the same value of $\alpha$ produce sequences that differ more than optimizations with the same initial $X$ but different $\alpha$ supports the notion that the cost function landscape includes multiple local minima, as suggested in the main text.

Appendix B: Calculations of Atomic Gas Properties
The classical phase-space density is defined as $\mathrm{PSD}_c = n_{cp} \lambda_{\mathrm{dB}}^{3}$, where $n_{cp}$ is the peak number density calculated for a classical gas (i.e., neglecting bosonic statistics) and $\lambda_{\mathrm{dB}} = h/\sqrt{2\pi m k_B T}$ is the de Broglie wavelength. Here $h$ is the Planck constant, $m$ is the mass of an atom, and $k_B$ is the Boltzmann constant. To calculate $\mathrm{PSD}_c$ for a cloud, its atom number $N$ and temperature $T$ are measured and it is assumed to be in thermal equilibrium. The value of $\lambda_{\mathrm{dB}}$ is easily calculated from the measured temperature. The partition function $Z = \int f_B(\mathbf{x})\,dV$ is then calculated by numerically integrating the Boltzmann factor $f_B(\mathbf{x}) = \exp[-U(\mathbf{x})/(k_B T)]$ over the trap volume, where $U(\mathbf{x})$ is the trap potential at position $\mathbf{x}$. The potential $U(\mathbf{x})$ is taken to be the sum of two Gaussian beams, one for each cODT beam, and gravity is neglected for simplicity. Each Gaussian beam with peak depth $U_{i,0}$ and waist $w_{i,0}$ contributes a potential of the form
$$U_i(r', z') = -U_{i,0} \left(\frac{w_{i,0}}{w_i(z')}\right)^{2} \exp\!\left[-\frac{2 r'^{2}}{w_i(z')^{2}}\right],$$
where $w_i(z') = w_{i,0}\sqrt{1 + (z'/z_R)^{2}}$ is the spatially varying beam width and $z_R = \pi w_{i,0}^{2}/\lambda$ is the Rayleigh range. The primed coordinates $z'$ and $r'$ are taken to be along and perpendicular to the beam's propagation direction, respectively. The value of $n_{cp}$ can be calculated as $N f_B(\mathbf{x}_0)/Z$, where $\mathbf{x}_0$ is the position of the bottom of the trap. Finally, $\mathrm{PSD}_c$ is evaluated from its definition in terms of $n_{cp}$ and $\lambda_{\mathrm{dB}}$. Notably, for much of the sequence the atomic cloud extends out of the cODT region and into the wings of the horizontal ODT, in which case the trap potential seen by the cloud is not harmonic. Thus the well-known result $\mathrm{PSD}_c = N(\hbar\bar{\omega})^{3}/(k_B T)^{3}$ for a harmonic trap with geometric mean trap frequency $\bar{\omega}$ cannot be used for most of the sequence.
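The numerical procedure described above can be sketched as a direct sum of the Boltzmann factor over a 3D grid. The sketch makes several assumptions that are not taken from the source: the horizontal beam propagates along x and the vertical beam along z, both beams share a common focus at the origin, each beam has the standard Gaussian-beam potential, and the grid half-widths are supplied by the caller. The potential is shifted so that the trap bottom sits at zero, which leaves the peak density unchanged and avoids overflow.

```python
import numpy as np

H = 6.62607015e-34                   # Planck constant (J s)
KB = 1.380649e-23                    # Boltzmann constant (J/K)
M_RB87 = 86.909 * 1.66053907e-27     # mass of 87Rb (kg)

def beam_potential(r2, z, depth, waist, wavelength=1064e-9):
    """Gaussian-beam potential; r2 = squared radial distance, z = axial position."""
    z_r = np.pi * waist ** 2 / wavelength
    w2 = waist ** 2 * (1.0 + (z / z_r) ** 2)
    return -depth * (waist ** 2 / w2) * np.exp(-2.0 * r2 / w2)

def classical_psd(n_atoms, temp, depth_h, waist_h, depth_v, waist_v,
                  half_widths, n_grid=81):
    """PSD_c = n_cp * lambda_dB^3 for a crossed trap, by grid integration."""
    axes = [np.linspace(-hw, hw, n_grid) for hw in half_widths]
    x, y, z = np.meshgrid(*axes, indexing="ij")
    # Horizontal beam along x (assumed), vertical beam along z (assumed).
    u = (beam_potential(y ** 2 + z ** 2, x, depth_h, waist_h)
         + beam_potential(x ** 2 + y ** 2, z, depth_v, waist_v))
    boltz = np.exp(-(u - u.min()) / (KB * temp))   # equals 1 at the trap bottom
    cell = np.prod([ax[1] - ax[0] for ax in axes])
    partition = boltz.sum() * cell                 # Z, with shifted energy zero
    n_peak = n_atoms / partition
    lam_db = H / np.sqrt(2.0 * np.pi * M_RB87 * KB * temp)
    return n_peak * lam_db ** 3
```

For a cold cloud confined near the trap center this should approach the harmonic-trap result, while for hotter clouds that spill into the wings of the horizontal beam the grid must be large enough to contain them.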
Calculation of the mean collision rate $\nu_c$ requires averaging the collision rate $n_c \sigma v_{\mathrm{rms}}$ over the cloud, where $\sigma$ is the atomic collision cross section and $v_{\mathrm{rms}}$ is the root-mean-square relative velocity of atoms in the cloud. The value of $n_c$ varies over the trap and obeys $n_c(\mathbf{x}) = N f_B(\mathbf{x})/Z$, again neglecting bosonic statistics. From equipartition for a 3D gas, $(1/2)\mu v_{\mathrm{rms}}^{2} = (3/2) k_B T$, where $\mu = m/2$ is the reduced mass for two atoms. Thus the value of $v_{\mathrm{rms}}$ is given by $\sqrt{6 k_B T/m}$. The local collision rate is averaged by integrating $n_c \sigma v_{\mathrm{rms}}$ over the cloud, weighted by the one-atom number density $n_c/N$, yielding
$$\nu_c = \frac{\sigma v_{\mathrm{rms}}}{N} \int n_c(\mathbf{x})^{2}\,dV.$$
The above calculations assume that the cloud is in thermal equilibrium, which is often a good approximation. However, after about 440 ms of the final optimized 575-ms sequence, the power in the vertical trapping beam $P_z$ is rapidly increased, as can be seen in Fig. 2(a). This change is likely non-adiabatic for atoms in the wings of the horizontal ODT and the cloud may no longer be in thermal equilibrium. This is likely why the calculated $\mathrm{PSD}_c$ appears to increase beyond $\sim 1$ before the appearance of a BEC. Notably, this non-adiabatic portion of the sequence occurs only after $\mathrm{PSD}_c$ has reached 0.4, and thus it does not affect the cooling efficiency estimate of $\gamma \approx 16$ for the cooling up to $\mathrm{PSD}_c = 0.1$.
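The density-weighted average described above reduces to an integral of the squared density, which is straightforward to evaluate on a grid. In the sketch below the cross section is an assumed value, computed as 8πa² for identical bosons with a scattering length of roughly 98 Bohr radii. The function name and arguments are hypothetical.

```python
import numpy as np

KB = 1.380649e-23                    # Boltzmann constant (J/K)
M_RB87 = 86.909 * 1.66053907e-27     # mass of 87Rb (kg)
A_BOHR = 5.29177e-11                 # Bohr radius (m)
SIGMA_RB87 = 8.0 * np.pi * (98.0 * A_BOHR) ** 2  # ASSUMED s-wave cross section (m^2)

def mean_collision_rate(density, cell_volume, temp, sigma=SIGMA_RB87):
    """nu_c = (sigma * v_rms / N) * integral of n_c^2 dV, on a uniform grid."""
    density = np.asarray(density, dtype=float)
    n_atoms = density.sum() * cell_volume            # N = integral of n_c dV
    v_rms = np.sqrt(6.0 * KB * temp / M_RB87)        # rms relative velocity
    return sigma * v_rms * (density ** 2).sum() * cell_volume / n_atoms
```

For a Gaussian density profile, as in a harmonic trap, this average equals the peak collision rate divided by 2√2, which provides a simple consistency check.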
The peak trap depth $U_{i,0}$ for each beam was determined from the beam waist $w_{i,0}$ and radial trap frequency $\omega_{i,r}$ measured for each beam. The beam waists, defined as the radius at which the intensity falls to $1/e^{2}$ of its peak value, were measured by profiling the trap beams on a separate test setup which focused the light outside of the vacuum chamber. The trap frequencies were directly measured by carefully perturbing the position of a cloud in the cODT and observing its oscillations. Before [...]
Furthermore, the chemical potential for a harmonically trapped BEC scales as $\mu \propto N_{\mathrm{BEC}}^{2/5}$ [36], so $A \propto N_{\mathrm{BEC}}^{2/5}$ and $\mathrm{OD} \propto N_{\mathrm{BEC}}^{3/5}$. The expression $\mathrm{OD}^{3} N^{\alpha - 9/5}$ then scales as $(N_{\mathrm{BEC}})^{\alpha}$. Notably, this scaling also applies to a harmonically trapped BEC when imaged in situ. There, the BEC radius $R$ scales as $R \propto N_{\mathrm{BEC}}^{1/5}$ [36]. In that case, $A \propto R^{2} \propto N_{\mathrm{BEC}}^{2/5}$ as before. The same arguments then apply again, indicating that $\mathrm{OD}^{3} N^{\alpha - 9/5}$ scales as $(N_{\mathrm{BEC}})^{\alpha}$ for a harmonically trapped BEC in situ just as it does for a BEC after long time-of-flight expansion.
The scaling of $\mathrm{OD}^{3} N^{\alpha - 9/5}$ for a purely thermal cloud is also of note. For a harmonically trapped thermal cloud, the RMS size in a given direction for any time of flight is proportional to $T^{1/2}$, so $A \propto T$. Thus $\mathrm{OD} \propto N/T$ and $\mathrm{OD}^{3} N^{\alpha - 9/5}$ scales in proportion to $N^{\alpha + 6/5}/T^{3}$. Clouds with smaller temperatures are favored by the cost function, and clouds with larger atom numbers are favored as long as $\alpha > -6/5$. For the case $\alpha = -1/5$, the value of $\mathrm{OD}^{3} N^{\alpha - 9/5}$ scales in proportion to $N/T^{3}$, which is proportional to $\mathrm{PSD}_c$. That choice of $\alpha$ was often used when optimizing individual stages before reaching the threshold to BEC. However, note that this choice of $\alpha$ leads to the scaling $\mathrm{OD}^{3} N^{\alpha - 9/5} \propto N_{\mathrm{BEC}}^{-1/5}$ for a pure BEC and is thus not a good choice when the cloud reaches condensation.
FIG. 1. (a) Setup showing 1064-nm horizontal (waist $w_h = 18$ µm, beam slightly tilted downward) and vertical ($w_v = 14$ µm) optical-trapping, 795-nm Raman coupling ($w_R = 500$ µm) and optical pumping ($w_x = 30$ µm, $w_y \approx 1$ mm), and 780-nm absorption-imaging beams. (b) Absorption image used to extract the cost function for a set of parameter values $X$. (c) Bayesian optimization with a neural network. The model $C_p(X)$ (orange solid line) attempts to predict the actual system performance $C(X)$ (blue dashed line). The algorithm uses the model to predict optimal parameter values $X_{i+1}$ (open diamond), tests those values, and performs a new iteration with an updated model.

FIG. 4. Cross sections of 24-ms TOF images (200 averages) optimized for different cost function parameter $\alpha$ (see main text) with 1-s long sequences, demonstrating the trade-off between optimizing for atom number or temperature. Also plotted are the results of optimizing for atom number $N$ only. Inset: Condensate fraction $N_{\mathrm{BEC}}/N$ vs $N$ for different $\alpha$.