Optical lattice experiments at unobserved conditions and scales through generative adversarial deep learning

Machine learning provides a novel avenue for the study of experimental realizations of many-body systems, and has recently been proven successful in analyzing properties of experimental data of ultracold quantum gases. We here show that deep learning succeeds in the more challenging task of modelling such an experimental data distribution. Our generative model (RUGAN) is able to produce snapshots of a doped two-dimensional Fermi-Hubbard model that are indistinguishable from previously reported experimental realizations. Importantly, it is capable of accurately generating snapshots at conditions for which it did not observe any experimental data, such as at higher doping values. On top of that, our generative model extracts relevant patterns from small-scale examples and can use these to construct new configurations at a larger size that serve as a precursor to observations at scales that are currently experimentally inaccessible. The snapshots created by our model---which come at effectively no cost---are extremely useful as they can be employed to quantitatively test new theoretical developments under conditions that have not been explored experimentally, parameterize phenomenological models, or train other, more data-intensive, machine learning methods. We provide predictions for experimental observables at unobserved conditions and benchmark these against modern theoretical frameworks. The deep learning method we develop here is broadly applicable and can be used for the efficient large-scale simulation of equilibrium and nonequilibrium physical systems.

Ultracold atoms provide a controlled environment for the study of emergent phenomena in many-body physical systems-including high-temperature superconductivity, many-body localization, or topological quantum phases-as well as fields such as cosmology and quantum chemistry [1][2][3][4]. Hence, finding a unifying theoretical or numerical model able to create configurations with statistics that conform to all experimental observations is of crucial importance. Indeed, this allows one to use the model to make predictions for conditions that currently can not be experimentally realized, in addition to obtaining a better understanding of the system. Herein we propose using generative deep learning to create such a model, that learns to produce configurations indistinguishable from those experimentally obtained and can also make predictions for configurations at larger scale or at unobserved control parameter values. The capability of discriminative machine learning to analyze physical systems has by now been well established, both for data obtained through numerical simulations [5][6][7][8][9][10][11], and from experimental observations through electronic quantum matter visualization [12], quantum gas microscopy [13], or momentum-space density images [14]. In these machine learning applications, a neural network is trained to predict properties y of configurations x, i.e. learn the conditional probability p(y|x). Examples include the characterization of phases of matter, or efficiently calculating properties of individual microstates. The use of generative machine learning is relatively unexplored within experimental science; it is a much more complex problem as it requires the modeling and sampling of a probability distribution p(x, y) not known a priori. Yet, generative learning provides a particularly attractive approach as it relies on automatic pattern recognition-and hence does not focus on the reproduction of a specific physical quantity, which can introduce bias, or require prior knowledge about the system. Recently, Boltzmann generators [15] were trained on the energy functional of many-body systems to directly generate low-energy equilibrium configurations and can overcome rare event sampling problems in simulations. A particular class of generative models, namely restricted Boltzmann machines, has seen use as efficient variational ansätze for quantum many-body wave functions [16][17][18][19][20][21]. In return, physics-inspired algorithms (tensor networks) are now being explored for discriminative and generative tasks in machine learning [22][23][24].
Here, we show that generative deep learning can be used to represent and sample the distribution of snapshots of an experimental realization of the Fermi-Hubbard model with ultracold atoms in an optical lattice. The format of this article is as follows: we first discuss our application area; next, we outline our generative model 'RUGAN', and finally we apply it to previously reported experimental data. Here, only one spin species is studied, while the other spin species, holes, and doublons, are all detected as empty sites. C, Sign-corrected spin correlations Cs(r) of Eq. (2) as a function of the hole doping δ for nearest neighbors (left, r = 1) and next-nearest neighbors (right, r = √ 2). The correlations obtained with our generative model 'RUGAN' (1000 snapshots generated for each doping) are consistent with experiment. Note that our generative model has not been optimized on data corresponding to the largest doping values δ 0.24 (shaded area), but is still able to produce configurations with correlations that match well with experimental observations. The geometric string theory consistently underestimates the nearest-neighbour correlations Cs(1) while for both distances considered, the spin correlations obtained with sprinkled holes are overestimated.
The Fermi-Hubbard model is of particular interest as it is suggested to hold the key to understanding hightemperature superconductivity [3,25,26]. The Fermi-Hubbard model is described by the Hamiltonian whereĉ † i,σ (ĉ i,σ ) is the creation (annihilation) operator of a spin σ ∈ {↑, ↓} on site i. The first term corresponds to tunneling between neighboring lattice sites i and j. The second term accounts for the on-site interaction between fermions with opposite spin. Here we consider the strongly correlated regime, where U/t 1. The Fermi-Hubbard model can be experimentally realized with ultracold atoms trapped in an optical lattice (Fig. 1A) [1][2][3]27]. Quantum gas microscopy provides us with siteresolved snapshots of these quantum states, imaging either the total atom distribution or that of a single spin species. Holes and doubly-occupied sites (doublons) are detected as empty sites (Fig. 1B). Recently, the use of discriminative machine learning for classifying snapshots of ultracold atomic gases has been investigated [13,14] and we use the same data set [28] as in Ref. [13] to train our generative model. At half filling, the Fermi-Hubbard model is theoretically relatively well understood and maps to the Heisenberg model with superexchange coupling J = 4t 2 /U . In this case, long-range antiferromagnetic (AFM) correlations are present throughout finite-size systems for temperatures T J. The Fermi-Hubbard model is not as well understood when straying from the half-filling regime through the addition of holes. The motion of these holes displaces strings of spins and hence hides the AFM order observed at half filling; this has recently been experimentally observed [29]. These observations can be partially explained in the framework of geometric string theory, in which strings of displaced spins are added to a background of experimental snapshots produced at half filling [30] (see Supplementary Material). The length of these strings is dependent on the ratio t/J and the strength of the AFM correlations present in the states obtained at half filling. Note that considering the motion of the holes is crucial to describe experimentally observed hidden order, as this is not correctly accounted for by randomly adding holes to a state obtained at half filling ("sprinkled holes"). The objective of this work is to develop and train a generative model capable of representing and sampling the distribution of the experimental snapshots at requested doping values. Such a model allows for efficient augmentation of the experimental data set, and can offer predictions for properties of microstates at dopings for which no experimental data is available, or at larger spatial scales, and hence can be used for quantitative testing of theoretical frameworks. Our model is validated by comparing observables obtained experimentally and with the deep learning algorithm. We first consider the AFM correlations [29,[31][32][33] in snapshots created by our model-discussed in detail below-by evaluating the sign-corrected spin correlation for sites at relative displacement r with r = r 2 and · p denoting the p-norm: More details on the calculation of these correlations are given in the Supplementary Material. The correlations between nearest neighbours (r = 1) and next-nearest neighbours (r = √ 2) measured on snapshots created by our generative model are shown in Fig. 1C along with the theoretical predictions by both the geometric string theory [30] and sprinkled holes [29]. Note that neither of these theoretical models create samples with the correct correlations at large hole-doping values δ for both r = 1 and r = √ 2. Our generative model is able to accurately capture the correlations across all hole-doping values-even at doping values for which it is not optimized.
For this work, we developed a new generative approach called a "regressive upscaling generative adversarial network" (RUGAN). After being shown a data set of small-scale configurations-called the training set-of a physical system, the RUGAN allows for the direct generation of new microstates at any given scale and with desired properties. The RUGAN is able to generalize in two ways: 1) it can create microstates with properties for which no training data is available, and 2) it is able to create samples at a much larger scale (or 'upscale') than the training examples. The former is relevant for systems where obtaining configurations is numerically or experimentally feasible only for a limited set of system properties. As only small-scale samples are required for training, the latter generalization enables efficient sampling of configurations at scales inaccessible to traditional methods, either due to excessive computational cost or experimental restrictions on the imageable system size.
The RUGAN is based on generative adversarial networks (GANs) [34][35][36], which are the combination of two neural networks competing against each other as adversaries. One network, G, acts as a generator, taking samples z randomly drawn from a latent space as input and transforming these to create new samples with distribution P g ∼ G(z). The other network works as a critic, learning to discern between samples coming from the generator and the example data set with distribution P r . These networks depend on many free parameters that are trained simultaneously, with the generator trying to trick the critic, and the critic learning how to better tell apart the generator's propositions from the training examples by measuring the distance between P r and P g . A successfully trained GAN converges to a state where the generator is so good at producing samples that the critic cannot tell the generated samples from the reference training set. The generator and critic of a GAN can be conditioned on additional information such as known properties of individual samples [35]. This allows one to control the region of configuration space from which the generator produces new configurations. GANs are typically used in image processing such as super-resolution [37] or cross-domain pairings (e.g., pairing shoes with matching handbags) [38], but have also been applied to the Ising model [39,40], scalar field theories [41,42], and inverse molecular design [43].
To allow the RUGAN to create samples of arbitrary size, our generator is designed such that it consists solely of translationally equivariant operations that can be applied to latent inputs of any size. Motivated by the locality of the interactions in the models studied here, the RUGAN consists of deep residual convolutional networks. Convolutional neural networks, by construction, have a limited receptive field defined by the size of the convolutional kernels and the depth of the network, and can not account for arbitrary-range interactions. Hence, the network depth required to accurately model a physical data set gives us a proxy for the typical correlation lengths present in the individual configurations. However, long-range interactions could also be efficiently included by making use of attention layers [44]. Once optimized on the training data, such a generator can then efficiently create configurations containing a much larger number of sites. The validity of the upscaling procedure is dependent on the same physical length scales being present in both the small-size training examples and the sizes to which we scale. This also means that the failure of creating configurations at a larger scale (i.e., resulting in different statistics than computationally or experimentally obtained) will point us towards the appearance of physics at a new, larger length scale -a typical example is the divergence of the correlation length at a critical point which cannot easily be captured by RUGAN. Another advantage of the fully-convolutional design of the RUGAN is that it provides an optimized starting point for the training of larger systems.
Along with technical details on the design and training of the RUGAN, we give a simple illustration and results on classical spin models in the Supplementary Material.
We now return to the use of generative deep learning to tackle the challenging task of modelling an experimentally obtained sample distribution. To this aim, we train a RUGAN on experimental snapshots of a doped two-dimensional Fermi-Hubbard model on a square lattice, and condition it on the doping ratio δ. The output of the RUGAN is hence a series of synthetic snapshots at prescribed doping values ( Fig. 2A). We then apply the same analysis procedure to these as to the experimentally obtained snapshots [29].
In Fig. 1C, we demonstrate that a RUGAN can in this way produce synthetic snapshots with high AFM spin correlations C s (r) at small hole doping values δ, and that it correctly models the decay of C s (r) in the snapshots upon increasing δ. We now show that its synthetic configurations also capture the more intricate hidden order present in the experimental snapshots, quantified by the number and average length of strings of spins displaced by hole motion as a function of δ. More information on the determination of these observables can be found in the Supplementary Material. These The distribution of snapshots created by the RUGAN (blue) cannot be distinguished from the experimental snapshots (orange) with other unsupervised machine learning methods. Here, we show a dimensionality reduction with the UMAP algorithm [45] of an equal number of synthetic and experimental snapshots. C, The number of strings exceeding a length of 2 per system site and D, the average length (sites) of the string patterns detected by the algorithm described in [29], as a function of the doping δ. The string statistics obtained with the RUGAN (with which 1000 snapshots were created for each doping value) are consistent with experimental observations. The intermediate doping values 0.09 δ 0.23 in the shaded area are not included during training of the generative model, and the RUGAN interpolates between its knowledge at low and high δ for its creation of snapshots at these dopings. Note that both theoretical frameworks match perfectly with the experimental data at half filling by construction. For high values of the doping δ, the theoretical models underestimate the average string length. (Inset) Illustration of the string pattern formation due to a hole moving through an AFM ordered state, which leaves behind a trail of displaced spins. E and F, Same as in C and D, but now the RUGAN is not provided with data corresponding to large doping values δ 0.24 during training and is required to extrapolate to new values of δ. For the extrapolation regime, we show the statistics obtained by two independently trained RUGANs.
statistics are shown in Fig. 2 for the synthetic samples, along with the experimentally determined values, and the predictions made by the theoretical frameworks developed in Refs. [29,30]. Remarkably, the RUGAN is able to accurately generate snapshots that exhibit the correct string statistics even at values of δ on which it is not optimized. In Fig. 2C,D we show that when the RUGAN is trained on a subset of the experimental snapshots restricted to the extrema of experimentally available doping values, it still succeeds in generating configurations at intermediate δ with string statistics matching closely with those observed in experiment. Exploiting this feature dramatically reduces the already small number of experimental observations required to train the RUGAN. Excluding the largest values of δ from the training set allows us to assess the RUGAN's ability to extrapolate its learned knowledge of configurations with smaller δ. The observation in Fig. 2E,F that the string statistics obtained with this extrapolation procedure again match closely with experimental results, showcases the RUGAN's capability to predict complex correlation patterns, and indicates that the synthetic configurations can serve as a benchmark for quantitative comparison with theoretical developments at conditions where no experimental data is available. We stress that, though still performing better than theoretical predictions for doping values not included in training, the reliability of extrapolated predictions does of course eventually decrease as predictions are made at doping values farther and farther away from those used during training. In Fig. 2E,F, we show an example of the variation in extrapolated values that occurs far from training conditions. A current limitation in experiments with quantum gas microscopy on ultracold atoms is the limited number of sites that can be imaged, so that experimental snapshots of the optical lattices currently typically consist of less than 100 sites. The upscaling ability of the RUGAN allows us to obtain a useful precursor of what could be observed at large scale while experimental realizations containing more lattice sites are still unavailable. Given the success in performing this upscaling for classical spin models (see Supplementary Material), and the excellent agreement with experimental data for small-scale samples, these configurations can serve as a benchmark for future experimental observations of larger optical lattices. In Fig. 3A, we show such configurations obtained by our RUGAN consisting of approximately four times as many sites as the experimental examples. As RUGAN enables us to synthesize a distribution of large-scale snapshots, we can use these to predict the behavior of physical observables at scales that are experimentally inaccessible. In Fig. 3B,C we show the string count per site and average string length as a function of doping δ, measured on snapshots created by the same RUGAN as in Fig. 2E,F. We have demonstrated the success of a RUGAN in modeling the distribution of experimental snapshots of a doped Fermi-Hubbard model. To this aim, deep neural networks are trained on quantum gas microscopy images of ultracold atoms in an optical lattice. RUGAN efficiently generates synthetic samples at requested doping values, indistinguishably distributed from the training data, and these are analyzed in the same way as one would treat experimental observations. Whereas current theoretical frameworks of this model often focus on the description of a number of specified observables, such as spin-spin correlators or hidden order, the power of generative learning lies in its unbiased learning procedure. Hence, especially at large doping values, the synthetic snapshots created by RUGAN provide a better match with experimental observations than current theoretical predictions. On top of that, the RUGAN provides the ability to sample snapshots at experimentally unobserved doping values or at larger spatial scales, and thus opens the door for quantitative testing of new phenomenological, analytical, and numerical models on synthetic data under conditions where no experimental data is available. The observation that a RUGAN is able to make highly accurate predictions across the whole doping regime provides evidence for the existence of a unifying theoretical model. Establishing such bounds is extremely useful across many physical domains including nucleation, phase transitions, and non-equilibrium growth.

SUPPLEMENTARY MATERIAL
A. RUGAN

Generative adversarial networks
A generative adversarial network [34] consists of a generator, which maps latent samples to new configurations, and a critic, which measures the distance between the distributions of the real samples P r and those of the generated samples P g . The distance metric used by the critic in our work is the Wasserstein-1 or Earth-Mover distance (WGAN), which allows for stable training [36]. The Wasserstein-1 distance W between P r and P g is given by with Π(P r , P g ) the set of joint distributions whose marginals are P r and P g . As the infimum in Eq. (3) is intractable, the Kantorovich-Rubinstein duality is used to write with the supremum taken over all 1-Lipschitz continuous functions. The training objective can now be formulated as a minimax game between two neural networks, the critic C and generator G: where During training, the critic is optimized to maximize L, and thus finds an estimate for W (P r , P g ). The generator then learns to minimize this distance, so that its sampled distribution P g is similar to P r (Fig. 4A).
The critic in the WGAN construction needs to be 1-Lipschitz continuous over the whole domain to find a correct estimate for the Wasserstein distance. While enforcing this constraint everywhere is impracticable, a good approximation can be obtained by adding two regularizing terms to the loss function of Eq. (6): and as new objective A differentiable function is 1-Lipschitz continuous if it has gradients with at most unit norm over the whole domain. Hence, the first of these regularizing terms (Eq. (7b)) penalizes the critic such that the norm of the gradient equals one for samplesx sampled from Px [46]. As enforcing this over the whole support domain is intractable, the distribution Px is sampled uniformly on straight lines between data points in the training distribution P r and generated distribution P g . The limitation on how Px is sampled leaves much of the domain unconstrained. In particular, Lipschitz continuity over the manifold that supports the training distribution P r is not properly enforced until the distribution P g lies close to P r . To alleviate the lack of Lipschitz-constraint on the training manifold, a third term (Eq. (7c)) is added to the loss function that explicitly enforces Lipschitz continuity close to it [47]. Lipschitz continuity requires that for two points x and x close to one another, the distance C(x ) − C(x ) 2 is bounded by a constant. Enforcing this criterion is accomplished by perturbing every training data point twice, with small random perturbations (δ 1 , δ 2 ), and minimizing the distance between the critic output of these configurations. In practice, the perturbation of samples is achieved by adding dropout [48], which disables nodes in a layer with a specified probability, to several layers of the critic and feeding it the same data point twice.
To condition the generator on system properties, we provide it with both a random sample from the latent space and labels describing the desired properties (e.g., the energy, magnetization, or doping of configurations) as input [35]. Meanwhile, the critic is shown this same label for the generated configurations, while receiving the exact label for real samples. The critic uses this additional label during its estimation of the Wasserstein distance, prompting the generator to adapt by creating configurations that have features accurately described by their label. Note that since the critic and generator find efficient internal representations of these quantities through training, they are never explicitly evaluated during training. This implies that when we want to condition the generation on expensive operators, the vast amount of computational effort lies in generation of the small-scale samples. Creation of new, large configurations only requires a single pass through a convolutional neural network, and hence comes at a much smaller cost. Additionally, making generative models interpretable, i.e. understanding how it models the interactions between the degrees of freedom [49][50][51], would lead to new insight for further theoretical developments.

Neural network architectures
Both the critic and generator are implemented as deep residual convolutional neural networks, where residual functions with respect to the layer inputs are learned (Fig. 4C), which allows for more stable training [52]. In order to be equivariant under translational operations and achieve the upscaling described in the main text, the generator consists only of convolutional layers, with no dense (i.e. fully-connected) layers that are commonly included in neural networks (Fig. 4C). We add a hyperbolic tangent activation function to the last layer of the generator in order to obtain a valid output representation. As we want to generate configurations with an approximately circular shape for the optical lattice data set, we manually apply a mask that sets the borders to zero after creating a square configuration with the generative network.
For the hyperparameters in our WGAN implementation, we use λ 1 = 10 and λ 2 = 2 in Eq. (7) [46,47]. The dropout rate is set to 25% for two layers in the critic to evaluate Eq. (7c). The weights of the neural networks are optimized with the ADAM optimizer [53], where we set the learning rate α = 10 −4 and the exponential decay rates for the first and second moment estimates to β 1 = 0 and β 2 = 0.9. To gain a more reliable estimate of the Wasserstein distance before updating the generator's weights, on the experimental data we train the critic on 20 batches for every generator training iteration. For the classical spin models we use a 10:1 ratio of critic updates to generator updates. Each model is trained for 2000 epochs.
Details on the network architectures (e.g., number of layers and channels) can be found in our open-source implementation at http://clean.energyscience.ca/ codes.

Model selection
Once training is complete, we use each model epoch to generate a number of configurations for every conditioning label. As the Wasserstein distance quickly stabilizes (after a couple of epochs) to a constant value during training, we resort to a different selection criterion to decide on which model is ultimately deployed. Here, we evaluate the squared deviation between several observables (as shown in the main text) measured with the experimental and RUGAN snapshots (weighted by the experimental error) and select the model that minimizes this deviation. Naturally, when data corresponding to certain conditioning labels is not shown during training (i.e., for the demonstration of interpolation and extrapolation), data generated at these labels is also not used for model selection, and we select the model that performs best on the training regime. The results obtained when interpolating between the lowest and highest doping values (Fig. 2C,D) consistently match well with experimental values across the model epochs. As expected, in the case of extrapolation, the results when generating data at doping values much higher than those trained on are less robust, as detailed in Fig. 2E,F. We also demonstrate this in Fig. 5, where we provide the RUGAN with even smaller subsets of the available doping values than in Fig. 2E,F. Though again better than theoretical predictions, the string statistics measured on the synthetic configurations start to deviate from the experimental observations for the highest doping values, even for the model epochs that perform best on the training set.

An example: the Ising model
We now give a detailed illustration of our framework on the prototypical classical Ising model on a two-dimensional square lattice of length L with Hamiltonian H = −J i,j s i s j , where the sum runs over nearest-neighbor spins and we set J = 1. For N binary spins s ∈ {−1, +1}, the configuration space of the Ising model has a dimension of 2 N ; sampling configurations with desired properties directly from this space is intractable for all but trivial system sizes. Here we show that this task can be accomplished with a RUGAN by training it on a data set of small-scale Ising configurations, and conditioning it on the energy and magnetization density m = i s i /N of each training example.
The conditioning allows for the efficient creation of microstates with desired properties from the high-dimensional configuration space as well as the creation of microstates with conditioning labels for which no training examples are available. We implemented a modified form of umbrella sampling called "targeted sampling" [11] so that we can obtain a training set with a uniform energy distribution. This sampling method resembles the Metropolis-Hastings algorithm in structure, but instead of seeking low-energy states, we target specific energies, accepting configurations that move us toward the target energy, and rejecting ones (with a Gaussian probability) that lead us away. In doing this, we can collect examples across the energy spectrum in an efficient way. In Fig. 6A, we  states. Note that spin-flip symmetry is not enforced, which could potentially decrease the errors shown here. The accuracy in this conditioning is retained when using the upscaling property to create configurations at much larger scales (Fig. 6B). This implies that we can greatly accelerate sampling of uncorrelated large scale synthetic configurations, as costly simulations are only needed for a small subset of the configuration space and only of small-scale microstates.
B. Hidden order in the two-dimensional Fermi-Hubbard model To benchmark the string patterns in the experimental data and our RUGAN, we compare them to the frameworks of sprinkled holes and geometric string theory [30] and apply an analysis procedure identical to what has been previously used to describe experimental observations [29]. For simplicity, we here recapitulate these but point towards Refs. [29,30] for more detailed information.

Sprinkled holes and geometric string theory
Sprinkled holes is a model for the doped Fermi-Hubbard model in the limit of non-interacting holes. To obtain snapshots at different dopings, we start from experimentally obtained snapshots at half filling and add holes on random positions until the doping matches the requested one.
Geometric string theory is a theoretical model where holes do not interact with each other but do interact with the surrounding spins. First, a single hole is placed at a random position on the lattice. The dynamics of the hole can be described by introducing an effective Hamiltonian and an effective Hilbert space for a single string [29,30]. The Hilbert space consists of string patterns, which can be viewed as paths without loops on a Cayley tree with coordination number z = 4. Using the frozen spin approximation and U t, the strings can be modeled by the effective Schrödinger equation where ψ l is a shorthand notation for a path on the Cayley tree of length l, and ψ l+1,s denotes the string obtained by continuing the string ψ l along one of the z − 1 directions on the Cayley tree. The parameter t is the coupling constant for tunneling between string lengths (equal to t in the Fermi-Hubbard Hamiltonian) and V l is an effective potential. The effective potential V l = (dE/dl)l + gδ l,0 consists of a linear tension with magnitude dE/dl and an attractive term with magnitude g. Solving this string model for finite temperature yields a string length distribution p(l). The snapshots are then created by starting from experimental snapshots at half filling and adding strings on random positions in the lattice-with a length according to the string length distribution p(l)-until the desired doping is reached. More details can be found in Refs. [29,30].

String detection algorithm
In the main text the number of strings and their average length as a function of doping are compared between the RUGAN, experiment and the theories explained above. The detection algorithm of these strings is applied to snapshots where one of the two spin species and doublons are removed, and is performed in multiple steps [29]. The geometric strings describe the deviation between the doped Fermi-Hubbard snapshots and a checkerboard state. Hence, the first step involves selecting a window (here with a diameter of 7 sites) for each configuration with the highest staggered magnetization. Using a window with a diameter smaller than the configuration itself negates some of the finite temperature effects. For a given doping, 60% of the resulting windows with the highest staggered magnetization are kept for further analysis. In the next step, each of these windows is compared to a checkerboard state. The strings are then identified as deviations from this checkerboard, and the string-pattern length distribution p δ (l) is measured.
As for the string count shown in Fig. 2, only those patterns of length greater than two are included as to negate the contribution from quantum fluctuations such as doublon-hole pairs [29]. The average string lengths l(δ) are calculated from the string length histograms as l(δ) = l lp δ (l)/ l p δ (l).

Spin-spin correlators
One way to assert the validity of the snapshots created by the RUGAN described in the main text is to verify whether the sign-corrected two-point spin correlator matches well with experimental results.
Here, S z i = 1 2 (n ↑ i −n ↓ i ) withn σ i the number operator for spin σ on site i . The spin correlator can be calculated from the experimental snapshots as [31] C s (r) = (−1) r 1 2 σ∈{↑,↓} where the spatial indices are dropped for simplicity. Here, p denotes a singly occupied site, the expectation value · N R is taken over images where neither spin species was removed, and · Rσ over images where the spin state σ was removed. To calculate the expectation values · N R , we train a second RUGAN on a data set of snapshots containing both spin species, and also condition it on the doping. Here, the doping conditioning can be explicitly checked, as the doping δ ≈ 1.22(0.905 − n s ) where n s is the density of singly-occupied sites [29].

Data set
The experimental data of the Fermi-Hubbard model is obtained from Ref. [28]. These data are obtained at temperature T = (0.65 ± 0.04)J, and the ratio U/t = 8.1 (2). Here, a mixture of the two lowest hyperfine states of 6 Li is trapped in a two-dimensional optical lattice. Siteresolved measurements of the occupation in the optical lattice are obtained with high-resolution quantum gas microscopy [33]. The experimental snapshots of the atomic distributions consist of a circular region of 80 sites. In total, 8822 images of the atomic distributions with both spin components present and 17233 with one spin component removed are available. We augment the data set by also including these samples rotated by multiples of 90 • .