Enhancing associative memory recall and storage capacity using confocal cavity QED

We introduce a near-term experimental platform for realizing an associative memory. It can simultaneously store many memories by using spinful bosons coupled to a degenerate multimode optical cavity. The associative memory is realized by a confocal cavity QED neural network, with the cavity modes serving as the synapses, connecting a network of superradiant atomic spin ensembles, which serve as the neurons. Memories are encoded in the connectivity matrix between the spins, and can be accessed through the input and output of patterns of light. Each aspect of the scheme is based on recently demonstrated technology using a confocal cavity and Bose-condensed atoms. Our scheme has two conceptually novel elements. First, it introduces a new form of random spin system that interpolates between a ferromagnetic and a spin-glass regime as a physical parameter is tuned---the positions of ensembles within the cavity. Second, and more importantly, the spins relax via deterministic steepest-descent dynamics, rather than Glauber dynamics. We show that this nonequilibrium quantum-optical scheme has significant advantages for associative memory over Glauber dynamics: These dynamics can enhance the network's ability to store and recall memories beyond that of the standard Hopfield model. Surprisingly, the cavity QED dynamics can retrieve memories even when the system is in the spin glass phase. Thus, the experimental platform provides a novel physical instantiation of associative memories and spin glasses as well as provides an unusual form of relaxational dynamics that is conducive to memory recall even in regimes where it was thought to be impossible.


I. INTRODUCTION
Five hundred million years of vertebrate brain evolution have produced biological information-processing architectures so powerful that simply emulating them, in the form of artificial neural networks, has lead to breakthroughs in classical computing [1,2]. Indeed, neuromorphic computation currently achieves state-of-the-art performance in image and speech recognition, machine translation, and even out-performs the best humans in ancient games like Go [3]. Meanwhile, a revolution in our ability to control and harness the quantum world is promising technological breakthroughs for quantum information processing [4,5] and sensing [6]. Thus, combining the algorithmic principles of robust parallel neural computation, discovered by biological evolution, with the nontrivial quantum dynamics of interacting light and matter naturally offered to us by the physical world, may open up a new design space of quantum-opticsbased neural networks. Such networks could potentially achieve computational feats beyond anything biological or silicon-based machines could do alone.
We present an initial step along this path by theoretically showing how a network composed of atomic spins coupled by photons in a multimode cavity can naturally realize associative memory, which is a prototypical function of neural networks. Moreover, we find that including the effects of drive and dissipation, the naturally arising nonequilibrium dynamics of the cavity QED system enhances its ability to store and recall multiple memory patterns, even in a spin glass phase.
Despite the biologically inspired name, artificial neural networks can refer to any network of nonlinear elements (e.g., spins) whose state depends on signals (e.g., magnetic fields) received from other elements [7][8][9]. They provide a distributed computational architecture alternative to the sequential gate-based von Neumann model of computing widely used in everyday devices [10] and employed in traditional quantum computing schemes [4]. Rather than being programmed as a sequence of operations, the neural network connectivity encodes the problem to be solved as a cost function, and the solution corresponds to the final steady-state configuration obtained by minimizing this cost function through a nonlinear dynamical evolution of the individual elements. Specifically, the random, frustrated Ising spin glass is an archetypal mathematical setting for exploring neural networks [7]. Finding the ground state of an Ising spin glass is known to be an NP-hard problem [11,12] and so different choices of the spin connectivity may therefore encode many different combinatorial optimization problems of broad technological relevance [13,14]. Much of the excitement in modern technological and scientific computing revolves around developing faster, more efficient 'heuristic' optimization solvers that provide 'good-enough' solutions. Physical systems capable of realizing an Ising spin glass may play such a role. In this spirit, we present a thor-ough theoretical investigation of a quantum-optics-based heuristic neural-network optimization solver in the context of the associative memory problem.
Using notions from statistical mechanics, Hopfield showed how a spin glass of randomly connected Ising spins can be capable of associative memory, a prototypical brain-like neural network function [15]. Associative memory is able to store multiple patterns (memories) as local minima of an energy landscape. Moreover, recall of any individual memory is possible even if mistakes are made when addressing the memory to be recalled: if the network is initialized by an external stimulus in a state that is not too different from the stored memory, then the network dynamics will flow towards an internal representation of the original stored memory, using an energy minimizing process called pattern completion that corrects errors in the initial state. Such networks exhibit a trade-off between capacity (number of memories stored) and robustness (the size of the basins of attraction of each memory under pattern completion). Once too many memories are stored, the basins of attraction cease to be extensive, and the model transitions to a spinglass regime with exponentially many spurious memories (with subextensive basins of attraction) that are nowhere near the desired memories [16].
From a hardware perspective, most modern neural networks are implemented in CMOS devices based on electronic von-Neumann architectures. In contrast, early work aiming to use classical, optics-based spin representations sought to take advantage of the natural parallelism of light propagation [17][18][19]; such work continues in the context of silicon photonic integrated circuits and other coupled classical oscillator systems [20,21]. The use of atomic spins coupled via light promises additional advantages: atom-photon interactions can be strong even on the single-photon level [22], providing the ability to process small signals and exploit manifestly quantum effects and dynamics.
Previous theoretical work has sketched how a spin glass and neural network may be realized using ultracold atoms strongly coupled via photons confined within a multimode optical cavity [23,24]. The cavity modes serve as the synaptic connections of the network, mediating Ising couplings between atomic spins. The arrangement of atoms within the cavity determines the specific connection strengths of the network, which in turn determine the stored patterns. The atoms may be reproducibly trapped in a given connectivity configuration using optical tweezer arrays [25,26]. Subsequent studies have provided additional theory support to the notion that related quantum-optical systems can implement neural networks [27][28][29][30]. However, all these works, including Refs. [23,24], left significant aspects of implementation and capability unexplored.
In the present theoretical study, we introduce the first practicable scheme for a quantum-optical associative memory by explicitly treating photonic dissipation and ensuring that the physical system does indeed be- have similarly to a Hopfield neural network. All physical resources invoked in this treatment have been demonstrated in recent experiments [31][32][33][34][35][36][37]. Specifically, we show that suitable network connectivity is provided by optical photons in the confocal cavity, a type of degenerate multimode resonator [38]. The photons are scattered into the cavity from atoms pumped by a coherent field oriented transverse to the cavity axis, as illustrated in Fig. 1. The atoms undergo a transition to a superradiant state above a threshold in pump strength. In this state, the spins effectively realize a system of rigid (easy-axis) Ising spins with rapid spin evolution, ensuring that memory retrieval can take place before heating by spontaneous emission can play a detrimental role. We moreover find the cavity QED system naturally leads to a discrete analog of "steepest descent" dynamics, which we show provides an enhanced memory capacity compared to standard Glauber dynamics [39]. Finally, the spin configuration can be read-out by holographic imaging of the emitted cavity light, as recently demonstrated [35]. That is, the degenerate cavity provides high-numericalaperture spatial resolving capability, and may be construed as an active quantum gas microscope, in analogy to apparatuses employing high-NA lenses near optically trapped atoms [40].
Our main results are as follows: 1. Superradiant scattering of photons by ensembles of atomic spins plus cavity dissipation naturally realizes a form of zero-temperature dynamics in a physical setting: discrete steepest descent (SD) dynamics. This is because the bath structure dictates that large energy-lowering spin flips occur most rapidly. This is distinct from the typical zero temperature limit of Glauber [39] or Zero Temperature Metropolis-Hastings [41,42] (0TMH) dynamics typically considered in Hopfield neural networks [43].
2. The confocal cavity can naturally provide a dense (all-to-all) spin-spin connectivity that is tunable between (1) a ferromagnetic regime, (2) a regime with many large basins of attraction suitable for associative memory, and (3) a regime in which the connectivity describes a Sherrington-Kirkpatrick (SK) spin glass. This sequence of regimes is characteristic of Hopfield model behavior. 3. Surprisingly, standard limits on memory capacity and robustness are exceeded under SD dynamics. This enhancement is because SD dynamics enlarge the basins of attraction-i.e., 0TMH can lead to errant fixed points in regimes where SD always leads to the correct one. This is true not just of the cavity QED system, but also for the basic Hopfield model with Hebbian and other learning rules. Moreover, the enhancement persists into the SK spin glass regime, wherein basins of metastable states expand from zero size under 0TMH dynamics to an extensively scaling size under SD. 4. While simulating the SD dynamics requires O(N 2 ) numerical operations to determine the optimal energy-lowering spin flip, the physical cavity QED system naturally flips the optimal spin due to the different rates experienced by different spins. Thus, the real dynamics drives the spins to converge to a fixed point configuration (memory) more efficiently than numerical SD or 0TMH dynamics, assuming similar spin-flip rates. 5. We introduce a pattern storage method that allows one to program associative memories in the cavity QED system. The memory capacity of the cavity under this scheme can be as large as N patterns.
Encoding states requires only a linear transformation and a threshold operation on the input and output fields, which can be implemented in an optical setting via spatial light modulators. While the standard Hopfield model does not require encoding, it cannot naturally be realized in a multimode cavity QED system. Thus, the physical cavity QED system enjoys roughly an order-of-magnitude greater memory capacity at the expense of an encoder. 6. Overall, our storage and recall scheme points to a novel paradigm for memory systems in which new stimuli are remembered by translating them into already intrinsically stable emergent patterns of the native network dynamics.
The remainder of this paper is organized as follows. We first describe the physical confocal cavity QED (CCQED) system in Sec. II, before introducing the Hopfield model in Sec. III. Next, we analyze the regimes of spin-spin connectivity provided by the confocal cavity in Sec. IV. We then discuss in Sec. V the SD dynamics manifest in a transversely pumped confocal cavity above the superradiant transition threshold. Section VI discusses how SD dynamics enhances associative memory capacity and robustness. A learning rule that maps free-space light patterns into stored memory is presented in Sec. VII.
Last, in Sec. VIII, we conclude and frame our work in a wider context. In this discussion, we speculate about how the quantum dynamics of the superradiant transition might enhance solution finding. This is in contrast to all the rest of this paper, where we consider a semiclassical regime well above any quantum critical dynamics at the transition itself. Embedding neural networks in systems employing ion traps, optical lattices and optical-parametric-oscillators has been explored [44][45][46][47], and comparisons of the latter to our scheme is discussed here also. This section also provides concluding remarks regarding how the study of this physical system may provide new perspectives on the problem of how memory arises in biological systems [48].
Appendices A-F present the following: A, the Raman coupling scheme and effective Hamiltonian; B, the convolution of the confocal connectivity with the finite spatial extent of the spin ensembles; C, the derivation of the confocal connectivity probability distribution; D, the spin-flip dynamics in the presence of a classical bath with ohmic noise spectrum; E, the derivation of the spin-flip rate and dynamics in the presence of a quantum bath; and F, the derivation of the mean-field ensemble dynamics.

II. CONFOCAL CAVITY QED
As illustrated in Fig. 1, we consider a configuration of N spatially separated BECs placed in a confocal cavity. In a confocal cavity, the cavity length L is equal to the radius of the curvature of the mirrors R, which leads to degenerate optical modes [38]. More specifically, modes form degenerate families, where each family consists of a complete set of transverse electromagnetic TEM lm modes with l + m either of even or odd parity. Recent experiments have demonstrated coupling between compactly confined Bose-Einstein condensates (BECs) of ultracold 87 Rb atoms and a high-finesse, multimode (confocal) optical cavity with L = R = 1 cm [31][32][33][34][35][36].
The coupling between the BECs and the cavity occurs via a double Raman pumping scheme illustrated in Fig. 2. The |F, m F = |1, −1 and |2, −2 states are coupled via two-photon processes, involving pump lasers oriented transverse to the cavity axis and the cavity fields [49]. The motion of atoms may be suppressed by introducing a deep 3D optical lattice into the cavity [50]. When atoms are thus trapped, each BEC can be described as an ensemble of pseudospin-1/2 particles-corresponding to the two states discussed above-with no motional degrees of freedom. The system then realizes a nonequilibrium Hepp-Lieb-Dicke model [51], exhibiting a superradiant phase transition [32,33] when the pump laser intensity FIG. 2. Double Raman atomic coupling. The scheme provides a two-level system, corresponding to the two hyperfine states of 87 Rb. Two transversely oriented pump beams are used to generate cavity-assisted two-photon processes coupling the states |F, mF = |2, −2 ≡ |↑ and |F, mF = |1, −1 ≡ |↓ .
is sufficiently large.
In the following, we will assume parameter values similar to those realized in the CCQED experiments of Refs. [31][32][33][34][35][36]: Specifically, we take a single-atom-tocavity coupling rate of g 0 = 2π × 1.5 MHz [52], a cavity field decay rate of κ = 2π × 150 kHz, a pump-cavity detuning of ∆ C = −2π × 3 MHz, and a detuning from the atomic transition of ∆ A = −2π × 100 GHz. The spontaneous emission rate is approximately ΓΩ 2 /∆ 2 A , where Γ = 2π × 6.065 (9) MHz is the linewidth of the D 2 transition in 87 Rb and Ω 2 is proportional to the intensity of the transverse pump laser. A large ∆ A ensures that the spontaneous emission rate is far slower than the inverse lifetime Γ of the Rb excited state. A typical value of Ω 2 is set by the pump strength required to enter the superradiant regime; with 10 7 atoms [53], Ω 2 can be low enough to achieve a spontaneous decay timescale on the order of 100 ms. Hereafter, we will not explicitly include spontaneous emission, but will consider it to set an upper time limit on the duration of experiments.
To control the position of the BECs, one may use optical tweezer arrays [25,26]. Experiments have already demonstrated the simultaneous trapping of several ensembles in the confocal cavity using such an approach [33]. Extending to hundreds of ensembles is within the capabilities of tweezer array technology [26]. As we discuss in Sec. V C, collective enhancement of the dynamical spin flip rate occurs, depending on the number of atoms in each ensemble. This enhancement is needed so that the pseudospin dynamics is faster than the spontaneous emission timescale. Only a few thousand atoms per ensemble are needed to reach this limit, while the maximum number of ultracold atoms in the cavity can reach 10 7 . Thus, current laser cooling and cavity QED technology provides the ability to support roughly 10 3 network nodes and have them evolve for a few decades in timescale. This number of nodes is similar to state-ofthe-art classical spin glass numerical simulation [54]. Improvements to cavity technology can allow the size of the spin ensembles to shrink further, ultimately reaching the single-atom level. Moreover, Raman cooling [55] within the tight tweezer traps can help mitigate heating effects, allowing atoms to be confined for up to 10 s, limited only by the background gas collisions in the vacuum chamber. By shrinking the size of the ensembles, it may be possible for quantum entanglement among the nodes to then persist deep into the superradiant regime-we return to this possibility in our concluding discussion in Sec. VIII.
The confocal cavity realizes a photon-mediated interaction among the spin ensembles. This interaction is described by a matrix J ij , denoting the coupling between ensembles i and j. As derived in Sec. V, this interaction involves a sum over all relevant cavity modes, m, and takes a form J ij = − m ∆ m g im g jm /(∆ 2 m + κ 2 ). Here, g im is the coupling between cavity mode m and ensemble i-which depends on the positions of atoms and spatial profiles of modes-while ∆ m is the detuning of the pump laser from cavity mode m. For simplicity in writing this equation, we will restrict the BEC positions r i to lie within the transverse plane at the center of the cavity. Following refs. [33,35,36], in the confocal limit ∆ m = ∆ C , the interaction then takes the form: Here,g 0 = Ωg 0 /∆ A denotes an effective coupling strength in terms of the transverse pump strength Ω, single-atom-to-cavity coupling g 0 and atomic detuning ∆ A . The length scale w 0 is the width (radius) of the Gaussian TEM 00 mode. The term β is a geometric factor determined by the shape of the BEC. For a Gaussian atomic profile of width σ A in the transverse plane, β = w 2 0 /8σ 2 A , which is typically ∼10. The first term βδ ij is a local interaction that is present only for spins within the same ensemble. This term arises from the light in a confocal cavity being perfectly refocused after two round trips. The effect of this term is to align spins within the same ensemble, or in other words, to induce the superradiant phase transition of that ensemble. In practice, imperfect mode degeneracy broadens the refocussing into a local interaction of finite range; Refs. [33,35,36] discuss this effect. The range of this interaction is controlled by the ratio between the pump-cavity detuning ∆ C and the spread of the cavity mode frequencies. At sufficiently large ∆ C , the interaction range can become much smaller than the spacing between BECs. Because a confocal cavity resonance contains only odd or even modes, there is also refocussing at the mirror image position; we can ignore this by assuming all BECs are in the same half-plane [33,35,36].
The nonlocal second term arises from the propagation of light in between the refocussing points. Intuitively, this interaction arises from the fact that each confocal cavity images the Fourier transform of objects in the central plane back onto the objects in that same plane. Thus, Individual spins align within an ensemble due to superradiance, while ensembles organize with respect to each other due to Jij coupling. Cavity emission allows for holographic reconstruction of the spin state: red and blue fields are π out of phase, allowing discrimination between up and down spin ensembles. (b) The spin ensembles realize a Hopfield neural network: a single layer network of binary neurons si = ±1 that are recurrently fed back then subjected to a linear transform J and threshold operation at each neuron. (c) The Hopfield model exhibits an energy landscape with many metastable states. Each local minimum encodes a memory spin configuration (pattern) surrounded by a larger basin of attraction of similar spin states. Energy minimizing dynamics drive sufficiently similar spin configurations to the stored local minimum. There is a phase transition from the associative memory to a spin glass once there are too many memories; i.e., when so many minima exist that basins of attraction vanish. (d) Schematic of the associative memory problem. Multiple stored patterns (e.g., images of element symbols) may be recalled by pattern completion of distorted input images.
photons scattered by atoms in local wavepackets are reflected back onto the atoms as delocalized wavepackets with cosine modulation-the Fourier transform of a spot. Formally, it arises due to Gouy phase shifts between the different degenerate modes, see Refs. [33,35,36] for a derivation and experimental demonstrations. This interaction is both nonlocal and nontranslation invariant. It can generate frustration between spin ensembles due to its sign-changing nature, as discussed in Sec. IV below. The structure of the matrix J ij is quite different from those appearing traditionally in Hopfield models, either under Hebbian or pseudoinverse coupling rules, as discussed in Sec. IV. We note that while the finite spatial extent of the BEC is important for rendering the local interaction finite, we show in Appendix B that it does not significantly modify the nonlocal interaction in the experimental regime discussed here.

III. HOPFIELD MODEL OF ASSOCIATIVE MEMORY
The Hopfield associative memory is a model neural network that can store memories in the form of distributed patterns of neural activity [43,56,57]. In the simplest instantiation of this class of networks, each neuron i has an activity level s i that can take one of two values: +1, corresponding to an active neuron, or −1, corresponding to an inactive one. The entire state of a network of N neurons is then specified by one of 2 N possible distributed activity patterns. This state evolves according to the discrete time dynamics where J ij is a real valued number that can be thought of as the strength of a synaptic connection from neuron j to neuron i. Intuitively, this dynamics computes a total input h eff i = j =i J ij s j (t) to each neuron i and compares it to a threshold h i . If the total input is greater (less) than this threshold, then neuron i at the next timestep is active (inactive). One could implement this dynamics in parallel, in which case Eq. (2) is applied to all N neurons simultaneously. Alternatively, for reasons discussed below, it is common to implement a serial version of this dynamics in which a neuron i is selected at random, and then Eq. (2) is applied to that neuron alone, before another neuron is chosen at random to update.
The nature of the dynamics in Eq. (2) depends crucially on the structure of the synaptic connectivity matrix J ij . For arbitrary J ij , and large system sizes N , the long-time asymptotic behavior of s i (t) could exhibit three possibilities: (1) flow to a fixed point; (2) flow to a limit cycle; or (3) exhibit chaotic evolution. On the other hand, with a symmetry constraint in which J ij = J ji , the serial version of the dynamics in Eq. (2) monotonically decreases an energy function In particular, under the update in Eq. (2) for a single spin, it is straightforward to see that H(t+1) ≤ H(t) with equality if and only if s i (t + 1) = s i (t). Indeed, the serial version of the update in Eq. (2) corresponds exactly to Zero Temperature Metropolis-Hastings (0TMH) [41,42] or Glauber dynamics [39] applied to the energy function in (3). The existence of a monotonically decreasing energy function rules out the possibility of limit cycles, and every neural activity pattern thus flows to a fixed point, which corresponds to a local minimum of the energy function. A local minimum is by definition a neural activity pattern in which flipping any neuron's activity state would increase the energy. One of Hopfield's key insights was that we could think of neural memories as fixed points or local minima in an energy landscape over the space of neural activity patterns; these are also sometimes known as metastable states or attractors of the dynamics. Each such fixed point has a basin of attraction, corresponding to the set of neural activity patterns that flow under Eq. (2) to that fixed point. The process of successful memory retrieval can then be thought of in terms of a pattern completion process. In particular, an external stimulus may initialize the neural network with a neural activity pattern corresponding to a corrupted or partial version of the fixed point memory. Then, as long as this corrupted version still lies within the basin of attraction of the fixed point, the flow towards the fixed point completes or cleans up the initial corrupted pattern, triggering full memory recall. This is an example of content addressable associative memory, where partial content of the desired memory can trigger recall of all facts associated with that partial content. A classic example might be recalling a friend who has gotten a haircut. Figure 3 illustrates this pattern completion based memory retrieval process.
In this framework, the set of stored memories, or fixed points, are encoded entirely in the connectivity matrix J ij ; for simplicity, we set the thresholds h i = 0. Therefore, if we wish to store a prescribed set of P memory patterns ξ µ = (ξ µ 1 , . . . , ξ µ N ) for µ = 1, . . . , P , where each ξ µ i = ±1, we need a learning rule for converting a set of given memories {ξ µ } P µ=1 into a connectivity matrix J. Ideally, this connectivity matrix should instantiate fixed points under the dynamics in Eq. (2) that are close to the desired memories ξ µ , with large basins of attraction, enabling robust pattern completion of partial, corrupted inputs. Of course, in any learning rule, one should expect a trade-off between capacity (the number of memories that can be stored) and robustness (the size of the basin of attraction of each memory, which is related to the fraction of errors that can be reliability corrected in a pattern completion process).
When the desired memories ξ µ are unstructured and random, a common choice is the Hopfield connectivity, which corresponds to a Hebbian learning rule [43,57]: We may note that in the magnetism literature, such a model is known as the multicomponent Mattis model [58]. The properties of the energy landscape associated with the dynamics in Eq. (2) under this connectivity have been analyzed extensively in the thermodynamic limit N → ∞ [59,60]. When P < ∼ 0.05N the lowest energy minima are in one to one correspondence with the P desired memories. For 0.05 < ∼ P/N < ∼ 0.138, the P memories correspond to metastable local minima. However, pattern completion is still possible; an initial pattern corresponding to a corrupted memory, with a small but extensive number of errors proportional to N , will still flow towards the desired memory. However, at larger P , the network undergoes a spin-glass transition in which there can be exponentially many energy minima with small basins of attraction, none of which are close to the desired memories. Thus, pattern completion is not possible: an initial pattern corresponding to a corrupted memory, with even a small but extensive number of errors proportional to N , is not guaranteed to flow towards the desired memory. This spin glass transition occurs because the large number of memories start to interfere with each other. In essence, the addition of each new memory modifies the existing local minima associated with previous memories. When too many memories are stored, it is not possible, under the Hebbian rule in Eq. (4) and the dynamics of Eq. (2), to ensure the existence of local minima, with large basins, close to any desired memory.
Thus, we have noted the Hopfield model undergoes a spin glass transition above a critical pattern loading P/N ≈ 0.138. In the exposition below, it will be useful to also consider a prototypical example of a spin glass, namely the Sherrington-Kirkpatrick (SK) model [61]. In the SK model, the matrix elements of the symmetric matrix J SK are chosen i.i.d. from a zero mean Gaussian distribution with variance σ 2 /N . At low temperature, such a model also has a spin-glass phase with exponentially many energy minima, and as we shall see below, our CCQED system exhibits a transition from a memory retrieval phase to an SK-like spin-glass phase as the positions of spin ensembles spread out within the cavity.
Numerous improvements to the Hebbian learning rule have been introduced [62,63] that sacrifice the simple outer product structure of the Hebbian connectivity in Eq. (4) for improved capacity. Notable among them is the pseudoinverse rule, in which P may be as large as N . This large capacity comes at the cost of being a nonlocal learning rule: updating any of the weights requires full knowledge of all existing J ij weights, unlike the Hebbian learning rule. The J ij matrix for the pseudoinverse learning rule is given by: where the matrix C µν = 1 N ξ µ · ξ ν stores the inner products of the patterns. This learning rule ensures that the desired memories ξ ν become eigenvectors of the learned connectivity matrix J Pseudo with eigenvalue 1, thereby ensuring that each desired memory corresponds to a fixed point, or equivalently a local energy minimum, under the dynamics in Eq. (2). While the pseudoinverse rule does guarantee each desired memory will be a local energy minimum, further analysis is required to check whether such minima have large basins. The basin size will generically depend on the structure of C µν , with potentially small basins arising for pairs of memories that are very similar to each other. Finally, we note that the Hebbian learning rule in Eq. (4) is in fact a special case of the pseudoinverse learning rule in Eq. (5) when the patterns ξ µ are all mutually orthogonal, with C µν = δ µν . We include the pseudoinverse rule in our comparisons below to demonstrate the generality of results we present, and to apply them to something known to surpass the original Hebbian scheme.
While the simple Hebbian rule and the more powerful pseudoinverse rule are hard to directly realize in a confocal cavity, we show in Sec. IV that the connectivity naturally provided by the confocal cavity is sufficiently high rank to support a multitude of local minima. We further analyze the dynamics of the cavity in Sec. V, and demonstrate that this dynamics endows this multitude of local minima with large basins of attraction in Sec. VI. Thus, the confocal cavity provides a physical substrate for high capacity, robust memory retrieval. Of course, the desired memories we wish to store may not coincide with the naturally occurring emergent local minima of the confocal cavity. However, any such mismatch can be solved by mapping the desired memories we wish to store into the patterns that are naturally stored by the cavity (and vice-versa). We will show in Sec. VII that such a mapping is possible, and further, that it is practicable using optical devices.
Taken together, the next few sections demonstrate that the CCQED system possesses three critical desiderata of an associative memory: (1) high memory capacity due to the presence of many local energy minima (Sec. IV); (2) robust memory retrieval with pattern completion of an extensive number of initial errors (Sec. V and VI); and (3) programmability or content addressability of any desired memory patterns (Sec. VII).

IV. CONNECTIVITY REGIMES IN A CONFOCAL CAVITY
As noted at the end of Sec. II, the form of connectivity J ij in Eq. (1), which arises naturally for the confocal cavity, is quite different from both the forms arising in the Hebbian (Eq. (4)) and pseudoinverse (Eq. (5)) learning rules discussed in the previous section. Therefore, in this section we analyze the statistical properties and energy landscape of the novel random connectivity matrix arising from the confocal connectivity, taking the ensemble positions r i to be randomly chosen from a Gaussian dis-tribution in the cavity transverse midplane. Specifically, we choose a distribution of positions with zero mean (i.e., centered about the cavity axis) and tunable standard deviation w. As the interaction in Eq. (1) is symmetric under r i → −r i , the distribution may also be restricted to a single half plane to avoid the mirror interaction with no adverse affect to local and nonlocal interactions.
In the limit w → 0, so that all r i = 0, all off-diagonal elements of the confocal connectivity matrix J ij become identical and positive. This describes a ferromagnetic coupling. As the width increases, some elements of J ij become negative, as illustrated in Fig. 4. This can lead to frustration in the network of coupled spins, where the product of couplings around a closed loop is negative, i.e., where J ij J jk J ki < 0 for spins i, j, and k. In such cases, it is not possible to minimize the energy of all pairwise interactions simultaneously. Such frustration can in principle lead to a proliferation of metastable states, a key prerequisite for the construction of an associative memory. As we shall see, tuning the width w allows one to tune the degree of frustration, and thus the number of metastable states in the confocal cavity. In the remainder of this section, we will consider only the normalized nonlocal couplingsJ ij = cos 2r i · r j /w 2 0 ∈ [−1, 1]. The local part of the interaction plays no role in the properties we discuss in this section, but does affect the dynamics as discussed in Sec. V.
To characterize the statistical properties of the random matrix J ij , and its dependence on w, we begin by considering the marginal probability distribution for individual matrix elements J ij . Details of the derivation of this distribution are given in Appendix C; the result is wherew ≡ w/w 0 . Figure 4(a-c) illustrates the evolution of this marginal distribution for increasing width. For smallw, the distribution is tightly peaked around J ij = +1, corresponding to an unfrustrated all-to-all ferromagnetic model, with only a single global minimum (up to Z 2 symmetry). Asw increases, negative (antiferromagnet) elements ofJ ij become increasingly probable. Also plotted in Fig. 4(d) are the fractions of J ij links that are antiferromagnetic as well as the fraction that realize frustrated triples of spin connectivity, J ij J jk J ki < 0. The probability of antiferromagnetic coupling is analytically calculated in Appendix C, while the probability of frustrated triples is evaluated numerically. If the different matrix elements J ij were uncorrelated, then one could anticipate that, as in the SK model, frustration would occur once the probability of negative J ij becomes sufficiently large. However, when correlations exist, the presence of many negative elements is not sufficient to guarantee significant levels of frustration and the consequent proliferation of metastable local energy minima. For example, the rank 1 connectivity J ij = ξ i ξ j , for a random vector ξ, can have have an equal fraction of positive and negative elements while remaining unfrustrated [58]. In general, we expect the couplings J ij and J jk should be correlated, as they both depend on the common position r j . As discussed in App. C, this correlation can be computed analytically as a function of the width: where · r denotes an average over realizations of the random placement of spin ensembles. Although correlations exist, we see from this expression that they decay like 1/w 4 , so that at largew, the correlations are weak; see Figure 4(d). Of course, even weak correlations in a large number of O(N 2 ) off-diagonal elements can, in principle, dramatically modify important emergent properties of the random matrix, such as the induced multiplicity of metastable states and the statistical structure of the eigenvalue spectrum. Thus, we examine the properties of the actual correlated random matrix ensemble arising from the confocal cavity rather than adopt known results. Figure 5 shows a numerical estimation of the number of metastable states as a function of the width w. This number is estimated by initializing a large number of random initial states and allowing those states to relax via 0TMH dynamics until a metastable local energy minimum is found. This routine is performed for many realizations of the connectivity J ij , then averaged over realizations to produce the number plotted in Fig. 5. We regard configurations which are related by an overall spin flip as equivalent. A single metastable state is found at smallw, as expected for ferromagnetic coupling. A transition to multiple metastable states occurs asw increases; this transition becomes increasingly sharp at larger system sizes. Finite size scaling analysis of the transition yields a critical point of w AM ≈ 0.67w 0 . Only one minima exists below this value, while multiple minima emerge above. The number of metastable states, shown in Fig. 5(b), increases rapidly for w > w AM . In particular, in the range of w and N that we explored, we find that the following fits the simulations: N = √ 1 + Ae Bx , where x = N 1/ν (w − w AM )/w 0 is the rescaled width, and A = 0.33, B = 3.4, and ν = 2.4 are the fit parameters. Thus, the number of metastable states scales with N and w as O(e wN 0.4 ) just above w AM . At still larger w, the numerical estimation of this number becomes less reliable due to the increasing prevalence of metastable states with small basins of attraction under 0TMH dynamics [64].
The existence of multiple (metastable) local energy minima is a critical prerequisite for associative memory storage. An additional requirement, as discussed in Sec. III, is that these local energy minima should possess large enough basins of attraction to enable robust pattern completion of partial or corrupted initial states, through the intrinsic dynamics of the CCQED system.
As of yet, we have made no statement about the basins of attraction of the metastable states we have found; we will examine this issue in the next two sections which will focus on the CCQED dynamics. For now, we simply note that at a fixed width ofw an O(1) amount above FIG. 5. (a) Number of metastable states versus distribution width for various system sizes N , as indicated. The average number of metastable states increases with N above the ferromagnetic-to-associative memory transition at wAM = 0.67(1)w0. (b) Scaling collapse of the above in the region 0.5w0 < w < 0.8w0, with denser sampling. The x-axis is rescaled as N 1/ν (w − wAM)/w0, while the y-axis is unchanged. The parameters ν and wAM are determined by fitting the collapsed data to an exponential form the transition, the number of metastable states N grows with system size N as O(e N 0.4 ), while the total configuration space grows with exponential scaling 2 N . Given that every configuration must flow to one of these energy minima, it seems reasonable to expect that any energy minimizing dynamics should endow these minima with sufficiently large basins of attraction to enable robust pattern completion, assuming these basins are all approximately similar in size.
At fixed N , the growth in the number of energy minima as a function ofw, depicted in Fig. 5, suggests the possibility of a spin glass phase wherein exponentially many metastable local energy minima emerge. Based on the analysis of the marginal Eq. (6) and pairwise Eq. (7) statistics of the matrix elements, we expect that the spin glass phase should be like that of an SK model at largew.
To determine if such a state exists, we further analyze the connectivity in this largew regime by comparing properties of the CCQED connectivity to those of the SK spin glass connectivity. In the limit of largew, the probabil-ity density for the J ij given in Eq. (6) takes the limiting form p(J ij ) = (1 −J 2 ij ) −1/2 /π. This functional form differs from the SK spin glass model in which the probability density of the couplings is Gaussian, with only the first two moments nonzero. However, it is known [65] that the SK model free energy depends on only the first two moments of the marginal distribution in the thermodynamic limit, as long as the third-order cumulant of the distribution is bounded. This is also true for the CC-QED connectivity. Moreover, these first two moments in the CCQED connectivity can be computed analytically. The mean µ J and standard deviation σ J as a function of width are Thus, at large width, the mean is negligible compared to the standard deviation, which is required for a spin glass (as opposed to ferromagnetic state).
However, a key difference between the CCQED connectivity and the SK connectivity is the presence of correlations between different matrix elements. The former's correlation strength decreases with width, see Eq. (7) and Fig. 4(d). We can obtain insights into how large the width must be in order to suppress these correlations, thereby crossing into an SK-like spin glass phase, by comparing the statistical structure of the CCQED connectivity eigenvalue distribution to that of the SK model connectivity. In particular, the eigenvalue distribution obeys the same Wigner's semicircular law f Wigner (x) = √ 4 − x 2 /2π as does the SK connectivity with zero mean i.i.d. Gaussian elements of variance 1/N [66,67]. Moreover, the distribution of spacings s between adjacent eigenvalues (normalized by the mean distance) in both obey Wigner's surmise p Wigner (s) = πs 2 e −πs 2 /4 , reflecting repulsion between adjacent eigenvalues [68]. Figure 6(a,b) plots the eigenvalue distribution f w (x) and level-spacing distribution p w (s) for several widths w for the CCQED connectivity. Both distributions approach those of the SK model for widths beyond a few w 0 . As we discuss next, the required ratio w/w 0 to reach the SK regime depends on the system size N . Figure 6(c) plots the difference between the confocal eigenvalue distribution, denoted by f w , and the Wigner semicircular law f Wigner in terms of the Hellinger distance, which is defined for arbitrary probability distributions p(x) and q(x) as H 2 (p, q) = dx( p(x) − q(x)) 2 /2. The distance metric equals 1 for completely non-overlapping distributions and 0 for identical distributions. We find that the Hellinger distance follows a universal (N independent) curve as a function ofwN −1/4 . The structure of this curve demonstrates that as long as w > w SK = αN 1/4 w 0 , where α is a constant of order unity, then the spectrum of the confocal cavity connectivity assumes Wigner's semicircular law, just like that of the SK model connectivity. At this large value of w, the strength of correlations between different matrix elements in the confocal cavity, given by Eq. (7), is O(1/N ). Regimes of CCQED connectivity behavior versus width w and system size N . The three shaded regions correspond to ferromagnetic (F), associative memory, and SK-like spin glass (SG) regimes, as discussed in the text. Note that associative memories can still be encoded even in the spin glass regime due to SD dynamics; see Secs. V and VII.
Such weak correlations, combined with an O(1) variance and a negligible mean of the matrix elements, endow the CCQED connectivity with similar spectral properties to that of the SK connectivity. Given these similarities, we thus expect that the CCQED model possesses an SK-like spin glass phase at widths w > w SK . While the Hellinger distance in Fig. 6(c) falls to zero, it does so smoothly as N 1/4 ; whether a phase transition or crossover occurs between the associative memory and spin glass behaviors is unclear and warrants future investigation. Figure 7 summarizes the three regimes of the CCQED connectivity matrix. The evolution from ferromagnetic, to associative memory, to spin glass versus w is analogous to the behavior exhibited by the Hopfield model as the ratio P/N of the number of memory patterns to neurons increases in the Hebbian connectivity. Having demonstrated that a significant number of metastable states exist in the cavity QED system, Sec. V will now discuss the natural dynamics of the cavity. We will show that these differ from standard 0TMH or Glauber dynamics. As mentioned, changing the dynamics changes the basins of attraction associated with each metastable state. Remarkably, as we will show in Sec. VI, the CCQED dynamics leads to a dramatic increase in the robustness of the associative memory by ensuring that many metastable states with sufficiently large basins for robust pattern completion exist, even at largew.

V. SPIN DYNAMICS OF THE SUPERRADIANT CAVITY QED SYSTEM
We now discuss the spin dynamics arising intrinsically for atoms pumped within an optical cavity. To do this, we start from a full model of coupled photons and spins, and show how both the connectivity matrix discussed above and the natural dynamics emerge. Aspects of these have been presented in earlier work; e.g., the idea of deriving an associative memory model from coupled spins and photons was discussed in Refs. [23,24], with spin dynamics discussed in Ref. [30]. Because-as we discuss in Sec. VI-the precise form of the open-system dynamics is crucial to the possibility of memory recovery, we include a full discussion of those dynamics here to make this paper self contained.
Our discussion of the spin dynamics proceeds in several steps. Here, we start from a model of atomic spins coupled to cavity photon modes; this is derived in Appendix A. We then discuss how to adiabatically eliminate the cavity modes in the regime of strong pumping and in the presence of dephasing, leading to a master equation for only the atomic spins. This equation describes rates of processes in which a single atomic spin flips, and these rates include a superradiant enhancement, dependent on the state of other spins in the same ensemble. Such dynamics can be simulated stochastically. Finally, we show how, for large enough ensembles, a deterministic equation for the average magnetization of the ensemble can be derived. This equation is shown to describe a discrete form of steepest descent dynamics.
The system dynamics is given by the master equation: where a m is the annihilation operator for the mth cavity mode and the Lindblad superoperator is L[X] = 2XρX † − {X † X, ρ}. The dissipative terms describe cavity loss at a rate κ [69]. This is the only dissipative process we consider, as we assume the pump laser is sufficiently far detuned from the atomic excited state that we may ignore spontaneous emission. As noted earlier, spontaneous emission would lead to heating, which sets a maximum duration of the experiment. As derived in Appendix A, the Hamiltonian takes the form of a multimode generalization of the Hepp-Lieb-Dicke model [70][71][72][73]: The first two terms describe the cavity alone. We work in the rotating frame of the transverse pump: ∆ m denotes the detuning of the transverse pump from the mth cavity mode. We will assume that the transverse pump is red detuned, and so ∆ m < 0. We also include a longitudinal pumping term, written in the transverse mode basis as ϕ m . Such a pump allows one to input memory patterns (possibly corrupted) into the cavity QED system. To describe the atomic ensembles, we introduce collective spin operators the Pauli operators for the individual spins within the ith localized ensemble, which contains M spins. The term ω z S z i describes the bare level splitting between the atomic spin states. The coupling between photons and spins is denoted g im = Ξ m (r i )Ωg 0 cos(φ m )/∆ A . This expression involves the transverse profile Ξ m (r) of the mth cavity mode and the effect from the Gouy phase cos(φ m ); see Appendix A. For a confocal cavity, these are Hermite-Gaussian modes; their properties are extensively discussed elsewhere [33,35,36,38].
The final term in Eq. (10) introduces a classical noise source χ(t). In Appendix D, we show this coupling generates dephasing in the S x subspace so as to restrict the dynamics to classical states, simplifying the dynamics. Such noise may arise naturally from noise in the Raman lasers. In addition, such a term can also be deliberately enhanced either through increasing such noise, or by introducing a microwave noise source oscillating around ω z . We choose to consider the dynamics with such a noise term to enable us to draw comparisons to other classical associative memories. Without such a term, understanding the spin-flip dynamics would be far more complicated, as it would require a much larger state space, with arbitrary quantum states of the spins. Exploring this quantum dynamics, when this noise is suppressed, is a topic for future work, as we discuss in Sec. VIII.
This multimode, multi-ensemble generalization of the open Dicke model exhibits a normal-to-superradiant phase transition, similar to that known for the singlemode Dicke model. The effects of multiple cavity modes have been considered for a smooth distribution of atoms [74,75], where it was shown that beyondmean-field physics can change the nature of the transition. In contrast, in Eq. (10) we consider ensembles that are small compared to the length scale associated with the cavity field resolving power [76], so all atoms in an ensemble act identically. As such, in the absence of inter-ensemble coupling, the normal-to-superradiant phase transition occurs independently for each ensemble of M atoms at the mean-field point g i,eff = g c , where 4M g 2 c = ω z (∆ 2 C + κ 2 )/∆ C , and g i,eff is the effective coupling for ensemble i to a cavity supermode (a superposition of modes coupled by the dielectric response of the localized atomic ensemble) [32]. The normal phase is characterized by S x = 0 and a rate ∝ M of coherent scattering of pump photons into the cavity modes. In the superradiant phase, S x = 0, and the atoms coherently scatter the pump field at an enhanced rate ∝ M 2 . This is experimentally observable via a macroscopic emission of photons from the cavity and a Z 2 symmetry breaking reflected in the phase of the light (0 or π) [31,37,77].
In the absence of coupling, each ensemble would independently choose how to break the Z 2 symmetry, i.e., whether to point up or down. Photon exchange among the ensembles couples their relative spin orientation and modifies the threshold. When all N ensembles are phaselocked in this way, the coherent photon scattering into the cavity in the superradiant phase becomes ∝ (N M ) 2 , in contrast to ∝ N M in the normal phase. Throughout this paper, our focus will be on understanding the effects of photon exchange, when the system is pumped with sufficient strength that all the ensembles are already deep into the superradiant regime. The behavior near threshold, and shifts to the threshold due to inter-ensemble interactions, are discussed again in Sec. VIII. We now describe how to consider the collective spin ensemble dynamics deep in the superradiant regime.
To obtain an atom-only description deep in the superradiant regime, it is useful to displace the photon operators by their mean-field expectations for a given spin state: . This can be done via a Lang-Firsov polaron transformation [78,79], as defined by the unitary operator: We define polaron displacement operators as and S ± i = S y i ± iS z i are the (ensemble) raising and lowering operators in the S x basis. The Hopfield Hamiltonian emerges naturally from this transform: with the connectivity and longitudinal field given by: With the Hamiltonian in the form of Eq. (12), we can now adiabatically eliminate the cavity modes by treating the term proportional to ω z perturbatively via the Bloch-Redfield procedure. As derived in Appendix D, this yields the atomic spin-only master equatioṅ In the above expression, H eff is an effective Hamiltonian including both H Hopfield and a Lamb shift contribution [80], K i (δ ) is a rate function discussed below, and δ + i (δ − i ) are the changes in energy of H Hopfield after increasing (decreasing) the x component of collective spin in the ith ensemble. Note that although this expression involves ensemble raising and lowering operations, the master equation describes processes where the spin increases or decreases by one unit at a time. For brevity, we refer to these processes as spin flips below. Because these are collective spin operators, there will be a "superradiant enhancement" of these rates, as we discuss below in Sec. V B.
As mentioned above, classical noise dephases the quantum state into the S x subspace in which each ensemble exists in an S x i eigenstate. By doing so, the state may be described by the vector (s Since H eff commutes with S x , it generates no dynamics in this subspace. The dynamics thus arises solely through the dissipative Lindblad terms, corresponding to incoherent S x spin-flip events.
The energy difference, upon changing the spin of the ith ensemble by one unit ±1, can be explicitly written as For the terms with j = i, these represent the usual spin-flip energy in a Hopfield model. An additional selfinteraction term J ii (∓2s x i +1) arises from the energy cost of changing the overall spin of the ith ensemble. The self-interaction J ii thus provides a cost for the ith spin to deviate from s x i = ±S i , where S i = M/2 is the modulus of spin of ensemble i. As written in Eq. (1), J ii is enhanced by a term β, dependent on the size of the atomic ensembles. If sufficiently large, this could freeze all ensembles in place, by making any configuration with all s x i = ±S i a local minimum. However, the strength of J ii can be reduced by tuning slightly away from confocality, which smears-out the local interaction [33]. We will see below that for the realistic parameters employed in Sec. II, no such freezing is observed. It is important to note that the largest self-interaction energy cost occurs at the first spin-flip of a given ensemble-i.e., subsequent spin flips become easier not harder. One may also note that as the size of ensemble M increases, the interaction strength ratio of the self-interaction to the interaction from other ensembles is not affected; Eq. (17) shows all terms increase linearly with ensemble size.
As derived in Appendix D, the functions that then determine the rates of spin transitions take the form: where the function C(τ ) depends on the correlations of the classical noise source χ(t). It does so via J c (ω), the Fourier transform of χ(t)χ(0) , using Details of the derivation of this expression are given in Appendix D, along with explicit calculations for an ohmic noise source.
A. Spin-flip rates in a far-detuned confocal cavity We now show how the expressions for the spin-flip rate K(δ ) simplify for a degenerate, far-detuned confocal cavity. In an ideal confocal cavity, degenerate families of modes exist with all modes in a given family having the same parity. Considering a pump near to resonance with one such family, we may restrict the mode summation to that family, and take ∆ m = ∆ C for all modes m. In this case, the sum over modes in Eq. (18) simplifies, and becomes proportional to J ii . If we further assume ∆ C J ii -the far-detuned regime-we can Taylor expand the exponential in Eq. (18) to obtain the simpler spin-flip rate function where h(δ ) is a sharply peaked function centered on δ = 0. Its precise form depends on the spectral density of the noise source and is given explicitly in Appendix D. Classical noise broadens h(δ ) into a finitewidth peak. Considering experimentally realistic parameters, this width is at least an order-of-magnitude narrower than the range of typical spin-flip energies. As a result, its presence does not significantly affect the dynamics. The main contribution to the spin-flip rate thus comes from the second term, a Lorentzian centered at δ = ∆ C . Figure 8(a) plots the spin-flip rate of Eq. (20). By choosing negative ∆ C -i.e., red detuning-the negative offset of the Lorentzian peak from δ = 0 ensures that energy-lowering spin-flips occur at a higher rate than energy-raising spin-flips, thereby generating cooling dynamics. We can further define an effective temperature for the dynamics for spin-flip energies sufficiently small in magnitude. To show this, we inspect the ratio of energy-lowering to energy-raising spin-flip rates K(δ )/K(−δ ). To obey detailed balance-as would occur if coupling to a thermal bath-this ratio must be of the form exp(−δ /T ), where T is the temperature of the bath. For small |δ |, this means one should compare the rate ratio to a linear fit: To determine whether this temperature is large or small, it should be compared to a typical spin-flip energy δ : the system is hot (cold) when the ratio T eff /δ is much greater (less) than unity. The ratio of spin flip rates is shown in Fig. 8(b), along with the Boltzmann factor using T eff defined in Eq. (21). The rate ratio shows a small kink close to |δ | = 0 arising from the noise term, but otherwise closely matches the exponential decay up until |δ | |∆ C |. We may ensure the spin-flip energies are ≤ |∆ C | by choosing a pump strength Ω ∝ ∆ C / √ M . The spin dynamics drive the system toward a thermallike state in this regime.
Despite the presence of an effective thermal bath, the dynamics arising from the confocal cavity are rather different from those of the standard finite-temperature Glauber or Metropolis-Hastings dynamics. This can be seen by comparing the functions K(δ ) that would correspond to these dynamics. Glauber dynamics implements a rate function of the form while for Metropolis-Hastings, The zero temperature limits of both these functions are step functions; e.g., K 0TMH ∝ Θ(−δ ). These functions plotted in Fig. 8(a), taking T = T eff with an arbitrary overall rescaling to match the low-energy rate of the cavity QED dynamics. The cavity QED dynamics exhibits an enhancement in the energy-lowering spin-flip rates peaked at ∆ C . By contrast, the rate function for Metropolis-Hastings dynamics is constant for all energy lowering spin-flips. It is nearly constant at low temperatures for Glauber dynamics. In comparison, the cavity QED dynamics specifically favors those spin flips that dissipate more energy-the cavity allows these spins to flip at a higher rate. As we will see in Sec. VI, this "greedy" approach to steady state significantly changes the basins of attraction of the fixed points. The magnitude of T eff is a potential problem if we were to consider single atoms rather than ensembles of atoms. In such a case, J ij /T eff g 2 ∆ 2 C /(∆ 2 C + κ 2 ) 2 . Since typical parameters satisfy g ∆ C , κ [31], this ratio implies that the system lies within a high-temperature regime. However, this need not be the case because we can employ ensembles of identical spins as our nodal element in the network. As seen from Eq. (17), if all other ensembles are assumed to have s x i = ±M/2, then the relevant energy scale for a spin flip δ i is enhanced by the number of spins in the ensemble. This allows one to reach δ i /T eff ∝ M 1-i.e., a low-temperature regime by increasing M , the number of atoms per ensemble. The assumption that all ensembles are fully polarized is well founded: As noted earlier, the on-site interactions J ii drive the spins within an ensemble to align as if the system were composed of rigid (easy-axis) Ising spins. Moreover, the spin ensemble spin-flip rates are superradiantly enhanced, meaning that the time duration of any ensemble flip is short and ∝ 1/M . As described below, the superradiant enhancement also reduces the timescale required to reach equilibrium. As we discuss in Sec. VIII, to reach the quantum regime would require us to consider M 1, which in turn requires enhancements of the single-atom cooperativity so that the low-temperature regime may be reached at the level of single atoms per node.

B. Stochastic unravelling of the master equation
To directly simulate Eq. (16), we make use of a standard method for studying the time evolution of a master equation: stochastic unraveling into quantum trajectories [83,84]. In this method, the unitary Hamiltonian dynamics are punctuated by stochastic jumps due to the Lindblad operators. For our time-local master equation, the dynamics realizes a Markov chain. That is, the evolution of the state depends solely on the current spin configuration and the transition probabilities described by the spin-flip rates. Moreover, the transition probabilities are the same for all spins within a given ensemble.
A stochastic unraveling proceeds as follows. An initial state is provided as a vector s = (s x 1 , s x 2 , · · · , s x N ). The Lindblad terms in the master equation Eq. (16) describe the total rates at which spin flips occur. However, the total spin-flip rates additionally experience a superradiant enhancement from the matrix elements of S ± i . The collective spin-flip rates within an ensemble are thus where the upper (lower) sign indicates the up (down) flip rate. As such, the rate of transitions within a given ensemble increases as that ensemble begins to flip, reaching a maximum when s x i = 0 and then decreases as the ensemble completes its orientation switch. The ensemble spin-flip rates are computed for each ensemble at every time step. To determine which spin is flipped, waiting times are sampled from an exponential distribution using the total spin-flip up and down rate for each ensemble. The ensemble with the shortest sampled waiting time is chosen to undergo a single spin flip. The time in the simulation then advances by the waiting time for that spin flip. The process then repeats. This requires the total rates to be recomputed at every time step for the duration of the simulation. Results of such an approach are shown in Fig. 9(a).

C. Deterministic dynamics for large spin ensembles
A full microscopic description of the atomic spin states becomes unwieldy when considering large numbers of atoms per ensemble. Fortunately, the large number of atoms also means the full microscopic dynamics becomes unnecessary for describing the experimentally observable quantities. The relevant quantity describing an ensemble is not the multitude of microscopic spin states for each atom, but the net macroscopic spin state of the ensemble. We can therefore build upon the above treatment to produce a deterministic macroscopic description of the dynamics: Our approach is to construct a mean-field description that is individually applied to each ensemble. This description becomes exact in the thermodynamic limit S i → ∞ and faithfully captures the physics of ensembles with ≥ 10 3 atoms under realistic experimental parameters.
We begin by defining the macroscopic variables we use to describe the system. These are the normalized magnetizations of the spin ensembles m i ≡ S x i /S i ∈ [−1, 1] that depend on the constituent atoms via only their sum.
The m i remain of O(1) independent of S i , while fluctuations due to the random flipping of spins within the ensemble scale ∝ 1/ √ S i . The fluctuations about the meanfield value thus become negligible at large S i , and the m i follow deterministic equations of motion. Following the derivation in Appendix F (see also [30]), we find that the m i follow the coupled differential equations where i = j J ij S j m j is the local field experienced by the ith ensemble, given the configuration of the other ensembles. The equations are coupled because, as defined above, each δ ± i = −J ii ∓ 2 j J ij S j m j depends on all the other m j . Note that the self-interaction cost to flipping an atomic spin is largest for the first spin that flips within a given ensemble.
The term proportional to S i in Eq. (25) describes the superradiant enhancement in the spin-flip rate. This term would be zero if S i = 1/2, since m i = s x i /S i = ±1 in that case. For large ensembles this term acts to rapidly align the ensemble with the local field when |m i | < 1. The other terms act to provide the initial kick away from the magnetized |m i | = 1 states, but receive no superradiant enhancement in the spin-flip rate. A separation of time scales emerges between the rapid rate at which an ensemble flips itself to align with the local field, described by the superradiant term, and the slower rate at which an ensemble can initiate a flip driven by the other terms. The dynamics that emerge correspond to periods of nearly constant magnetizations |m i | = 1, punctuated by rapid ensemble-flipping events m i → −m i .
To gain analytical insight into the equations of motion, we make use of the separation of timescales that emerges in the large S i limit. When the ensembles are not undergoing a spin-flip event, they are in nearly magnetized states |m i | ≈ 1 and can be approximated as constants. The spin-flip energies δ ± i and rates K i (δ ) then become constant as well. This enables one to decouple the equations of motion Eqs. (25) to consider the flip process of a single ensemble. This in turn provides a simple analytical solution for the time evolution of the flipping magnetization, where the constants t i 0 are determined from the initial conditions to be These approximate solutions to the equations of motion accurately predict that ensembles will flip to align with their local field, and the ordering of the t i 0 predicts the order in which the ensembles will flip. Ensembles that are already aligned with the local field have a t i 0 < 0 and will not flip in the future unless the local field changes. On the other hand, ensembles not aligned with the local field have a t i 0 > 0 and will flip. The ensemble with the smallest t i 0 > 0 will flip first. This leads to a discrete version of steepest descent dynamics, since Eq. (27) implies that the smallest t i 0 corresponds to the largest energy change |δ i |. One may approximate all m i as constant up until the vicinity of the first t i 0 , at which point the ith spin flips and alters the local fields. The approximation then holds again for the new local fields until the next spin-flip event occurs. This process continues until convergence is achieved once all ensembles align with their local field. Overall, the mean-field dynamics takes a simple form: at every step, the system determines which ensemble would lower the energy the most by realigning itself, then realigns that ensemble in a collective spin-flip, continuing until all δ i > 0. The final configuration is a (meta)stable state corresponding to the minimum (or local minimum) of the energy landscape described by H Hopfield . Figure 9(a) shows a typical instance of the dynamics described by the equations of motion in Eqs. (25) and compares it to a stochastic unraveling of the master equation in Eq. (16). The 10 7 spins are divided into 100 ensembles, each representing an S = 10 5 /2 collective spin. We use a J ij matrix constructed from the CCQED connectivity with spin distribution width w = 1.5w 0 . The ensemble magnetizations are initialized close to a local minimum configuration. Specifically, five ensembles are chosen at random and misaligned with their local field while the rest are aligned with their local field to specify the initial condition. We see that the five initially misaligned ensembles realign themselves to their local fields. Their convergence to the local minimum state described by the J ij matrix occurs before the timescale set by spontaneous emission. We also see that the mean-field equations of motion closely match the stochastic unraveling.
We confirm that these dynamics are consistent with steepest descent (SD) by considering the evolution of energy, as shown in Fig. 9(b). The total energy of the spin ensembles monotonically decreases, with clearly visible steps corresponding to spin-flip events. These steps occur in order of largest decrease in energy. The effect of SD dynamics on the robustness of stored memories, and more generally on the nature of basins of attraction in associative memory networks, has remained unexplored to date. We address this question in the next section.

VI. IMPLICATIONS OF STEEPEST DECENT DYNAMICS FOR ASSOCIATIVE MEMORY
In the previous section, we found that for large-spin ensembles, the cavity-induced dynamics is of a discrete "steepest descent" form; i.e., at each time-step the next spin-flip event of an ensemble is that which lowers the energy the most. (This is in contrast to 0TMH dynamics where spins are flipped randomly provided that the spinflip lowers the energy.) We now explore how changing from 0TMH to SD dynamics affects associative memories. To understand the specific effects of the dynamics, in this section we consider both "standard" Hopfield neural networks with connectivity matrices J ij drawn from Hebbian and pseudoinverse learning rules and Sherrington-Kirkpatrick (SK) spin glasses. We find that SD not only improves the robustness and memory capacity of Hopfield networks, but also gives rise to extensive basins of attraction even in the spin glass regime. An introduction to Hopfield neural networks was given in Section III.
We will compare 0TMH dynamics, where any spin flip that lowers the energy of the current spin configuration is equally probable versus SD dynamics, where the spin flip that lowers the energy the most always occurs. Steepest descent is a deterministic form of dynamics, while 0TMH is probabilistic, and so fixed points can be expected to be more robust under SD. We also note that natural SD dynamics, such as that exhibited by the pumped cavity QED system, is more efficient than simulated SD dynamics (given an equal effective time step duration). This is because numerically checking all possible spin flips to determine which provides the greatest descent is an O(N 2 ) operation. Thus, interestingly, the natural steepest descent dynamics effectively yields an O(N 2 ) speed-up over simulations, by effectively computing a maximum operation over N local fields (each of which is computed via a sum over N spins) in O(1) time.
A. Enhancing the robustness of classical associative memories through steepest descent While the locations in configuration space of the local energy minima of Eq. (3), or equivalently the fixed point attractor states of Eq. (2), are entirely determined by the connectivity J ij , the basins of attraction that flow to these minima depend on the specific form of the energy minimizing dynamics. To quantify the size of these basins, we require a measure of distance between states. A natural distance measure is the Hamming distance, defined as follows. Consider two spin states s and s corresponding to N -dimensional binary vectors with elements s i = ±1. The Hamming distance between s and s is defined as d = N i=1 |s i − s i |/2. Thus, the Hamming distance between an attractor memory state and another initial state simply counts the number of spin-flip errors in the initial state relative to the attractor. This notion of Hamming distance enables us to determine a memory recall curve for any given attractor state under any particular energy minimizing dynamics. We first pick a random initial state at a given Hamming distance d from the attractor state, by randomly flipping d spins. We then check whether it flows back to the original attractor state under the given dynamical scheme. If so, a successful pattern completion, or memory recall event, has occurred. We compute the probability of recall by computing the fraction of times we recover the original attractor state over random choices of d spin flips from the initial state. This yields a memory recall probability curve as a function of d. We define the size of the basin of attraction under the dynamics to be the maximal d at which this recall curve remains above 0.95. Figure 10(a) shows memory recall curves, basin sizes, and their dependence on the form of the energy minimizing dynamics for a Hopfield network trained using the pseudoinverse learning rule; see Eq. (5). Three different forms of dynamics are shown: 0TMH, SD, and the CC-QED dynamics of Eqs. (25). As expected, the CCQED dynamics closely match the SD dynamics. The basin size depends on dynamics: SD and CCQED dynamics lead to a larger basin of attraction than 0TMH. Indeed, the entire memory recall curve is higher for SD than 0TMH.
This increase in the robustness of the memory, at least This is plotted for three forms of dynamics: 0TMH (red), SD (blue), and CCQED (orange). The latter is given by Eq. (25). This illustrates how the dynamics affects recall probability, and thus basin size (defined by the point where the recall curve drops below 95%). We use a connectivity matrix Jij corresponding to a pseudoinverse Hopfield model with N = 100 spins and P = 0.6N patterns stored. We choose this learning rule because it illustrates a large difference between the two dynamics. (b) Average basin size of attractors for connectivity of a Hebbian learning rule. The intensity of the trace increases with system size, N = 200, 400, 800, while color indicates the form of dynamics. The critical ratio αc ≈ 0.138 marks the known pattern loading threshold where Hebbian learning fails in the thermodynamic limit [16]. Inset: overlap between the attractor of the dynamics s µ and the bare memory ξ µ , as discussed in the text. (c) Average basin size for the pseudoinverse learning rule versus the ratio of patterns stored to number of nodes, P/N . Steepest descent dynamics outperforms 0TMH in all cases, yielding larger basin sizes for P/N > ∼ 0.1, and thus, greater memory capacity.
for small d, leads directly to an increase in basin size. To understand this, consider the following argument. Imagine the Hamming surface of 2 ( N d ) configurations a distance d from a fixed point at the center of the surface. The recall curve p(d) is the probability the dynamics returns to the fixed point at the center of the surface when starting from a random point on the surface. Under 0TMH, due to the stochasticity of the random choice of spin flip that the lowers energy, many individual configurations on the Hamming ball could flow to multiple different fixed points, thereby lowering the probability p(d) for returning to the specific fixed point at the center of the Hamming surface. However, under the deterministic SD dynamics, each point on the Hamming surface must flow to one and only one fixed point. Of course, many points on the Hamming surface could, in principle, flow under SD to a different fixed point other than the fixed point at the center. But, as verified in simulations (not shown), for small enough d, more points on the Hamming surface flow under SD to the central fixed point than under 0TMH. As such, recall probability increases for small d, as indeed shown in Fig. 10(a). This deterministic capture of many of the states on the Hamming surface by the central fixed point effectively enhances the robustness of the memory.
We can study how the SD and 0TMH dynamics affect the dependence of basin size on the number of patterns stored, and hence the memory capacity. We separately consider the two traditional learning rules, Hebbian and pseudoinverse. In both cases, we expect a trade-off between the number of patterns P stored and the basin size, with the latter shrinking as the former increases. Figures 10(b,c) demonstrate this trade-off for both learning rules. However, in both cases, switching from 0TMH to SD ameliorates this trade-off; at any level of memory load, the average basin size increases under SD versus 0TMH.
To calculate each point, we generate P random desired memory patterns ξ µ in order to construct the J ij matrix according to the learning rule under consideration. For each pattern, we determine its associated attractor state s µ . In the case of the pseudoinverse learning rule, the attractor state associated with a desired memory ξ µ is simply identical to the desired memory. However, in the Hebbian rule, the associated attractor state s µ is merely close to the desired memory ξ µ . We find this attractor state s µ by initializing the network at the desired memory ξ µ and flowing to the first fixed point under 0TMH. The inset in Fig. 10(b) plots the overlap N −1 i ξ µ i s µ i as a function of the pattern loading α = P/N of the Hebbian model. This overlap is close to 1 for α < α c ≈ 0.138 and drops beyond that, indicating that beyond capacity, the Hebbian rule cannot program fixed points close to the desired memories.
We focus specifically on the fixed points s µ to dissociate the issue of programming the locations of fixed points s µ close to the desired memories ξ µ from those of examining the basin size of existing fixed points and the dependence of this basin size on the dynamics. To determine the basin size for these fixed points, we compute the maximal input error for which the recall probability remains above 95%. This is performed for both learning rules under both 0TMH and SD dynamics. We then average the recovered basin size over the P patterns. Figure 10(b) presents the results for Hebbian learning, and SD dynamics appears to increase the basin size for pattern-loading ratios P/N > 0.1. Note that the basin size under 0TMH is not extensive in the system size N beyond the capacity limit P/N = α c ≈ 0.138N . However, remarkably, under SD the basin size of the fixed points s µ are extensive in N , despite the fact that the Hopfield model is in a spin glass phase at this point. Thus, the SD dynamics can dramatically enlarge basin sizes compared to 0TMH, even in a glassy phase. However, this enlargement of basin size does not by itself constitute a solution to the problem of limited associative memory capacity, because it does not address the programmability issue; above capacity, the fixed points s µ are not close to the desired memories ξ µ ; see Fig. 10(b) inset. Steepest descent can only enlarge basins, not change their locations. As such, for Hebbian learning, the memory capacity C ≈ 0.138N cannot be enhanced under any choice of energy minimizing dynamics unless an additional programming step is implemented.
In contrast to Hebbian learning, the pseudoinverse learning rule does not have a programmability problem by construction. The connectivity possesses a fixed point s µ identical to each desired memory ξ µ . Figure 10(c) shows that SD confers a significant increase in basin size for pseudoinverse learning as well. At large P/N , the basin sizes under SD are more than twice as large as under 0TMH dynamics.

B. Endowing conventional spin glasses with associative memory-like properties
Basin sizes do not typically scale extensively with N under 0TMH dynamics. This is because the number of metastable states grows exponentially (in the system size N ) in both the Hopfield model (with Hebbian connectivity for P/N 0.138) and in the SK spin-glass model [85]. However, we now present numerical evidence that SD dynamics endows these same energy minima with basin sizes that exhibit extensive scaling with N , even in a pure SK spin glass model. Thus, we find that the SK spin glass, under the SD dynamics, behaves like an associative memory with an exponentially large number of memory states with extensive basins. (A programming step would be required; see Sec. VII B.) More quantitatively, we numerically compute the size of basins present in an SK spin glass using both kinds of dynamics; see Fig. 11. Connectivity matrices J ij are initialized with each element drawn, i.i.d., as Gaussian random variables of mean zero and variance 1; normalization of the variance is arbitrary at zero tempera- ture. Rather than compute a uniform average over all metastable states of the J ij matrix-a task that is numerically intractable for large system sizes-we instead sample metastable states by initializing random initial states and letting those states evolve under 0TMH dynamics. Once a metastable state is reached, the basin of attraction is measured under both 0TMH and SD dynamics as discussed above. Figure 11(a) plots the average basin size as a function of the system size N , averaging both over realizations of J ij matrices and over the random initial states. Under 0TMH dynamics, the mean basin size decreases with increasing N , approaching zero at large N . This shows that for 0TMH dynamics, metastable states of the spin glass become very fragile, as expected from a traditional understanding of metastable states in a spin glass. In contrast, for SD dynamics, the average basin size scales extensively with system size in a roughly linear fashion. A phenomenological fit results in scaling of 0.0135(1)N + 0.91 (7).
These results help reveal the structure of the energy landscape. Vanishing basin size under 0TMH implies that a single spin-flip perturbation is sufficient to open new pathways of energy descent to different local minima. As soon as such pathways exist, 0TMH will find them, leading to vanishing basin size. For SD, we however see that the steepest path of energy descent is almost always the one leading back to the original local minimum that was perturbed, as long as the number of spin flips is less than about 0.0135N . This picture explains how metastable states are stable against small perturbations under SD, but unstable under 0TMH.
The extensive scaling of basin sizes found for SD dynamics not only means that metastable states with a finite basin of attraction exist, but in fact suggest the number of such states is exponential in system size, far greater than the capacity of any typical associative memory. While sampling metastable states via relaxation of random initial states under the standard 0TMH dynamics does not yield a uniform sampling, Fig. 11(b) shows that nearly all such discovered metastable states have a finite basin of attraction under SD dynamics. The distribution is peaked away from zero basin size, whereas with 0TMH the distribution is strongly peaked around zero. This suggests that extensive basin sizes under SD dynamics may be a property of almost all metastable states in the SK spin-glass.
We conclude that the SK spin glass enhanced with SD dynamics operates, remarkably, like an associative memory, with exponentially many spurious memories. These memories are random and non-programmable via any existing learning rule, but have extensive basins of attraction, tolerating up to 0.0135N input errors on average while maintaining a 95% probability of recall. While this basin size is small compared to that of stored patterns in standard associative memories-cf. Fig. 10-it is nevertheless extensive and there are exponentially more such memories than in standard Hopfield networks.

VII. ASSOCIATIVE MEMORY IN PUMPED CONFOCAL CAVITIES
We have thus far shown in Sec. IV that CCQED supports a large number of metastable states, and in Sec. V that the natural cavity QED dynamics produces a discrete steepest descent (SD) dynamics in energy. In Sec. VI, we showed that SD dynamics can enhance the robustness of memory when applied to traditional Hopfield models, and moreover, can endow the SK spin-glass with associative memory-like properties.
In this Section, we pull these key elements together to demonstrate how robust associative memory can be created in a transversely pumped confocal cavity. We begin by showing that metastable states in such systems indeed possess large basins of attraction. We then develop a method to solve the addressing, or programmability problem, which involves converting the patterns we wish to store into the patterns that naturally arise as local energy minima with large basins of attraction. Our pattern storage scheme is applicable to any connectivity matrix J ij . We verify its performance for the CCQED connectivity matrix both in the associative memory regime and the in SK spin glass regime.

A. Basins of attraction with confocal cavity QED connectivity
We now examine the basin size of the metastable states found in Sec. IV by tuning the width w of the distribution of ensemble positions. We measure the basin size of metastable states just as we did in Fig. 10 for Hopfield models and Fig. 11 for the SK model-by the relaxation of random perturbations. In particular, a random CC-QED connectivity matrix J is realized by sampling the spin ensemble positions from a 2D Gaussian distribution of fixed width w. A large number of random initial states are then prepared and allowed to relax via the native SD dynamics to a metastable state. The basin size of that metastable state is then found by initializing nearby random states and estimating the probability that these perturbed states evolve back to the metastable state as a function of the Hamming distance of the perturbation. Figure 12 shows how the basin size evolves with w for the CCQED connectivity. For each width w, the measured basin sizes are averaged over many realizations of the CCQED matrices J, with 200 random initial states per matrix. The number of realizations of J required for convergence varies with N , ranging from 1,600 realizations for N = 3 spins to 32 realizations for N = 800 spins. This is due to self-averaging at large system size. Figure 12(a) presents the average basin size as a function of w, using both the native SD dynamics and 0TMH for comparison. A sharp decrease in basin size occurs near the ferromagnetic-to-associative memory transition at w AM ≈ 0.67w 0 . For 0TMH dynamics, the mean basin size per spin quickly falls to zero after crossing w AM . Remarkably, however, steepest descent dynamics yields larger basins beyond w AM . These are extensive in size. Thus, just as in the case of the Hopfield model beyond capacity and in the SK spin-glass, switching from 0TMH to SD dramatically expands the size of basins from nonextensive to extensive. Moreover, there exist O(e N 0.4 ) metastable states in this regime, which is less than the O(e N ) expected of a spin glass; see Fig. 5. Overall, this demonstrates that a confocal cavity in this intermediate width regime functions as a high-capacity, robust associative memory with many metastable states with extensive basin sizes, albeit with random non-programmable memories. For larger widths, the average basin size asymptotically reduces to that exhibited in the SK model. This observation, when combined with the analysis of Sec. IV, provides further evidence that the confocal cavity connectivity yields an SK spin glass at large width.  12. (a) The average basin size in the CCQED system versus width of the spin ensemble distribution w. Metastable states are found by relaxing random initial seeds. Basin sizes for both 0TMH dynamics (red) and the (native) SD dynamics (blue) are shown. System sizes ranging from N = 3 to N = 800 are considered, with darker traces corresponding to larger sizes. We extrapolate to the limit N → ∞ by fitting a linear function of 1/N to the finite N data. The intercept at 1/N = 0 yields the thermodynamic limit (dashed lines). The ferromagnetic-to-associative memory phase transition, indicating the onset of multiple metastable states, is marked by a vertical black dashed line at wAM. The green line shows the typical basin size under SD dynamics for the SK connectivity, 0.0135(1)N , as extracted from Fig. 11. (b) Histogram of the basin sizes found for widths w = 1.5w0 and w = 4w0. Figure 12(b) shows examples of the distribution of basin sizes under SD dynamics for widths 1.5w 0 and 4w 0 . For w = 1.5w 0 , the variance in basin sizes is large despite a small mean size of ∼0.05N . Pattern completion is robust in this regime for a large percentage of basins since many exceed 0.1N in size. For w = 4w 0 , the basin size distribution is narrower, but still peaked away from zero. However, nearly all basins sizes are less than 0.1N , as expected in the spin glass regime; see Fig. 4(c).

B. Solving the programmability problem: a scheme for memory storage
We have demonstrated the presence of many metastable states with large basins of attraction (of extensive size) in the transversely pumped CCQED system. Such a system evidently manifests a robust recall phase with a large number of random memories. However, these memories are not directly programmable, in the sense that we cannot place the local energy minima wherever we wish in configuration space by tuning the connectivity. In contrast, for the Hopfield model with the Hebbian connectivity of Eq. (4), we can place energy minima close to up to 0.138N desired points ξ µ in configuration space, as long as these points are random and uncorrelated. Also, the more complex pseudoinverse rule of Eq. (5) enables us to place local energy minima at N arbitrary points (though there is no guarantee of any minimal basin size if some points are close to each other).
We now show how to effectively program the CCQED system to store any desired patterns using an encoder that translates the desired patterns to be stored into the non-programmable patterns that are naturally stabilized by the intrinsic cavity dynamics. While the pattern storage scheme we present here is designed with the physics of CCQED in mind, we note that the method is also applicable to any Hopfield or spin-glass like network connectivity with non-programmable energy minima with extensive basin sizes.
The basic idea behind the encoder is to find a linear transformation that maps any set of desired patterns, as represented by light-fields, into metastable spin configurations of the cavity. Once found, this transformation is then applied to any input state (i.e., light field) before allowing the confocal cavity network to dynamically evolve to a metastable state. The (optical) output can then undergo an inverse transform, back into the original basis. In this way, we harness the dynamics of the CC-QED spin network to store general patterns, and recall them through pattern completion, without ever needing to change the CCQED connectivity matrix.
To formally define the properties required of the transformation, M, we consider a set of patterns that we wish to store, {Ξ p } for p = 1 to P . We require that there are at least P metastable states of the Hopfield energy landscape {Φ p } that need to be found and cataloged. These may be found by, e.g., allowing random initial states to relax-note that there may be more than P such states. For an arbitrary input state x, the network operates as follows: 1. The input state x is transformed to an encoded statex = M(x). The transformation should smoothly map the desired pattern states onto metastable states of the Hopfield network: we thus require M(Ξ p ) =Φ p for all p.
2. The Hopfield network is initialized in the encoded statex and allowed to evolve to a metastable statẽ y. We now discuss how to find the smooth transformations M and M −1 obeying the properties above. The simplest option is a linear transformation, for which we should define M(x) = M · x, where M is the matrix that minimizes the error function Here, || · || is the Hilbert-Schmidt norm. The terms in the sum ensure that the patterns are mapped to the metastable states of the cavity QED system. However, this requirement under-constrains M for P < N . To remedy this, we add the term λ||M|| 2 to penalize large elements of M; λ is a regularization hyperparameter. This least-squares problem can be solved analytically, yielding: While this solves the linear transform problem, the linear encoding itself has a problem that we must address. Spin configurationsx should correspond to vectors whose elements are ±1, corresponding to the magnetization of the S x components of the spin ensembles. We require that the output of the transformation takes this form. Thus, we must modify the encoding so that the full transformation for an arbitrary input state is M(x) = sgn(M · x), where the sign function is understood to be applied element-wise. The encoder matrix defined in Eq. (29) will not be invertible, in general. However, by simply reversing the roles of the patterns and cavity minima in constructing the encoder matrix, we can construct a corresponding decoder matrix that maps the metastable states back to the desired patterns: The memory capacity of an associative memory is typically defined as the maximum number of patterns that can be stored as metastable states of the energy landscape. Under this standard definition, at most N total patterns can be mapped exactly onto attractor states of the CCQED connectivity because the transformation map is linear. This means that the maximum memory capacity of the encoded confocal cavity is N , which is the same as that of the pseudoinverse learning method, but significantly outperforms Hebbian learning with its capacity of ∼0.138N . However, a more practical definition of capacity would depend also on the size of the basins of attraction, and hence the robustness of the stored patterns. We now verify the memory capacity of N accounting for basin size.  13. Average basin size of patterns stored in a CCQED system as a function of P/N using the linear transformation encoding scheme (yellow). Results are shown for N = 800, however similar results are found for system sizes of N = 50-800. For comparison, the average basin sizes are plotted for the Hebbian (red) and pseudoinverse (blue) learning methods. In each case, we show the results of both SD (solid) and 0TMH (dashed) dynamics; w = 1.5w0.
We numerically benchmark the encoded pattern storage method with the CCQED connectivity and SD dynamics that are present in the cavity. Fixing the system size N , we generate P random patterns to store in the cavity. Because the CCQED connectivity hosts a large number of metastable states with large basins in the regime 0.7w 0 ≤ w ≤ 2w 0 , we set the width w to 1.5w 0 and construct a connectivity matrix. Metastable states are cataloged by relaxing P random initial seeds using SD. (Note that this procedure is biased toward selecting metastable states with the largest basins of attraction; these are preferable for our scheme.) This is then followed by construction of the encoder and decoder using Eqs. (29) and (30). The basin sizes of the pattern states are determined by applying increasing amounts of input error to the patterns until the probability of recall drops below 95%. Figure 13 compares the performance of encoding memories onto the metastable states of the CCQED connectivity against the results of Hebbian and pseudoinverse learning. The average basin size of our CCQED scheme is larger than Hebbian learning. The results do not depend strongly on system size. For the CCQED scheme, extensive basin sizes are indeed observed up to the maximum P = N stored patterns. This memory capacity represents roughly an order-of-magnitude improvement over the standard Hebbian learning model (0.138N ) and coincides with the pseudoinverse learning rule. The basin sizes in CCQED are not as large as pseudoinverse learning, though they are more physically realizable. Note that for the Hebbian learning, we measure recall as the probability of recovering the desired memory. As such, the drift of local minima from the desired patterns con-tributes to the collapse of basin size at P/N = 0.138.
Remarkably, we find that for memory loading ratios P/N ≤ 0.4, the average basin size with the encoding and decoding transformation significantly exceeds 0.1N . This is notable because earlier we found that the average basin sizes are less than 0.1N at this width, w = 1.5w 0 ; see Fig. 12. It seems that the encoding and decoding transformations significantly enhance the robustness of the associative memory. Inspection of the form of the linear transformation reveals that it focuses the P < N input states onto a subset of the metastable states and vice-versa for the decoder matrix. This effectively gives it a head start by reducing the Hamming distance between perturbed input states and the nearest metastable state.
The pattern storage scheme also works in the SK spin glass regime, either as applied to the CCQED connectivity at large w or to the actual SK model. In both cases, N memories may be stored. The linear transformation scheme is thus quite flexible: it is applicable to any model with extensively scaling basins of attraction. An interesting challenge for future work would be to see if it is possible to extend the current scheme to nonlinear mappings between pattern states and attractors in order to achieve a superextensive scaling of the memory capacity with system size N . This would provide a physically realizable fully programmable associative memory with capacity well beyond the P = N pattern limit. In this case, the fact the SK spin glass possesses an exponential number of metastable states with extensive basin sizes under SD dynamics could become particularly relevant for associative memory applications.

C. Weight chaos and the robustness of memory patterns
Considering the practical implementation of the associative memory, an important question is that of 'weightchaos' [86][87][88][89]. This is the problem that, for standard Hopfield learning rules, flipping the sign of a few (or even one) J ij matrix element can result in a model with a completely different set of metastable states. This arises due to the chaotic nature of glassy systems. In our case, weight chaos can arise from two sources. Imprecision in the placement of spin ensembles leads to fluctuations in the positions r i and thus in the nonlocal interaction J ij ∼ cos 2r i · r j /w 2 0 . Additionally, atom number fluctuations per spin ensemble can occur between experimental realizations, leading to fluctuations in J ij proportional to 1/ √ M . Neither of these can change the sign of large J ij elements, thereby avoiding the most severe, randomizing repercussions of weight chaos.
The effect of weight chaos is numerically explored in Fig. 14 under experimentally realistic parameters. We consider an uncertainty of 1 µm in the placement of spin ensembles and fix the total number of atoms in the cavity at N i=1 M = 10 6 independent of the number of ensembles considered. We then present two different mea- sures of weight chaos and its effect, as a function of the ensemble distribution width w. First, we quantify how metastable states drift between realizations. To do so, we take a particular J matrix and find a metastable state s by allowing a random initial state to relax via SD. We then realize a weight matrix J that is nominally the same as J but has added to it a particular realization of position and atom number noise. The state s is then allowed to again relax via SD to a potentially new state s to see if it has drifted. The overlap s · s /N quantifies this drift and is shown in Fig. 14(a) for many realizations. In the second measure, we directly quantify the change in the weights as a percentage of their original values via the quantity J − J / J , where · denotes the Hilbert-Schmidt norm of the matrix. Figure 14 demonstrates that metastable states are robust against drift for widths less than approximately w 0 , which includes the ferromagnetic and associative memory regimes of the confocal connectivity. Greater than 98% overlap between experimental realizations are expected. The overlap begins to drop for widths beyond w 0 as the connectivity transitions to a spin glass, because the nonlocal connectivity becomes increasingly sensitive to position at larger radii. The overlap decreases with increasing system sizes in this regime up to the maximum N = 400 ensembles probed in simulations. Atom number fluctuations set a floor on weight chaos that is independent of w. This is visible in Fig. 14(b) for small w. The noise floor rises with increasing N since M must decrease in order to fix the total atom number. A few percent deviation in J ij is acceptable for heuristic solvers, replacing one 'good-enough' solution with another. Position uncertainty will likely be smaller than 1 µm, so these results present a conservative picture.

D. Experimental implementation
We now present practical details of how the CCQED associative memory can be initialized in a particular (distorted) memory state. One may initialize the spin ensembles via longitudinal pumping to an arbitrary (encoded)x, wherex i = ±1 is the magnetization of the ith ensemble. As seen in Eq. (14), the spins can be subjected to a field h i created by longitudinally pumping the cavity. Physically, this is possible because in a confocal cavity, resonant light with any transverse profile (up to the resolving capability of the cavity mirrors) is supported due to the degeneracy of all TEM lm modes of good parity. If the longitudinal pumping is sufficiently strong, i.e., ϕ i > J ij , then the energy of spin flips δ i will be dominated by this field. By choosing sgn(ϕ i ) =x i , SD dynamics will evolve the spin ensembles into state x. The fact that only the pattern of signs matter for large fields presents a natural implementation of the threshold operation employed in the encoding M(x). The longitudinal pump is turned off once the state has been initialized, thereby allowing the system to evolve to the nearby metastable state determined by the connectivity J ij alone.
Many schemes would suffice for implementing the encoding and decoding transformations mentioned above. The transformation can be accomplished most simply electronically. In this scheme, a new longitudinal field is optically constructed based on the electronically transformed state. Similarly, after imaging the cavity light, the decoder matrix would be applied electronically to yield the final output state from the experiment. Alternatively, an all optical implementation can be achieved using spatial light modulators or other methods [90] if coherence in the field must be preserved.

VIII. DISCUSSION
We presented a practicable quantum-optical system that can serve as a neural network for associative memory. It does so by exploiting the natural spin coupling and superradiant dynamics provided by a CCQED system. The physical system lacks the direct correspondence between memories and the connectivity matrix that makes simple the learning of patterns in the idealized Hopfield model. Nevertheless, we find a straightforward pattern storage scheme for encoding memories in this physical system that allows N memories to be stored with a robust pattern completion process that tolerates an extensive number of initial errors. Remarkably, the emergent SD dynamics of the physical system extend the effective basins of attraction to enable robust recall. In terms of memory capacity, the system significantly outperforms Hebbian learning.
The multimode cavity QED platform is distinctive among annealing-based optimization schemes for its nonequilibrium dynamics. In "traditional" annealing approaches, the system either undergoes classical equilibrium dynamics in a potential landscape [91], or unitary quantum dynamics remaining close to the ground state [92]. In the CCQED system, the optimization occurs by the intrinsic dynamics of a driven-dissipative system. By contrast, in equilibrium contexts, any coupling to the outside world disrupts the system: dissipation must be overcome in these approaches, not embraced as in the present scheme.
Confocal cavity QED can be compared to other gain-based optimization schemes. These include weakcoupling, optics-based "coherent Ising machines." These have explored the notion that optimization problems can be heuristically solved using the dissipative coupling between multiple optical parametric oscillators (OPOs) [45,46,93,94]. The steady-state pattern of the oscillator phases that develops at threshold manifests the solution to this problem. This occurs because the coupling of OPOs is chosen so that the pattern of oscillation with lowest threshold corresponds to the solution of the optimization problem. As such, as one increases the pumping, this lowest threshold solution appears first. Similar ideas have also been explored in coupled polariton condensates [47] and lasers [95].
There are, however, crucial limitations to these weakly coupled optical gain-based optimizers: the output spin states correspond to coherent photon states, which only emerge as one crosses the threshold. This means that near threshold, the spins are "soft," allowing defects to form [96]. In addition, because photon number is not conserved, photon intensity in each node can change, leading to dynamic changes in the J ij weights defining the problem [97]. This requires the addition of classical feedback to control these populations [98,99] [100]. By contrast, the multimode cavity QED system has a nearly constant number of atoms per ensemble, so the connectivity matrix J ij remains stable. The spin variables are well defined far above the superradiant threshold: the superradiant enhancement of the dynamics forces ensembles to flip as one collective spin. We also point out that while the fully quantum limit of the CCQED system consists of a single strongly coupled atom at each position and so the state space becomes that of spin 1/2 particles, in contrast, the photonic or polaritonic coherent Ising machines possess (continuous) bosonic Hilbert spaces.
An important challenge for future work is to extend the present treatment into the fully quantum realm. This could be done by suppressing the dephasing we used to favor classical states, tuning pump strength near to the superradiant phase transition, and by considering few spins per ensemble. As noted earlier, currently we require a large number of atoms per ensemble to overcome heating; an alternative would be to design a configuration with enhanced single-atom cooperativity. While the cooperativity of a single 87 Rb atom coupled to TEM 00 mode is C = g 2 0 /2κγ = 2.5 [31], our system should already benefit from an enhanced cooperativity for an atom coupled to many degenerate modes with nonzero field amplitude at its location. The single-atom cooperativity is enhanced by the number of degenerate modes, due to the formation of a 'supermode' with larger electric field at the atom's position [32,33]. The confocal cooperativity is therefore far greater than unity; preliminary measurements suggest it is approximately 50 [33]. If achieved, this would allow quantum entanglement to play a role in the dynamics [101], enabling the exploration of the behavior of a fully quantum nonequilibrium neural network. Whether and how entanglement and quantum critical dynamics at the superradiant phase transition provide superior heuristic solution-finding capability remains unclear. Specific questions of interest include how entanglement might affect memory capacity or retrieval fidelity. We leave these questions for future exploration. Beyond quantum neuromorphic computing, such systems might be able to address quantum many-body problems [102].
Finally, we note that our pattern storage and recall scheme provides a novel shift in paradigm from the traditional Hopfield framework of memory storage. Indeed in this traditional framework, desired memories, as determined by network states driven by external stimuli (i.e., the patterns ξ µ in Sec. III) are assumed to be directly stabilized by a particular choice of connectivity matrix J. For example, the Hebbian rule (Eq. (4)) and the pseudoinverse rule (Eq. (5)) are two traditional prescriptions for translating desired memories ξ µ into connection patterns J that engineer energy minima at or near the desired memories. In contrast, the steepest descent dynamics in our system reveals that random choices of connectivity J, whether originating from the CCQED system, a glassy Hopfield model beyond capacity, or remarkably, even the canonical SK spin glass, yield Ising systems possessing exponentially many local energy minima with extensive basin size. Indeed, our storage scheme exploits this intrinsically occurring multiplicity of robust minima simply by mapping external stimuli ξ µ to them, rather than creating new minima close to the external stimuli. To date, this computational exploitation of the exponential multiplicity of local energy minima in glassy systems has not been possible because of their subextensive basin size under traditional 0TMH or Glauber dynamics. By contrast, the intrinsic physical dynamics we have discovered in the CCQED system directly enables us to computationally harness the exponential multistability of glassy phases, by using an encoder to translate external stimuli to natively occurring internally generated patterns. Further exploitation of these glassy states through nonlinear encoders remains an intriguing direction for future work.
Interestingly, in biological neural systems, multiple brain regions, like the hippocampus and the cerebellum, that are thought to implement associative memories, are synaptically downstream of other brain regions that are thought to recode sensory representations before memory storage; see, e.g., [103] for a review. Such brain regions could play the role of an encoder that learns to translate sensory stimuli into natively occurring memory patterns, in analogy to our storage framework. Moreover, there has been some neurobiological evidence that natively occurring neural activity patterns observed before an experience are themselves used to encode that new experience [104,105], though there is debate about the prevalence of this observation [106]. Of course, if neurobiological memory encoding does indeed select one of a large number of latent, preexisting stable activity patterns in order to encode a novel stimulus, it may be very difficult to experimentally observe such a stable activity pattern before presentation of the stimulus, due to limited availability of recording time. Thus, our work suggests it may be worth explorations beyond the Hopfield framework for memory storage, in which the internal representations of memories are determined not only by the nature of sensory stimuli themselves, but also by the very the nature of natively occurring potential stable memory states that exist, but are not necessarily expressed, before the onset of the stimulus.
Thus overall, by combining the biological principles of neural computation with the intrinsic physical dynamics of a CCQED system, our work leads to enhanced associative memories with higher capacity and robustness, opens up a novel design space for exploring the computational capabilities of quantum-optical neural networks, and empowers the exploration of a novel paradigm for memory storage and recall, with implications both for how we might computationally harness the energy landscapes of glassy systems, as well as explore novel theoretical frameworks for the neurobiological underpinnings of memory formation.
funding from the James S. McDonnell and Simons Foundations and an NSF Career Award.
To remove the kinetic energy dependence and the spatial oscillation of the Raman coupling strength, we propose the following trapping scheme for V trap (x): first introduce optical tweezers to tightly confine atoms in separate ensembles in different locations r i in the cavity transverse plane but at the midpoint of the cavity axis z = 0; then superimpose on the optical tweezers a deep 2D optical lattice with periodicity of 780 nm along both the pump and cavity direction to add a potential V lattice ∝ cos 2 (kx/2) + cos 2 (kz/2). Under the combined trapping potential, the spinor wavefunction can be expanded as as sum of individual ensembleŝ where Z(z) and ρ(r) describe the spatial confinement of an ensemble by the optical tweezers trap, ν i indexes individual spins inside an ensemble indexed with i, andψ ↑↓,νi is the spin operator for a single atom ν i . For each ensemble with M atoms, we define collective spin operators Taking the assumption that the width σ A of the atom profile Z(z) and ρ(r) satisfies the condition λ 780 nm σ A z R , we can perform the spatial integral along z and drop the oscillatory terms along x. The Raman coupling term becomes where the coupling strength between the ith ensemble and mth cavity mode g im is given by and we assume that different ensembles are separated in space with negligible overlap.
Putting this together, the CCQED system can be modelled by a master equation: Here, we account for cavity field loss at rate κ by Lindblad superoperators L[X] = XρX † − {X † X, ρ}/2, where a m is the annihilation operator for the mth cavity mode. The Hamiltonian of the CCQED system H CCQED is given by We will always choose transverse pump to be red detuned from the cavity modes so that ∆ m < 0. The mode-space representation of a longitudinal pump term is ϕ m . We will discuss the effects of additional classical and quantum noise terms in the Hamiltonian in appendices D and E. We now discuss how the finite spatial extent of the spin ensembles affects the connectivity. The effective nonlocal interaction between two ensembles is given by the convolution of the nonlocal interaction, proportional to cos 2r i · r j /w 2 0 , with the ensemble density profiles ρ i (x): (B1) The ensemble densities may be approximated by Gaussian profiles centered at positions r i of width σ A . We then find the effective nonlocal interaction The effect of this convolution is to rescale the term inside the cosine and dampen the amplitude of J ij for ensembles located far from the cavity center. The unconvolved nonlocal interaction is recovered in the limit of small σ A /w 0 .
In current experiments, σ A is at least an order of magnitude smaller than w 0 , thus we consider the unconvolved nonlocal interaction in this work.
variable with mean µ and variance σ 2 . Thus, the full dot product r i · r j is distributed as where χ 2 ν is a chi-square random variable with ν degrees of freedom. The difference of χ 2 ν random variables produces a variance-gamma distribution. To see this, note that the moment generating function for the difference of while the moment generating function for the variancegamma distribution with parameters µ, α, β, λ is (C5) Thus, the moment generating functions coincide for µ = β = 0, α = 1/2, and λ = ν/2 = 1. The random variable r i · r j is distributed like a variance-gamma random variable with the above parameters. Denoting the probability density of r i · r j as p ri·rj (x), we find where K ν (x) is a modified Bessel function of the second kind. The Bessel function simplifies for ν = 1/2 to the form K 1/2 (x) = π/2x e −x , yielding the simplified expression We now wish to find the probability density p cos (x) for the cosine of the random variable 2r i · r j /w 2 0 . Applying the general formula [107] for taking functions of random variables allows us to write where by cos −1 k,± (x) we denote the infinite number of possible inverses of cosine; more precisely, where arccos(x) ∈ [0, π] is the principal inverse of cosine. The derivative of cos −1 k,± (x) with respect to x appearing above is independent of k, Substituting in the derivative, we find Using the fact that | arccos(x)| ≤ π, the infinite sum can be rearranged as a geometric series, Evaluating the geometric series and rewriting in terms of hyperbolic functions finally results in (C13) This probability density is the main result of this Appendix.
We now present a number of quantities related to the distribution of couplings. First, the mean of the probability distribution can be computed in the usual way, as well as the variance: We may also find the correlation between couplings as a function of w. This is important, for instance, in comparing the connectivity to that of an SK spin glass in which the couplings are uncorrelated. We first use a trigonometric identity to write Since r i and r k are both Gaussian random variables with equal variances, their sum and difference are uncorrelated Gaussian random variables with twice the original variance. Thus, the two terms have the same expected value, and we find where · denotes an expectation over the random variables, and N 2 (µ, σ 2 ) denotes an isotropic twodimensional Gaussian random variable with mean µ and variance σ 2 . We can rewrite this in the symmetric form (C18) This expression now matches the expression for the expected value of the original p cos (x) distribution, but with w 2 rescaled by √ 2. Thus, we have already computed this quantity and can extract exhibiting full correlation at w = 0 and vanishing correlation at w → ∞. Finally, to determine the probability of generating negative couplings as a function of w, we integrate the probability density to yield the cumulative distribution function (C20) Evaluating the cumulative distribution at x = 0, the probability of finding a negative coupling term is which approaches 0 for small w and 1/2 for large w. Thus, in the limit of large w, we see that the two-point correlations decay and that negative couplings become equally likely. These are characteristics typical of a spin glass.
Appendix D: Spin-flip rates and decoherence A classical noise term inserted into the Hamiltonian induces decoherence. This allows us to restrict our analysis to the classical subspace with definite S x . We now derive the form of the spin-flip rate equations in the presence of classical noise. We start from the Hamiltonian H = H CCQED + H N with H CCQED defined in Eq. A13, and the noise term H N = χ(t) i S x i . We may note that the only term in H which takes one outside the classical subspace is the transverse field term ω z i S z i . For the classical limit to hold, the effects of this term should be small compared to the decoherence rate. As discussed in Sec. V, the noise term will arise naturally from the Raman pumping lasers, and can also be deliberately enhanced to ensure one remains in the classical limit.

Decay of coherences
To pinpoint the effect of the noise on coherences, we will rotate into the interaction picture with respect to H N . The time evolution in the interaction picture is where ρ 0 (t) = e −iH CCQED t ρ 0 e iH CCQED t . To analyze the time dependence of specific elements in the density matrix, we denote total S x states of the full system by vectors s = (s x 1 , s x 2 , · · · , s x N ), where s x i denotes the S x eigenvalue of the ith spin ensemble. The density matrix elements may then be indexed as ρ ss for two S x states s and s . As a notational tool, we denote the total difference in S x eigenvalues between s and s as ∆s = i (s x i − s x i ). If the noise correlations are faster than the spin dynamics due to the transverse field ω z , then the time evolution for a general matrix element ρ ss is found by averaging over realizations of the stochastic noise source: The average may be performed using the cumulant expansion of the noise source. We assume that χ(t) is a zero-mean, Gaussian signal such that only the second cumulant is nonzero. The noise-averaged term appearing in Eqn. (D2) is then given by We will further take the noise field to be stationary so that the correlation function is time translation invariant, χ(t ), χ(t ) = χ(0), χ(t − t ) . In this case, the noise correlations are defined by the spectral density, J c (ω), using the expression Inserting this into Eq. (D3), we find that This expression allows us to evaluate the decoherence induced by classical noise sources with different spectral densities J c (ω). We now consider the analytically tractable case of a noise source with an ohmic spectral density J c (ω) = 2αωe −ω/ωc for ω ≥ 0, where α is a dimensionless constant and ω c is an exponential cutoff frequency for the bath. Evaluating Eq. (D5) using this spectral density, we find that coherences decay with a functional form This is close to a power law in form. We can now extract the timescale over which the coherences decay. The above function is unity at t = 0 and decays to half of its initial value at time Thus, if one considers a coherent superposition of two states that have a difference ∆s between their eigenvalues of S x , the rate at which that coherence decays increases with ∆s. As such, coherence decays faster for ensembles with larger numbers of atoms. The classical limit is valid if ω z t 1/2 1. This is readily attainable in an experimentally relevant parameter regime, in which ∆ C is on the order of MHz, ω c is on the order of kHz, and α is on the order of 10.

Spin-flip rate in presence of an ohmic bath
We now examine the form of the spin-flip rate function that arises when coupled to an ohmic noise source. In Eq. (18), we gave the general expression for the spin-flip rate in terms of the noise correlation function. Using an ohmic spectral density, this becomes where h(δ ) is a symmetric peaked function. In the absence of noise, h(δ ) = δ(δ ), reflecting the fact that perfectly degenerate spin flips occur at a fast rate because they do not need to trade energy with cavity modes. Realistically, this delta function is broadened to a finite peak in the presence of even a small amount of noise. As the peak is symmetric, it induces heating since it encourages energy lowering and raising spin-flips at equal rates: we should tune the noise strength to keep the width of this function smaller than ∆ C . We now compute the form of this function in the presence of ohmic noise to leading order in αω 2 c /∆ 2 C . This is well-justified in the parameter regime stated above. The expression is is the modified Bessel function of the second kind, which has exponentially decaying tails. Figure 15 plots the spin-flip rate function both for the zero noise case and for noise strong enough to fully suppress superpositions in ensembles of up to M = 10 4 spins. In the latter case, we use α = 20 and ω c = 2π × 4 kHz. The symmetric peak broadens, but remains significantly smaller than the typical spin-flip energy scale ∆ C . In Sec. V B, we verified that such a peak is sufficiently narrow so as not to interfere with the SD dynamics that arise naturally in the cavity. Thus, ohmic noise can be employed to fully suppress coherence while preserving cavity dynamics.

Appendix E: Spin-flip rates with a quantum bath
This section provides an alternative derivation of spinflip rates, where we consider dephasing to arise from a quantum bath instead of the classical bath discussed in Appendix D; see also Ref. [30]. We start from a model H = H CCQED +H γ , where H γ is a quantum noise source: Here, B ik is a bosonic annihilation operator for the kth bath mode coupled to the ensemble at site i. We assume independent but identical baths for each site i, so the behavior of this source is fully determined by its spectral function J Q (ω) = k ξ 2 k δ(ω − ν k ). We will discuss below how the result of this model connects to that of classical noise by considering a high-temperature limit.
As before, we will consider the transverse field term perturbatively. It will be useful to define longitudinal and transverse parts of the Hamiltonian: We derive a spin-only master equation containing the spin-flip rates by treating the term H T perturbatively using the Bloch-Redfield approach [108]. Crucial to this procedure is the observation that H 0 = H L + H γ can be diagonalized by a unitary transformation that we will write in two parts as U = U F U γ , where U F and U γ are chosen to diagonalize H L and H γ , respectively. The former is written where we defined F m = ϕ m + i g im S x i . This transformation commutes with S x i , and thus with H γ . As discussed in Sec. V, this transform leads to the replacement a m → a m + F m /(∆ m + iκ), which also modifies the Lindblad dissipation term describing cavity loss. Generically, with the second term shifting the Hamiltonian. Incorporating this contribution of the Lindblad term into the transformedH L , we find This expression is now diagonal, with eigenstates defined by classical states of S x i and photon number states. We note that the second term is the Hopfield model in a longitudinal field: with connectivity and longitudinal fields: (E7) The term H γ may be similarly diagonalized by the transformation This commutes with H L and transforms H γ tõ Now thatH 0 =H L +H γ is diagonal in the transformed frame, we may implement the standard Bloch-Redfield procedure in which the (transformed) termH T is treated perturbatively. We first determine the form ofH T = U † H T U . Working in the interaction picture ofH 0 , we find We now examine the factors that appear in this expression. First, note the raising and lowering operators take the unusual form S ± = S y ± iS z because the S x spin states are the eigenstates of H 0 . The interaction picture time dependence of these operators is This operator takes a simple form in the S x subspace because H Hopfield is diagonal in this subspace. Specifically, consider S x states denoted |s = |s x 1 , s x 2 , · · · , s x N , where s x i denotes the S x i eigenvalue for the ith spin ensemble. The time dependence of the matrix elements is then where E s is the energy eigenvalue of the state |s of H Hopfield . The difference in energy appearing in the exponential, which results from flipping a single spin in the ith ensemble, can be written The time dependence of the raising and lowering operators therefore takes the simple form S ± i (t) = e iδ ± i t S ± i . The other factors in Eq. (E10) are polaronic operators describing how the spins are dressed by the photons Their time-dependent forms involve the interactionpicture time dependence of a m , B ik . With the full forms ofH 0 andH T (t) now specified, the standard Bloch-Redfield procedure (in the Markovian approximation) begins by writing: We include both the cavity modes and noise operators B ik in tracing over the "bath" degrees of freedom above, yielding a master equation for the spin degrees of freedom only. Note that tracing out the cavity modes at this stage preserves the vital physics describing the spin-flip dynamics, as the diagonalized Hamiltonian H 0 includes the full cavity-mediated interaction between spins. From this point on, we assume the confocal limit, ∆ m = ∆ C .
The Bloch-Redfield procedure eventually yields an expression that may be written in the interaction picture as where the spin-flip rates are K i (δ ) = Re[Γ i (δ )] and The Lamb shift H Lamb term is where L i (δ ) = Im[Γ i (δ )]. Since S + i S − i is diagonal in the S x basis, H Lamb generates no spin-flip dynamics. For the quantum bath, C(τ ) takes the form C Q (τ ) = dω J Q (ω) ω 2 [coth (βω/2) (1 − cos(ωτ )) + i sin(ωτ )] , where J Q (ω) is the spectral density of the quantum bath. We can recover the results of Appendix D for a classical bath by taking the limit β → 0 while rescaling the spectral density to keep the integrand finite. This corresponds to suppressing quantum noise while maintaining a finite level of thermal (classical) noise. Formally, we keep J C (ω) = J Q (ω) coth(βω/2) J Q (ω)(2k B T /ω) finite while J Q (ω) → 0. This effectively removes the imaginary term from C(τ ), giving We conclude this section by presenting the case in which J Q (ω) describes an ohmic quantum bath, We use a slightly different normalization than in the classical case for algebraic convenience. Note also, following the previous paragraph, that the high-temperature classical limit of this bath would be classical white noise, rather than classical ohmic noise. The function C Q (τ ) can be calculated exactly in the zero-temperature bath limit: The integrals in the spin-flip rate can then be evaluated to give the exact expression where we have introduced the function φ(s, z) = z s−1 e z Γ(1 − s, z). Here, Γ(s, z) is the upper incomplete gamma function and θ(x) denotes the Heaviside step function. The function φ(s, z) asymptotically approaches 1/z for large |z|. Thus, we can explicitly calculate the rate to be in the large detuning limit defined by |∆ C | δ , ω c . The bath-dependent term h Q (δ ) is given by The function h Q (δ ) dominates the rate function for δ < 0, but vanishes for δ > 0. The cavity parameters ∆ C and κ drop out of the bath term, showing that the dynamics are predominantly determined by the bath structure. Figure 16 plots K i (δ ) for several values of the bath coupling strength α Q . The expression has the property that for α Q > 1 and negative δ , h Q (δ ) approaches zero at |δ | = 0, attains a maximum at |δ | = (α Q − 1)ω c , and then falls off exponentially for larger |δ |. This means that for energy lowering processes with |δ | < (α Q −1)ω c , spin flips that lower the energy by a greater amount have a higher spin-flip rate; this is like SD. For α Q ≈ 1, the spin-flip rate function is nearly flat while δ < 0 and yields 0TMH-like dynamics. Last, h Q (δ ) diverges at δ = 0 for α Q < 1. This results in the spins preferring to perform trivial spin flips that lower the energy by only small amounts rather than making substantial energy lowering flips. Such dynamics would allow the spins to fluctuate as much as possible before settling to a local minimum in the long-time limit and is undesirable for associative memory.  16. The spin-flip rate versus energy for an ohmic quantum bath for various bath coupling strengths αQ. Three regimes of dynamics are apparent. The spin-flip rate diverges for αQ < 1 near δ = 0, giving rise to dynamics that perform the smallest energy-lowering spin flips possible. For αQ 1, the spin-flip rate is roughly flat for negative δ , matching 0TMH dynamics. Finally, for αQ > 1, the spin-flip rate rises sharply as δ approaches zero. This describes dynamics akin to SD, in which the most energy-lowering spin flip has the highest spin-flip rate.
Appendix F: Derivation of mean-field dynamics The deterministic ensemble dynamics of Sec. V C are now derived from the stochastic spin dynamics of Sec. V B. This is a specific example of deriving mean-field equations from a continuous time Markov chain [109]. As such, we first formalize the structure of the Markov chain. Each individual spin can either be in the up or down state, and so we define the occupancy vector x i,j to denote the current state of the ith spin in ensemble j, where x i,j = (1, 0) for the up state and x i,j = (0, 1) for the down state. These define the microscopic degrees-offreedom in the Markov chain. We average over the microscopic degrees-of-freedom to produce a macroscopic description, x j = M i=1 x i,j /M . The ensemble occupancy vectors can be written as is the fraction of spins in the jth ensemble that are up (down).
To obtain the mean-field equations of motion, we construct the generator matrix describing the transition rates between different occupancy states. We must specify the spin-flip rate experienced per spin rather than the total ensemble rate. We thus normalize by the number of spins that are available to flip in the given direction, M x ± i = 2Sx ± i , in terms of the ensemble populations. After normalization, the generator matrix is where S = M/2 and the variable s x i = S(x + i − x − i ) is the average spin state of ensemble i. The entries of the generator matrix are defined such that the off-diagonal elements give the transition rates for the up → down and down → up transitions, while the two diagonal elements give the total rate at which the spins exit the up or down state.
The mean-field limit, exact for S → ∞, describes the time evolution of the ensemble occupancy vectors in terms of a differential equation involving the generator matrix. Explicitly, under the only constraint that the rate functions be Lipschitz continuous, the limit differential equation for the ensemble occupancy vectors is We may rewrite this equation as an ordinary differential equation in terms of the ensemble magnetizations m i = x + i − x − i = s x i /S. This is Noting that the sign of K i (δ + i ) − K i (δ − i ) depends on whether the ensemble is aligned with its local field i = j J ij s x j , we arrive at the expression for the large ensemble equation of motion Eq. (25) presented in the main text. Referring back to Fig. 9, the mean-field equations match the full stochastic unraveling quite well for the chosen parameters.