Atomic spin-wave control and spin-dependent kicks with shaped sub-nanosecond pulses

The absorption of traveling photons resonant with electric dipole transitions of an atomic gas naturally leads to electric dipole spin wave excitations. For a number of applications, it would be highly desirable to shape and coherently control the spatial waveform of the spin waves before spontaneous emission can occur. This work details a recently developed optical control technique to achieve this goal, where counter-propagating, shaped sub-nanosecond pulses impart sub-wavelength geometric phases to the spin waves by cyclically driving an auxiliary transition. In particular, we apply this technique to reversibly shift the wave vector of a spin wave on the $D2$ line of laser-cooled $^{87}$Rb atoms, by driving an auxiliary $D1$ transition with shape-optimized pulses, so as to shut off and recall superradiance on demand. We investigate a spin-dependent momentum transfer during the spin-wave control process, which leads to a transient optical force as large as $\sim 1\hbar k$/ns, and study the limitations to the achieved $70$ to $75\%$ spin-wave control efficiency by jointly characterizing the spin-wave control and matterwave acceleration. Aided by numerical modeling, we project potential future improvements of the control fidelity to the $99\%$ level, given better atomic state preparation and a faster and more powerful pulse shaper. Our technique also enables a background-free measurement of the superradiant emission, unveiling, for the first time to our knowledge, the precise scaling of the emission intensity and decay rate with optical depth.


I. INTRODUCTION
Spontaneous emission is typically a decoherence effect to avoid when levels in small quantum systems are chosen to encode information for, e.g., quantum computation, simulation, or sensing [1][2][3][4]. As "spontaneous" as it is, the information flow during the process can nevertheless be controlled between long-lived matter degrees of freedom and a pre-aligned single-mode electromagnetic continuum [5,6]. In particular, since the seminal work by Dicke in 1954 on super- and subradiant effects of light emission by ensembles of excited atoms [7], it is now well known that the spatio-temporal properties of spontaneous emission are in principle dictated by collective properties of the atoms themselves. For spatially extended atomic ensembles, the timed phase correlations of the collective excitations, in the form of spin waves with phase-matched wave vector $\mathbf{k}$ satisfying $|\mathbf{k}| = \omega/c$, can direct superradiant emission into narrow solid angles [8][9][10][11]. This collective enhancement forms the basis for many applications predicated upon efficient quantum atom-light interfaces [12,13]. However, there has been relatively little work to address the question of what happens when the spin wave, with $|\mathbf{k}| \neq \omega/c$, becomes strongly phase-mismatched from radiation. Within the context of atomic ensembles, complex phenomena can arise involving the combination of spatial disorder, multiple scattering of light, and dipole-dipole interactions between atoms [14][15][16][17][18], with much still left to be understood. Furthermore, within the emerging field of quantum optics with atomic arrays, such phase-mismatched states are predicted to be strongly subradiant. This forms the basis for exciting applications like waveguiding of light by the array [19][20][21], atomic mirrors [22][23][24], and exotic states [25,26] including emergent Weyl excitations [27] and topological guided edge modes [28,29], and the generation of highly correlated, "fermionized" states [30].
One major bottleneck to exploring and controlling all of these phenomena is the fact that any optical pulses used to manipulate atomic excited states, being radiation waves, most naturally excite phase-matched spin waves, while spin waves with $|\mathbf{k}| \neq \omega/c$ are naturally decoupled from light. What is needed, then, is a technique to efficiently and coherently alter the phase-matching condition of collective atomic excitations in the temporal domain (that is, to modify the wave vector of the spin waves and coherently convert between radiant and subradiant modes) on time scales faster than the typical emission time of the atoms themselves.
Previously, in a short Letter [31], we described a novel experimental realization of coherent dipole spin wave control, based upon rapidly and cyclically driving an auxiliary transition with counter-propagating control pulses, in order to robustly imprint spin- and spatially dependent geometric phases onto the atoms. The resulting k-space shifts lead to spin waves in a $^{87}$Rb gas with strongly mismatched wave vectors $|\mathbf{k}| \neq \omega/c$, "off the light cone", and suppressed superradiant emission. Later, the mismatched spin waves are shifted back "onto the light cone" to cooperatively emit again on demand. Here, in this accompanying paper, we expand upon key details that were omitted, describing in greater detail the experimental implementation and theoretical modeling of the geometric control technique, and additional research advances enabled by the technique.
In particular, we characterize the fidelity of our control sequence with careful measurements of not only the controlled superradiance, but also an accompanying matterwave acceleration effect. With the absolute measurements of optical acceleration, we are able to precisely model the light-atom interaction during the subnanosecond control, so as to unveil the limitations to the control fidelity for future improvements.
In addition, we exploit the ability to shift the dipole spin wave vector to characterize the phase-matched forward collective emission in a background-free manner. The forward collective emission has been studied previously as a strong signature of cooperative enhancement of light-atom interactions in cold atomic gases [10,11,16]. One challenge in measuring the forward cooperative emission in these experiments is that the exciting beam, typically with a much stronger intensity, propagates in the same direction as the forward emission and contributes a large background. Here, the ability to shift the spin wave vector allows us to detect the forward superradiant emission from a different, background-free direction. We experimentally quantify the scaling of the emission intensity $i_N \propto N^2$ and collective decay rate $\Gamma_N/\Gamma = (1 + \mathrm{OD}/4)$ [14,17,18,26], for superradiant emission involving $N$ atoms with an average optical depth OD.
This work also brings together two seemingly unrelated phenomena: the control of collective dipole radiation, and the acceleration of the free emitters. Accompanying the nanosecond spin-wave k-vector shifts, we observe a strong spin-dependent optical force that accelerates the atomic sample at a transient rate of $\sim 10^7$ m/s$^2$. Similar techniques of cyclic rapid adiabatic passage have been studied in pioneering work by Metcalf and coworkers [32,33] as a robust way to impart strong optical forces to neutral atoms and molecules, with important applications in laser cooling [34,35]. The $\sim 1\,\hbar k/m$ per nanosecond optical acceleration in this work is among the highest [34,36]. The negative impact of the spin-dependent acceleration on the spin-wave control is negligible in this work, and can be mitigated with lattice confinement in future experiments [31]. On the other hand, the combined effects may open interesting opportunities at the interface of quantum optics and atom interferometry [37].
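As a rough order-of-magnitude check of the quoted acceleration, the following minimal sketch evaluates the single-photon recoil velocity on the D1 line and the acceleration corresponding to one recoil per nanosecond. The $^{87}$Rb mass value and the rounded 795 nm wavelength are assumptions of this sketch, and the names `recoil_velocity`, `v_r`, and `accel` are illustrative only.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J*s
AMU = 1.66053906660e-27     # atomic mass unit, kg
M_RB87 = 86.909 * AMU       # approximate 87Rb mass (assumed value), kg
LAMBDA_D1 = 795e-9          # rounded D1 wavelength, m

def recoil_velocity(wavelength, mass):
    """Single-photon recoil velocity hbar*k/m for a photon of given wavelength."""
    return HBAR * (2 * math.pi / wavelength) / mass

v_r = recoil_velocity(LAMBDA_D1, M_RB87)
accel = v_r / 1e-9          # one photon recoil transferred per nanosecond, m/s^2

print(f"recoil velocity: {v_r * 1e3:.2f} mm/s")
print(f"acceleration at 1 hbar*k per ns: {accel:.1e} m/s^2")
```

Under these assumptions the recoil velocity is a few mm/s and the transient acceleration falls between $10^6$ and $10^7$ m/s$^2$, matching the stated scale to within an order of magnitude.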
The remainder of the paper is structured as follows.
First, in Sec. II, we provide a simple, idealized theoretical description of our protocol to manipulate spin waves by rapid geometric phase patterning using shaped subnanosecond pulses. In Sec. III, we detail the experimental implementation of the coherent control and discuss the background-free detection of the cooperative emission as well as matterwave acceleration effects. Here we also generalize the simpler theoretical discussion for quantifying the efficiency of our protocol in the face of various imperfections. We summarize this work in Sec. IV, and discuss methods and technology for improvements of the spin-wave control efficiency to the ∼99% level. To ensure completeness and to provide better context, key ideas from Ref. [31] are repeated in this work.

A. Preparation and control of optical spin wave
Single collective excitations of an atomic ensemble, consisting of $N$ atoms with ground state $|g\rangle$ and excited state $|e\rangle$, are naturally described by "spin-wave" excitations or timed-Dicke states of the form $|\psi_{\rm TD}(\mathbf{k}_p)\rangle = S^+(\mathbf{k}_p)|g_1, g_2, \cdots, g_N\rangle$. Here $S^+(\mathbf{k}_p) = \frac{1}{\sqrt{N}}\sum_i e^{i\mathbf{k}_p\cdot\mathbf{r}_i}|e_i\rangle\langle g_i|$ denotes a collective spin raising operator and $\mathbf{k}_p$ the corresponding wave vector. For example, a weak coherent state involving such excitations is naturally generated by an incoming plane-wave "probe pulse" with Rabi frequency $\Omega_p$ and duration $\tau_p \ll 1/\Omega_p$, if $\tau_p$ is in addition short enough that light re-scattering effects are negligible. As such, the magnitude of the wave vector $|\mathbf{k}_p| = \omega_{eg}/c$, with $\omega_{eg}$ being the atomic resonance frequency, matches that of free-space radiation. As is well known [9][10][11], such "phase-matched" spin wave excitations in a large sample with size $\sigma \gg 1/k_p$ and $N > k_p^2\sigma^2$ radiate efficiently and in a collectively enhanced fashion, much like a phased array antenna, into a small solid angle $\theta^2$ with $\theta \sim 1/k_p\sigma$ around the forward $\mathbf{k}_p$ direction (also see Sec. II C).
We are interested in spin waves with strongly mismatched $|\mathbf{k}| \neq \omega_{eg}/c$, "off the light cone", in which case the cooperative emission is prohibited. In an ensemble with random atomic positions $\{\mathbf{r}_i\}$, as in this work, such a spin wave is expected to decay with a rate near the natural linewidth [38], as the fields emitted by different atoms in any direction average to zero, but with nonzero variance. On the other hand, in an ordered array of atoms, the destructive interference in all directions can be nearly perfect, leading to strong subradiance that forms the basis of the exciting applications mentioned earlier [19][20][21][22][23][24][25][26][27][28][29]. It is therefore highly compelling to have a technique to shift the spin waves in k-space on a time scale much faster than the spontaneous emission lifetime. To achieve the required k-vector shift, we consider a unitary transform $U_c(\Delta\mathbf{k})$ such that $S^+(\mathbf{k} + \Delta\mathbf{k}) = U_c(\Delta\mathbf{k})S^+(\mathbf{k})U_c^\dagger(\Delta\mathbf{k})$. The required control is a class of state-dependent phase-patterning operations, which can be decomposed into spatially dependent phase gates for each two-level atom as (with $\sigma_{z,i} = |e_i\rangle\langle e_i| - |g_i\rangle\langle g_i|$)

$$U_c(\Delta\mathbf{k}) = \prod_{i=1}^{N} e^{\frac{i}{2}\Delta\mathbf{k}\cdot\mathbf{r}_i\,\sigma_{z,i}}. \qquad (1)$$

While we have thus far focused on the manipulation of spin wave excitations, by considering the position $\mathbf{r}_i$ as an operator as well, one sees that the transformation of Eq. (1) also imparts opposite momentum boosts to the "g" and "e" components of a freely moving atom. Techniques for realizing such "spin-dependent kicks" have been well developed in the communities of atom interferometry and ion-based quantum information processing, typically on Raman transitions [39][40][41]. Here, we demonstrate for the first time a high-efficiency process based upon rapid manipulation of a strong optical transition for the spin-wave control.

[Figure caption (panel labels: Generation, Re-direction, Switch-off, Recall): Generation and control of optical spin wave with the probe beam, followed by the control pulse-induced re-direction, switch-off and recall of the collective spontaneous emission. The $|g\rangle - |e\rangle$ electric dipole spin wave is illustrated with fringes in the atomic sample. Each optical control also imparts a spin-dependent kick, which leads to momentum transfer with $v_r = \hbar k_c/m \approx 5.8$ mm/s, $m$ being the atomic mass of $^{87}$Rb. The drawings are not to scale; in particular, the phase-matching angle $\theta = \arccos(\lambda_{D2}/\lambda_{D1}) \sim 11.1^\circ$ is exaggerated for clarity. Bottom: Bloch-sphere representation of the projected $|g\rangle - |a\rangle$ state dynamics for an atom at a representative position $\mathbf{r}$. An ensemble of trajectories with different control pulse peak intensity parameters $s$ is displayed. The quasi-adiabatic control ensures that the geometric phase writing is insensitive to small deviations of $s$ from $s \sim 0.6 \times 10^6$, for $\tau_c\Gamma_{D1} = 0.02 \sim 0.03$ in this work.]
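The k-vector shift produced by a position-dependent phase gate can be checked numerically in the single-excitation subspace. The sketch below is illustrative only: the atom number, cloud size, and wave vectors are arbitrary assumed values, and the relative phase $e^{i\Delta\mathbf{k}\cdot\mathbf{r}_i}$ between the "e" and "g" components is applied directly to the spin-wave amplitudes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms = 500
# Random atomic positions in a Gaussian cloud (arbitrary units, assumed values).
positions = rng.normal(scale=7.0, size=(n_atoms, 3))

def spin_wave(k, r):
    """Single-excitation amplitudes e^{i k.r_i}/sqrt(N) of a timed-Dicke state."""
    return np.exp(1j * (r @ k)) / np.sqrt(len(r))

def overlap(a, b):
    """Squared overlap |<a|b>|^2 between two single-excitation states."""
    return abs(np.vdot(a, b)) ** 2

k_in = np.array([0.0, 0.0, 8.0])       # initial wave vector (assumed)
delta_k = np.array([-15.0, 0.0, 3.0])  # imprinted shift (assumed)

# The phase gate multiplies the e-g coherence of atom i by exp(i delta_k . r_i).
kicked = spin_wave(k_in, positions) * np.exp(1j * (positions @ delta_k))

overlap_shifted = overlap(spin_wave(k_in + delta_k, positions), kicked)
overlap_original = overlap(spin_wave(k_in, positions), kicked)
print(overlap_shifted, overlap_original)
```

The kicked state coincides with the timed-Dicke state at $\mathbf{k} + \Delta\mathbf{k}$, while its overlap with the original spin wave is only of order $1/N$ for random positions.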

B. Error-resilient spin wave control
To implement $U_c(\Delta\mathbf{k})$ in Eq. (1) on a strong dipole transition in a large atomic sample, we combine the geometric phase method suggested in Ref. [38] with optical rapid adiabatic passage techniques as in Refs. [32,33] for the necessary control speed, precision, and intensity-error resilience. In particular, we consider two nearly identical optical "control" pulses with Rabi frequency $\Omega_c(t)e^{i\varphi_{1,2}}$ and instantaneous detuning $\delta_c(t)$ to drive the auxiliary $|g\rangle \to |a\rangle$ and $|a\rangle \to |g\rangle$ transitions respectively. By optical pulse shaping, the cyclic transition can be driven with high precision in a manner that is largely insensitive to laser intensity. We adopt a simple choice of $\Omega_c(t) = \Omega_0 \sin(\pi t/\tau_c)$ and $\delta_c(t) = -\delta_0 \cos(\pi t/\tau_c)$ to achieve quasi-adiabatic population inversions at optimal $\{\Omega_0, \delta_0\}$ within the $\tau_c$ pulse duration [32,33]. To connect to the experimental setup (Figs. 1 and 2), after preparing the $S^+(\mathbf{k}_p)$ spin wave excitation, we send the first control pulse to the atomic sample, propagating along the direction given by wave vector $\pm\mathbf{k}_c$, driving $|g\rangle \to |a\rangle$. Subsequently, a second control pulse with $\mp\mathbf{k}_c$ interacts with the sample after a $\tau_d$ delay, driving $|a\rangle \to |g\rangle$. Although every ground-state atom in the original $|g\rangle - |e\rangle$ spin wave returns to $|g\rangle$, the area enclosed around the Bloch sphere by the state vector causes each atom to pick up a spatially dependent geometric phase $\varphi_G(\mathbf{r}_i) = \pi + \Delta\mathbf{k}\cdot\mathbf{r}_i$, with $\Delta\mathbf{k} = \pm 2\mathbf{k}_c$ to fully exploit the resolution of the optical phase. The ideal state-dependent phase patterning achievable in the short $\tau_{c,d}$ limit can be formally expressed within the $\{|g\rangle, |e\rangle\}$ space as

$$U_g(\Delta\mathbf{k}) = \prod_{i=1}^{N}\left(|e_i\rangle\langle e_i| - e^{i\Delta\mathbf{k}\cdot\mathbf{r}_i}|g_i\rangle\langle g_i|\right), \qquad (2)$$

which performs a $\mathbf{k} \to \mathbf{k} - \Delta\mathbf{k}$ shift to spin-wave excitations in the same way as $U_c(-\Delta\mathbf{k})$ in Eq. (1). Although the phase patterning operation in Eq. (2) could in principle be achieved without the control pulse shaping, in practice the control laser intensity inhomogeneity across the large sample would translate directly into spin control error, as in most nonlinear spectroscopy experiments [42][43][44], leading to degraded average control fidelity if a highly uniform laser intensity profile cannot be maintained. To fully exploit the intensity-error resilience [45] offered by the SU(2) geometry of the two-level control, the optical pulses need to be shaped on a time scale fast enough to suppress the spontaneous emission, and also slow enough to avoid uncontrolled multiphoton couplings in real atoms. We note that optical methods for two-level rapid adiabatic passage [46] are themselves well developed for population transfers in atoms and molecules using ultrafast lasers [47,48]. However, these ultrafast techniques typically demand control field Rabi frequencies $\Omega_c$ beyond the THz level with intense pulses at low repetition rates, and are not easily adapted to our goals. The strong fields may also cause non-negligible multi-level couplings or even photo-ionization beyond the desired multiple population inversions. Instead of using ultrafast lasers, in this work we develop a wideband pulse shaping technique based on fiber-based sideband electro-optical modulation of a cw laser [31], with up to 13 GHz modulation bandwidth, to support the flexibly programmable error-resilient spin wave control. Compared with previous work on spectroscopy based upon perturbative nonlinear optical effects [42][43][44], our technique is unique in that we steer the atomic state over the entire Bloch sphere of the two-level system to achieve the geometric robustness toward perfect spin-wave control set by Eq. (2).
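The intensity resilience of the chirped-pulse inversion can be illustrated with a minimal two-level simulation. This is a sketch under stated assumptions, not the numerical model used in this work: it treats an ideal, decay-free two-level atom in the rotating frame with $\hbar = 1$, uses the pulse shape $\Omega_c(t) = \Omega_0\sin(\pi t/\tau_c)$, $\delta_c(t) = -\delta_0\cos(\pi t/\tau_c)$, and takes $\tau_c = 0.9$ ns, $\Omega_0 = 2\pi\times 2.7$ GHz, $\delta_0 = 2\pi\times 3.4$ GHz as representative parameters. The function name `inversion_probability` and the step count are hypothetical choices.

```python
import numpy as np

def inversion_probability(omega0, delta0, tau_c, n_steps=4000):
    """Population transferred |g> -> |a> by one chirped pulse (ideal two-level atom).

    H(t) = 0.5 * (omega(t) * sigma_x + delta(t) * sigma_z), integrated with
    piecewise-constant exact 2x2 propagators evaluated at the step midpoints.
    """
    dt = tau_c / n_steps
    psi = np.array([1.0 + 0j, 0.0 + 0j])           # start in |g>
    for n in range(n_steps):
        t = (n + 0.5) * dt
        omega = omega0 * np.sin(np.pi * t / tau_c)
        delta = -delta0 * np.cos(np.pi * t / tau_c)
        w = np.hypot(omega, delta)                  # generalized Rabi frequency
        c, s = np.cos(0.5 * w * dt), np.sin(0.5 * w * dt)
        nx, nz = omega / w, delta / w
        step = np.array([[c - 1j * s * nz, -1j * s * nx],
                         [-1j * s * nx, c + 1j * s * nz]])
        psi = step @ psi
    return abs(psi[1]) ** 2

tau_c = 0.9e-9
omega0 = 2 * np.pi * 2.7e9
delta0 = 2 * np.pi * 3.4e9

# Scan the peak Rabi frequency by +/-20% to mimic intensity inhomogeneity.
scales = [0.8, 1.0, 1.2]
populations = [inversion_probability(s * omega0, delta0, tau_c) for s in scales]
for s, p in zip(scales, populations):
    print(f"Omega0 scale {s:.1f}: inverted population = {p:.4f}")
```

In this idealized picture the inversion is expected to stay close to unity across the scanned range of peak Rabi frequencies, which is the property that the geometric phase writing relies on. Hyperfine structure and spontaneous emission are deliberately omitted here.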

C. Simple dynamics of controlled emission
Here, we go beyond Ref. [31] to discuss the expected emission characteristics of phase-matched and mismatched spin waves, which constitute one of the key observables to verify our coherent control technique and quantify its efficiency. Later, in Sec. III C, we will experimentally verify the predicted optical depth dependence for the phase-matched case. To relate specifically to the experimental control of a laser-cooled gas in this work, we consider an atomic sample with a smooth, nearly spherical density profile $\rho(\mathbf{r}) = \langle\sum_i \delta(\mathbf{r} - \mathbf{r}_i)\rangle$ with size $\sigma \gg 1/|\mathbf{k}_p|$ and at a moderate density with $\rho < |\mathbf{k}_p|^3$. Formally, the quantum field $\hat{E}_s$ emitted by a collection of atoms on the $|e\rangle - |g\rangle$ transition can be expressed in terms of the atomic spin coherences themselves [25]. Here $\mathbf{G}(\mathbf{r}, \omega_{eg})$ is the free-space Green's tensor of the electric field, which physically describes the field at position $\mathbf{r}$ produced by an oscillating dipole at the origin, and $\mathbf{d}_{eg} = d_{eg}\mathbf{e}_d$ is the transition dipole moment. For the single-excitation timed Dicke state, one can define a single-photon wave function $\varepsilon_{\mathbf{k}}(\mathbf{r}) = \langle g_1, g_2, ..., g_N|\hat{E}_s|\psi_{\rm TD}(\mathbf{k})\rangle$, which describes the spatial profile of the emitted photon. Of particular interest will be the field emitted along the direction $\mathbf{k}$ at the end of the nearly spherical sample, and at a transverse position $\mathbf{r}_\perp$. Within the so-called "Raman-Nath" regime where diffraction is negligible, we obtain an expression, Eq. (3), for $\varepsilon_{\mathbf{k}}(\mathbf{r})$ averaged over random $\{\mathbf{r}_i\}$ (also see Appendix B). Here $r_k = \mathbf{r}\cdot\mathbf{k}/|\mathbf{k}|$, and $\rho_c(\mathbf{r}_\perp, \delta k) = \frac{1}{N}\int \rho(\mathbf{r})e^{i\delta k\, r_k}\,dr_k$ is a generalized (and normalized) column density, where in general we allow for a wavenumber $\delta k = |\mathbf{k}| - \omega_{eg}/c$ that is mismatched from radiation. The quantity $\sigma_r = \omega_{eg}\alpha_i/c$ is the resonant absorption cross-section, and the imaginary part of the resonant polarizability $\alpha_i$ is related to the dipole element $d_{eg}$ and $\Gamma$ through $\hbar\Gamma\alpha_i = 2|d_{eg}|^2$. (While $d_{eg}$ and $\Gamma$ are directly related for two-level atoms, this formula also generalizes to atoms with level degeneracy.)
Equation (3) describes the possibility of both enhanced and suppressed collective emission associated with the spin-wave excitation at a radiation wavenumber mismatch $\delta k$. A well-known consequence of Eq. (3) is that when the spin wave is excited by a weak probe pulse with wave vector $\mathbf{k}_p$ (see Fig. 1), the atoms act as a phased antenna array with $\delta k = 0$, and light in the forward direction along $\mathbf{k}_p$ is re-emitted at an enhanced rate within an angle $\theta \sim 1/(|\mathbf{k}_p|\sigma)$ [49]. As our probe pulse is in a weak coherent state, the timed Dicke state is excited with a population of $N\theta_p^2$, where $\theta_p = \frac{1}{2}\int\Omega_p\,dt$ is the time-integrated Rabi frequency of the probe pulse ($\theta_p \ll 1$). Assuming that the spatial profile of Eq. (3) does not change significantly during the emission process, one can integrate the intensity of light predicted by Eq. (3) over space, and arrive at the time-dependent collective spontaneous emission rate of Eq. (4). Here $\overline{\mathrm{OD}}_p \equiv \langle \mathrm{OD}_p^2(\mathbf{r}_\perp)\rangle/\langle \mathrm{OD}_p(\mathbf{r}_\perp)\rangle \propto N$ is the average optical depth, and $\mathrm{OD}_p(\mathbf{r}_\perp) = N\rho_c(\mathbf{r}_\perp, \delta k = 0)\sigma_r$. The exponential factors of $e^{-\Gamma t}$ and $e^{-\overline{\mathrm{OD}}_p\Gamma t/4}$ account for the (non-collective) decay into $4\pi$ and the enhanced emission along the phase-matched $\mathbf{k}_p$ direction, respectively.
We now consider what happens if, immediately following the probe pulse at $t = 0$, we apply the ideal spin wave control, which imprints a geometric phase and transforms the original timed-Dicke state along $\mathbf{k}_p$ according to Eq. (2) (finite delay times and other imperfections can be straightforwardly included, as discussed in Sec. III). As the original state has a well-defined wave vector, the application of Eq. (2) simply creates a new timed Dicke state with new wave vector $\mathbf{k}_s = \mathbf{k}_p - 2\mathbf{k}_c$. Then, two distinct cases emerge. The first is that $|\mathbf{k}_s| \approx \omega_{eg}/c$, in which case the spin wave is again phase-matched to radiation, and an enhanced emission rate like Eq. (4) is again observed, but with the majority of emission "redirected" along the new direction $\mathbf{k}_s$. The second, and more intriguing, possibility is that $|\mathbf{k}_s|$ is significantly mismatched from $\omega_{eg}/c$. In that case, there is no direction along which emission can be constructively and collectively enhanced. For the case of our disordered ensemble, this results in an emission rate into the same solid angle, in the absence of phase-matching, given by Eq. (5); i.e., the emission reduces to an incoherent sum of those from single, isolated atoms. This is due to the random positions of the atoms, such that the field of emitted light in any direction tends to average to zero, but with a nonzero variance (corrections to Eq. (5) are expected at high densities due to granularity of the atomic distribution, which makes the problem quite complex in general [14,15,50]). However, in an ordered array of atoms, the destructive interference in all directions can be nearly perfect, leading to a decay rate much smaller than $\Gamma$. The ability to generate excited states with extremely long lifetimes is key to all of the applications mentioned in the introduction [19-25, 27-29, 51].
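The two exponential factors above combine into a single collective decay rate $(1 + \overline{\mathrm{OD}}_p/4)\Gamma$, which can be converted to and from a lifetime with a few lines of arithmetic. The sketch below assumes this idealized scaling holds exactly and uses the D2 natural linewidth $\Gamma = 2\pi\times 6.07$ MHz; the 17 ns lifetime is an illustrative input, and the helper names are hypothetical.

```python
import math

GAMMA_D2 = 2 * math.pi * 6.07e6   # D2 natural linewidth, rad/s

def collective_lifetime(od, gamma=GAMMA_D2):
    """1/e lifetime of the emission intensity for decay rate (1 + OD/4) * gamma."""
    return 1.0 / ((1.0 + od / 4.0) * gamma)

def od_from_lifetime(tau, gamma=GAMMA_D2):
    """Average optical depth implied by a measured lifetime, same idealized scaling."""
    return 4.0 * (1.0 / (gamma * tau) - 1.0)

single_atom_lifetime = collective_lifetime(0.0)
implied_od = od_from_lifetime(17e-9)   # illustrative 17 ns collective lifetime

print(f"single-atom lifetime: {single_atom_lifetime * 1e9:.1f} ns")
print(f"OD implied by a 17 ns lifetime: {implied_od:.2f}")
```

With these inputs the single-atom lifetime is about 26 ns, and a 17 ns collective lifetime corresponds to an average optical depth slightly above 2, subject to the caveat that the measured decay may deviate from the simple $(1 + \mathrm{OD}/4)$ law.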

A. Experimental Methods
In this work the dipole spin wave excitation is implemented on the $^{87}$Rb $5S_{1/2} - 5P_{3/2}$ D2 line between hyperfine ground state $5S_{1/2}$ $F = 2$ and excited state $5P_{3/2}$ $F = 3$, represented by $|g\rangle$ and $|e\rangle$ in Fig. 2 respectively. The transition wavelength is $\lambda_{D2} = 780$ nm (with $k_p = 2\pi/\lambda_{D2}$), and the natural linewidth is $\Gamma_{D2} = 2\pi \times 6.07$ MHz. We prepare $N \sim 10^4$ $^{87}$Rb atoms in $|g\rangle$ in an optical dipole trap with up to $\sim 5 \times 10^{12}$/cm$^3$ peak density and $T \sim 20$ µK temperature. After the atoms are released from the trap, the dipole excitation is induced by a resonant D2 probe pulse with $\tau_p = 3 \sim 5$ ns and $I_p \approx 10$ mW/cm$^2$. The Gaussian probe beam has a $w_p \approx 50$ µm waist, which is much larger than the $1/e$ radius of the atomic density profile, $\sigma \approx 7$ µm, validating the plane-wave excitation picture.
We choose the polarization for the probe and control lasers to be along $\mathbf{e}_y$ and $\mathbf{e}_x$ respectively. Taking the $\mathbf{e}_x$ direction as the quantization axis, the π-control couplings to $5P_{1/2}$ would have equal strengths and detunings for all five $5S_{1/2}$ $F = 2$, $m_F$ Zeeman sublevels, with vanishing hyperfine Raman coupling, if the $5P_{1/2}$ hyperfine splitting $\Delta_{D1,\mathrm{hfse}} = 2\pi \times 814.5$ MHz could be ignored. This approximation helps us to establish the simple two-level control picture in Fig. 2(c) even for the real atom. In practice, the hyperfine dephasing effects can be suppressed for atoms in the $|m_F| = 0, 2$ Zeeman sublevels by adjusting the optical delay $\tau_d$ to match $2\pi/\Delta_{D1,\mathrm{hfse}} \approx 1.23$ ns. The hyperfine effect impacts the $|m_F| = 1$ atoms more severely, through both intensity-dependent dephasing and non-adiabatic population losses. These hyperfine effects can be suppressed with better Zeeman state preparation, or by faster controls with $\tau_c\Delta_{D1,\mathrm{hfse}} \ll 1$ while setting $\tau_d$ to be a multiple of $2\pi/\Delta_{D1,\mathrm{hfse}}$.
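The hyperfine timing conditions reduce to simple arithmetic, sketched below. The splitting $2\pi\times 814.5$ MHz is taken from the text; the pulse duration $\tau_c = 0.9$ ns and delay $\tau_d = 1.36$ ns are assumed representative values, and `hyperfine_phase` is a hypothetical helper name.

```python
import math

DELTA_HFS = 2 * math.pi * 814.5e6      # 5P_1/2 hyperfine splitting, rad/s

def hyperfine_phase(tau):
    """Relative hyperfine phase accumulated over time tau, folded into [0, 2*pi)."""
    return (DELTA_HFS * tau) % (2 * math.pi)

revival_period = 2 * math.pi / DELTA_HFS   # delay at which hyperfine dephasing rephases
tau_c = 0.9e-9                             # assumed control pulse duration
tau_d = 1.36e-9                            # assumed pulse-pair delay

print(f"revival period: {revival_period * 1e9:.3f} ns")
print(f"tau_c * Delta = {DELTA_HFS * tau_c:.2f} rad")
print(f"residual phase at tau_d: {hyperfine_phase(tau_d):.2f} rad")
```

For these values $\tau_c\Delta_{D1,\mathrm{hfse}}$ is several radians, so the fast-control limit $\tau_c\Delta_{D1,\mathrm{hfse}} \ll 1$ is not reached with sub-nanosecond pulses of this duration.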
To experimentally implement the spin-wave $U_g(\Delta\mathbf{k})$ control as in Eq. (1) with $\Delta\mathbf{k} = \pm 2\mathbf{k}_c$, counter-propagating pulses are sent to the atomic sample using retro-reflection with optical delay lines (Figs. 2(a)(b)). The ability to control the sign is important for the turn-off and recall of superradiance. Going beyond Ref. [31], we discuss how to implement this using two types of optical delays. In the first type (Fig. 2(a)), an incoming pulse and its retro-reflection by an $R = 200$ mm concave mirror at an $L \approx 200$ mm distance outside the vacuum interact twice with the sample with a $\tau_{\rm delay} = 2L/c = 1.36$ ns relative delay. This design conveniently enables the $|g\rangle \to |a\rangle \to |g\rangle$ cyclic transition driven by a $\mathbf{k}_c$ and then a $-\mathbf{k}_c$ pulse, thereby accomplishing a $U_g(2\mathbf{k}_c)$ operation with nearly identical pulse pairs. However, the ordered arrivals of the $\pm\mathbf{k}_c$ pulses rule out the possibility of realizing efficient $U_g(-2\mathbf{k}_c)$ control. To reverse the time order of the $\pm\mathbf{k}_c$ pulses, we introduce a second type of optical delay as in Fig. 2(b). Here the delay distance $L \approx 15$ m and the associated delay time $\tau_{\rm delay} \approx 100$ ns are long enough that we are able to temporarily store a few pre-programmed pulses on the side of the atomic sample opposite to the incoming direction. In particular, before the preparation of the $|g\rangle - |e\rangle$ spin waves, up to three pre-programmed D1 control pulses, such as the one marked with "p′" in Fig. 2(b), are initially sent to the optical delay line. After all these pulses pass through the atomic sample, the atoms are excited and then decay into the $F = 1, 2$ ground states within the delay time $\tau_{\rm delay} \gg 1/\Gamma_{D1}$. At the moment when these "stored" pulses return, additional control pulses (such as the one marked with "p" in Fig. 2(b)) with readjusted pulse properties are sent to collide with the pre-programmed pulses to form the control sequence, including spin-wave controls $U_g(\pm 2\mathbf{k}_c)$ in a designed order (Fig. 1(a)). The second optical delay method is therefore more flexible for parameterizing the control pulse pairs to reversibly shift the spin waves in k-space on demand. We note that the first pass of the stored optical pulses causes some atom losses (to the $F = 1$ ground states, which are dark to the subsequent spin-wave excitations). The amount of loss is a function of the (unwanted) D1 excitation efficiency, and is therefore correlated with the pulse number, relative delays, and shapes. By combining proper timing of the stored pulses with numerical modeling, the loss effects can be controlled in measurements where the atom numbers are important [50]. In addition, in this work the extra beam steering optics used to fold the delay line onto the optical table introduce extra optical power loss (30%) to the retro-reflected pulses, challenging the range of intensity resilience for the control operation (also see Figs. 1 and 7). Therefore, to optimally operate the control with the second mode of optical delay, we typically readjust the pulse shaping parameters, in particular to balance the power of the pulse pairs. To precisely measure the optical delay time $\tau_{\rm delay}$, a single chirped D1 control pulse is sent along the optical path to efficiently excite the atomic sample twice. The two fluorescence signals separated by $\tau_{\rm delay}$ are collected to measure $\tau_{\rm delay}$ with $\sim 0.1$ ns accuracy.
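The relation $\tau_{\rm delay} = 2L/c$ for the two delay lines can be evaluated directly, as in the short sketch below. It assumes free-space propagation at the vacuum speed of light (no fiber or glass path), and the helper names are hypothetical.

```python
C_LIGHT = 299_792_458.0   # speed of light in vacuum, m/s

def round_trip_delay(length_m):
    """Round-trip delay 2L/c for a retro-reflecting delay line of length L."""
    return 2.0 * length_m / C_LIGHT

def length_for_delay(delay_s):
    """Mirror distance L that gives a round-trip delay of delay_s."""
    return C_LIGHT * delay_s / 2.0

short_line_length = length_for_delay(1.36e-9)   # first delay type
long_line_delay = round_trip_delay(15.0)        # second delay type

print(f"L for 1.36 ns: {short_line_length * 1e3:.1f} mm")
print(f"delay for L = 15 m: {long_line_delay * 1e9:.1f} ns")
```

Under this assumption a 1.36 ns delay corresponds to a mirror distance of about 204 mm, consistent with the approximate $L \approx 200$ mm, and a 15 m line gives a delay of about 100 ns.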
Additional details on the experimental measurements are given in Appendix A. With these experimental methods, we can precisely perform multiple D1 controls to shift the dipole spin-wave excitation $S^+(\mathbf{k})$ on the D2 transition, from the original value of $\mathbf{k} = \mathbf{k}_p$ to the new wave vectors $\mathbf{k}_p - 2n\mathbf{k}_c$ (with $n = 1, 2$ in this work) in a reversible manner. To benchmark the control quality, we set up the photon detection path along a finely aligned $\mathbf{k}_s = \mathbf{k}_p - 2\mathbf{k}_c$ direction to meet the $|\mathbf{k}_s| = \omega_{eg}/c$ re-directed phase-matching condition (Fig. 1). As such, following the $\mathbf{k}_p$ spin wave preparation, a $\mathbf{k} \to \mathbf{k} - 2\mathbf{k}_c$ shift can re-direct the forward superradiance to $\mathbf{k}_s$ for its background-free detection. We collect the $\mathbf{k}_s$-mode superradiance with an NA = 0.04 objective for detection by a multi-mode-fiber-coupled single photon counter. To enhance the measurement accuracy, an optical filter at 780 nm is inserted to block possible fluorescence photons at $\lambda_{D1} = 795$ nm.

[Figure caption: Collectively enhanced spontaneous emission under $\mathbf{k} \to \mathbf{k} - 2\mathbf{k}_c$ spin wave controls, monitored by the re-directed emission signals. All the signals are histogrammed from gated photon counting measurements (time bin $\delta t = 256$ ps), with $\tau_p = 3$ ns. The probe pulse excites the system in the interval $-\tau_p < t < 0$. The control pulse parameters are $\tau_c = 0.9$ ns, $\tau_d = 1.36$ ns, $\Omega_0 \approx 2\pi \times 2.7$ GHz, $\delta_0 = 2\pi \times 3.4$ GHz. In curve (i), the probe is implemented without D1 control. Curves (ii)-(v) show the re-directed superradiant signals collected along the $\mathbf{k}_s$ direction with $\Delta t_1 = 0.0, 4.0, 8.0, 15.0$ ns, as the arrows indicate. In curves (vi)-(ix) the superradiance re-directed at $\Delta t_1 = 0.6$ ns is switched off at $\Delta t_2 = 16.6, 9.6, 5.6, 1.6$ ns, as the arrows indicate. All curves are integrated over $N_{\rm exp} = 70000$ measurements. The inset shows the Fourier transform of curve (i). By avoiding nearby optics and through optical filtering, the control light background is completely suppressed from the signals.]

B. Dipole spin wave under control
We demonstrate multiple dipole spin wave controls with "re-direction", "switch-off", and "recall" of the associated collective spontaneous emission. Here, re-direction refers to the state following the first wave vector shift to $\mathbf{k}_s = \mathbf{k}_p - 2\mathbf{k}_c$, where the re-directed emission along the phase-matched $\mathbf{k}_s$ allows a direct and careful characterization of the efficiency of our k-vector shift scheme. Switch-off refers to a subsequent shift to $\mathbf{k}_s' = \mathbf{k}_p - 4\mathbf{k}_c$, whose magnitude is strongly mismatched to radiation, leading to suppressed collective emission. Finally, recall refers to a third stage, where the spin wave $\mathbf{k}_s'$ is coherently transformed back to the phase-matched $\mathbf{k}_s$ with recovered collective emission. It should be noted that similar experimental results of the spin-wave control have been presented in the short paper [31], but here we explain additional important details and provide additional analysis.

[Figure caption fragment (axes: superradiant signal [counts] versus time [ns]): Here curve (iv) shows the measured $i_{\mathbf{k}_s}$ under the same experimental conditions as those for curves (i)-(iii), but for the re-direction only and with the "switch-off" and the "recall" operations removed.]
The dipole spin wave control is first demonstrated by the "emission re-direction" as in curves (ii)-(v) of Fig. 3 at various $\Delta t_1$, where the control pulse transformation $U_g$ as in Eq. (2) on the D1 line shifts the forward (unmonitored) $\mathbf{k}_p$-mode D2 emission to the $\mathbf{k}_s$ direction. In these measurements, the pulse pair delay $\tau_d = 1.36$ ns is fixed by the setup as in Fig. 2(a). The rise time of these curves, from zero to their maximum values, is slightly broadened from $\tau_c = 0.9$ ns due to timing jitter associated with the counter. Following the rises, the signals decay on a time scale of $\tau \approx 17$ ns, which is more rapid than the single-atom rate of $\Gamma_{D2} \approx 1/(26~\mathrm{ns})$ due to the cooperatively enhanced radiation damping (see Appendix B and Ref. [50]). The $f \approx 267$ MHz oscillations observed in the curves are due to a quantum beat between the D2 collective spontaneous emission from $F = 3$ and that from the off-resonantly excited $F = 2$ hyperfine level [53]. It is worth pointing out that the "tails" of all the (ii)-(v) signal curves overlap nicely, which suggests shared collective decay dynamics for the original $S^+(\mathbf{k}_p)$ and monitored $S^+(\mathbf{k}_s)$ spin waves in our atomic sample (also see Sec. III C).
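How a beat of this frequency shows up in the Fourier transform of a photon-counting histogram can be illustrated with synthetic data. The sketch below is not the measured signal: it assumes a noiseless exponential decay with a 17 ns time constant, a 267 MHz modulation with an arbitrary depth of 0.3, and 512 time bins of 256 ps. All variable names are illustrative.

```python
import numpy as np

dt = 256e-12          # histogram time bin, s
n_bins = 512          # number of bins (assumed), about 131 ns total
tau = 17e-9           # assumed decay time constant of the signal
f_beat = 267e6        # assumed quantum beat frequency
depth = 0.3           # assumed modulation depth (arbitrary)

t = np.arange(n_bins) * dt
signal = np.exp(-t / tau) * (1.0 + depth * np.cos(2 * np.pi * f_beat * t))

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n_bins, d=dt)

# Skip the low-frequency pedestal of the exponential decay before peak search.
mask = freqs > 150e6
f_peak = freqs[mask][np.argmax(spectrum[mask])]
print(f"beat peak found near {f_peak / 1e6:.0f} MHz")
```

With 256 ps bins the Nyquist frequency is close to 2 GHz, so a beat of a few hundred MHz is well resolved; the frequency resolution of roughly 8 MHz is set by the total record length.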
The divergence angle of the measured $\mathbf{k}_s$-mode emission is estimated in the far field to be at the 20 mrad level, which is consistent with the diffraction-limited angle of $\theta \sim 1/k_p\sigma$ [49,54] of our nearly spherical atomic sample with $\sigma \sim 7$ µm radius. We expect slight distortion of the superradiant wavefront, due to the residual dynamic phase imprinted by the intensity-imbalanced control pulse pair, particularly in the first optical delay mode (Fig. 2(a)). The dynamic phase is estimated to be at the 1 rad level across the atomic sample. To collect the re-directed emission with a single-mode fiber, such wavefront distortion would need to be characterized and compensated for.
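The diffraction-limited angle can be estimated from the quoted sample parameters. The following sketch assumes $\lambda_{D2} = 780$ nm, $\sigma = 7$ µm, and $N = 10^4$, and treats $\theta \sim 1/k_p\sigma$ as an order-of-magnitude relation rather than an exact beam divergence.

```python
import math

LAMBDA_D2 = 780e-9    # D2 wavelength, m
SIGMA = 7e-6          # 1/e radius of the atomic cloud, m
N_ATOMS = 1e4         # approximate atom number

k_p = 2 * math.pi / LAMBDA_D2
theta = 1.0 / (k_p * SIGMA)          # diffraction-limited emission angle, rad
mode_number = (k_p * SIGMA) ** 2     # k_p^2 sigma^2, to compare with N

print(f"theta ~ {theta * 1e3:.1f} mrad")
print(f"k_p^2 sigma^2 ~ {mode_number:.0f} (N ~ {N_ATOMS:.0f})")
```

The estimate gives an angle just under 20 mrad, and $k_p^2\sigma^2$ of a few thousand, which is below the atom number as required for collectively enhanced emission.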
As seen in Fig. 1, we proceed further with our geometric control of the spin wave vector by applying a second pair of "switch-off" control pulses along the $\mathbf{k}_c$ direction, at various delay times $\Delta t_2$ following the first pair of control pulses. Following the second pair, the new wave vector $\mathbf{k}_s' = \mathbf{k}_p - 4\mathbf{k}_c$ has an amplitude $|\mathbf{k}_s'| \approx 2.9\,\omega_{eg}/c$ that is strongly mismatched from radiation. For our dilute ensemble, this spin wave excitation should decay in a superradiance-free fashion [38], at the single-atom decay rate $\sim\Gamma_{D2}$ (and for an atomic array, could be highly subradiant [25]). At the same time, the emission into the same detection angle $\theta$, without the collective enhancement, should decrease by a factor of $\sim N$ [14].
The fact that the second control pulse pair changes the wave vector can be directly seen in curves (vi)-(ix) of Fig. 3. In particular, following this pair, the emitted intensity along the $\mathbf{k}_s$ direction drops to zero on a time scale of $\sim\tau_c$, as a result of the change of spin wave vector. Indeed, the expected counts from random emissions are below the noise level. As in the previous curves, this $\sim\tau_c$ switch-off time reflects the time over which the first pulse in the pair drives population out of state $|g\rangle$.
Finally, to implement the $\mathbf{k} \to \mathbf{k} + 2\mathbf{k}_c$ "recall" operation, we modify our setup from Fig. 2(a) to Fig. 2(b), and time the incoming and stored pulses to successively perform a $-2\mathbf{k}_c$ "re-direction", a $-2\mathbf{k}_c$ "switch-off", and a $+2\mathbf{k}_c$ "recall" spin wave shift. The "recall" leads to collective enhancement of the spontaneous emission again, as demonstrated in Fig. 4, where the emission in the $\mathbf{k}_s$ direction is recorded by the photon counter during the full "re-direction"-"switch-off"-"recall" sequence. Here the "switch-off" is implemented after the "re-direction" operation with $\Delta t_2 = 0.8$ ns, which is followed by the "recall" at three different $\Delta t_3$. The reduced amplitudes of the recalled emission at larger $\Delta t_3$ (Sec. II C) reflect gradual decay of the mismatched $S^+(\mathbf{k}_s')$ spin wave order in the atomic gas, before its conversion back to the phase-matched $S^+(\mathbf{k}_s)$ excitations. The decay lifetime is estimated to be $\sim 26$ ns ($\approx 1/\Gamma_{D2}$) as expected [38]. While this decay rate is expected, the fact that we can map the accumulated dynamics in the phase-mismatched state to light should have significant consequences when applied to arrays, where such dynamics have been predicted to be particularly rich [25][26][27][28][29]. In contrast, the recalled $S^+(\mathbf{k}_s)$ superradiance decays much faster, with a lifetime of $\sim 15$ ns for this particular atomic sample.
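The phase-matching geometry behind these wave vector magnitudes can be verified with a short vector calculation. The sketch below assumes rounded wavelengths of 780 nm (D2) and 795 nm (D1), places $\mathbf{k}_c$ along the x axis, and tilts $\mathbf{k}_p$ by the phase-matching angle $\theta = \arccos(\lambda_{D2}/\lambda_{D1})$; the two-dimensional layout and variable names are illustrative choices.

```python
import numpy as np

LAMBDA_D2 = 780e-9   # probe (D2) wavelength, m
LAMBDA_D1 = 795e-9   # control (D1) wavelength, m

k_p_mag = 2 * np.pi / LAMBDA_D2
k_c_mag = 2 * np.pi / LAMBDA_D1

# Phase-matching angle between probe and control beams.
theta = np.arccos(LAMBDA_D2 / LAMBDA_D1)

k_c = k_c_mag * np.array([1.0, 0.0])
k_p = k_p_mag * np.array([np.cos(theta), np.sin(theta)])

k_s = k_p - 2 * k_c          # after "re-direction"
k_s_prime = k_p - 4 * k_c    # after "switch-off"

ratio_redirect = np.linalg.norm(k_s) / k_p_mag
ratio_switch_off = np.linalg.norm(k_s_prime) / k_p_mag

print(f"theta = {np.degrees(theta):.2f} deg")
print(f"|k_s| / k_p  = {ratio_redirect:.4f}")
print(f"|k_s'| / k_p = {ratio_switch_off:.2f}")
```

With this angle the first shift leaves the magnitude of the wave vector unchanged, so the re-directed spin wave stays on the light cone, while the second shift increases it to nearly three times $\omega_{eg}/c$.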

C. Intensity and decay of the redirected emission
The k-vector shift of dipole spin waves allows us to access both the "subradiant" states with strongly phase-mismatched wave vectors $|\mathbf{k}_s| \neq \omega_{eg}/c$, and the redirected superradiant states with phase-matched wave vectors $|\mathbf{k}_s| = \omega_{eg}/c$. A study of the dynamics of the "subradiant" states is left for a future publication [50]. Here we focus on the dynamics of the directional superradiance. In particular, the redirected superradiant emission from $\mathbf{k}_p$ to $\mathbf{k}_s$ in our setup naturally avoids the large probe excitation background that commonly exists in previous studies of forward emission [11,16], thereby enabling us to characterize the redirected superradiance with high accuracy. With the spin-wave excitation of "timed-Dicke states" prepared in a nearly ideal way (Sec. II A), we go beyond Ref. [31] and verify the $i_{k_p} \propto N^2$ scaling as in Eq. (4) for the first time to our knowledge. Furthermore, we observe a possible deviation of the collective decay rate from the $(1 + \mathrm{OD}/4)\Gamma$ prediction [14,17,18,26], related to a subtle superradiance reshaping effect [55].
We note that Eq. (4) describes what would be observed in the re-directed emission in an ideal experiment, e.g., if the first pair of control pulses could be applied immediately ($\Delta t_1 = 0$) and perfectly after the probe pulse. To quantitatively describe the actual experiment, however, we must account for the facts that the spin wave already begins decaying in a superradiant fashion along the direction $\mathbf{k}_p$ during the delay time $\Delta t_1$, that the ensemble is non-spherical and has different optical depths $\mathrm{OD}_p$, $\mathrm{OD}_s$ along the directions $\mathbf{k}_{p,s}$, and that the control efficiency $f_d < 1$ for the applied unitary operation of Eq. (2) is not perfect. The non-ideal $U_g$ control, in the presence of, e.g., $m_F$-dependent hyperfine phase shifts and spontaneous emission during the control, only partly converts the $S^+(\mathbf{k}_p)$ into $S^+(\mathbf{k}_s)$ excitation and further induces sub-wavelength modulation of the atomic density $\varrho(\mathbf{r})$. We thus expect simultaneous and Bragg-scattering-coupled superradiant emission into both the $E_s(\mathbf{k}_p)$ and $E_s(\mathbf{k}_s)$ modes. In practice, for optical control with a focused laser beam as in this work, we find numerically that the ground-state atoms not shifted in momentum space are often associated with dynamical phase broadening, leading to suppressed $S^+(\mathbf{k}_p)$ excitation and distorted sub-wavelength density fringes. Therefore the redirected superradiance into the $E_s(\mathbf{k}_s)$ mode simply has a photon emission rate given by Eq. (6). Here, we have accounted for the different optical depths along the $\mathbf{k}_{s,p}$ directions, the finite control efficiency $f_d$, the superradiant decay that already occurs during the time $\Delta t_1$ along the original direction $\mathbf{k}_p$, and the fraction of atoms $l$ that are lost during the control process.
To compare with experiments, we vary the atom number N for samples loaded into the same dipole trap with nearly identical spatial distribution. The E s (k s ) emission at a fixed delay ∆t 1 = 0.2 ns is recorded as in Fig. 5a. The time-dependent photon emission rate i s (t), obtained by normalizing the fluorescence counts with the number of runs N exp , counter time-bin δt, and an overall detection quantum efficiency Q ≈ 0.15, nicely follows exponential decay curves for the accessed N between 2 × 10 3 and 9 × 10 3 in this work. We extract both the peak emission rate i max,N and collective decay rate Γ N with exponential fits, and study both quantities as a function of atom number N .
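The normalization and exponential fit described above can be sketched as follows. This is a minimal illustration under stated assumptions: the log-linear least-squares fit is a stand-in for whatever fitting routine was actually used, the function names are hypothetical, and the input numbers are synthetic, noise-free values rather than measured data.

```python
import math

def emission_rate(counts, n_runs, bin_width_s, quantum_efficiency):
    """Normalize raw counts per time bin by N_exp, the bin width and Q."""
    norm = n_runs * bin_width_s * quantum_efficiency
    return [c / norm for c in counts]

def fit_exponential(times_s, rates):
    """Log-linear least squares for rates ~ i_max * exp(-gamma * t).

    Returns (i_max, gamma). Assumes all rates are strictly positive.
    """
    logs = [math.log(r) for r in rates]
    n = len(times_s)
    t_mean = sum(times_s) / n
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_s, logs))
    den = sum((t - t_mean) ** 2 for t in times_s)
    slope = num / den
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope

# Synthetic example: 1 ns bins, a 17 ns decay, Q = 0.15 as quoted in the text
times = [i * 1e-9 for i in range(1, 40)]
counts = [1000.0 * math.exp(-t / 17e-9) for t in times]
rates = emission_rate(counts, n_runs=1e6, bin_width_s=1e-9, quantum_efficiency=0.15)
i_max, gamma_n = fit_exponential(times, rates)
print(f"i_max = {i_max:.3e} photons/s, 1/Gamma_N = {1e9 / gamma_n:.1f} ns")
```

For real photon-counting data with empty bins or significant shot noise, a weighted nonlinear fit would be a safer choice than the log-linear form used here.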
The cooperative nature of the collective emission is clearly demonstrated in Fig 5b with the i max,N ∝ N 2 scaling since according to Eqs. (4) and (6) we have i max,N ∝ N OD s but OD s ≈ N σ r /σ 2 for our nearly spherical sample with size σ. We experimentally extrapolate the average optical depth OD s with OD x (y, z) measurements along the x− direction (Fig. 5a insets, see Appendix A for imaging details.). We have OD s = ξ × OD x with ξ ≈ 0.8 to account for the ratio of optical depth integrated along the k s and e x directions respectively. By comparing the quadratic fit that gives i max,s ≈ 4 × 10 −4 N Γ D2 OD x /4 in Fig. 5b with Eqs (4) and (6), we find a time-integrated probe Rabi frequency of θ p ≈ 2 × 10 −2 , taking our best understanding of the efficiency f d = 0.7 (See Sec. III E). This value of θ p is consistent with the expected excitation by the probe (with peak saturation parameter s ∼ 1 and duration τ p =5 ns) in these measurements [56], considering the large uncertainty in the absolute intensity parameter estimations.
We now discuss the enhanced decay rate $\Gamma_N$ of the collective emission, which is approximated in Eqs. (4) and (6) under the assumption of negligible angular-dependent emission dynamics (Appendix B). This approximate decay rate corresponds to that of the timed Dicke state [14,17,18,26]. Similar to previous studies of forward superradiance [10,11], we find $\Gamma_N \propto N$ for the redirected superradiance, as expected. Here, to make a precise comparison with the theoretical picture, we plot the same data in Fig. 5b vs the in situ measured average optical depth $\mathrm{OD}_x$. From Fig. 3b we find $\Gamma_N/\Gamma = 1 + \nu\,\mathrm{OD}_s$ with $\nu = 0.35 \pm 0.1$, with no freely adjustable parameter but with an uncertainty limited by the $\mathrm{OD}_s$ estimation in this work. The likely discrepancy between this result and the $\nu = 0.25$, $\Gamma_N/\Gamma = 1 + \mathrm{OD}/4$ prediction for the collective decay of the timed Dicke state [10,11,14] can be expected, since the measured collective emission $i_s(t)$ is integrated over the $\sigma$-limited solid angle $\sim 1/(k_p\sigma)^2$ beyond the "exact" $\mathbf{k}_s = \mathbf{k}_p - 2\mathbf{k}_c$ phase-matching condition, while the small-angle scattering of $E_s(\mathbf{k}_s)$ by the sample itself generally affects the collective emission dynamics [55], thereby violating the assumptions made to reach Eq. (4). A detailed study of this subtle effect with reduced uncertainty is left for future work [50].
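To get a feel for how much the two values of $\nu$ differ in practice, the relation $\Gamma_N/\Gamma = 1 + \nu\,\mathrm{OD}$ can be evaluated directly. This is a sketch only: the optical depth of 2 is a hypothetical input chosen for illustration, and $1/\Gamma \approx 26$ ns is taken from the text.

```python
GAMMA_D2 = 1.0 / 26e-9  # single-atom decay rate in 1/s (1/Gamma ~ 26 ns)

def collective_decay_rate(od: float, nu: float, gamma: float = GAMMA_D2) -> float:
    """Gamma_N = (1 + nu * OD) * Gamma."""
    return (1.0 + nu * od) * gamma

# Hypothetical OD = 2; nu = 0.25 is the timed-Dicke prediction,
# nu = 0.35 is the central value reported in the text.
for nu in (0.25, 0.35):
    lifetime_ns = 1e9 / collective_decay_rate(od=2.0, nu=nu)
    print(f"nu = {nu:.2f}: 1/Gamma_N = {lifetime_ns:.1f} ns")
# nu = 0.25: 1/Gamma_N = 17.3 ns
# nu = 0.35: 1/Gamma_N = 15.3 ns
```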

D. Optical acceleration
As discussed in Sec. II A, the control pulse sequence that shifts the spin waves also results in a spin-dependent kick, which optically accelerates the phase-patterned $|g\rangle$ states by the geometric force [57]. The momentum transfer along the control beam direction $\mathbf{e}_z$ can be evaluated by integrating $\langle \hat{F}_z \rangle$ with the single-atom force operator $\hat{F}_z = -\frac{\hbar}{2}\partial_z\Omega_c\,|a\rangle\langle g| + \mathrm{h.c.}$, as the projected atomic state evolves on the $\{|g\rangle, |a\rangle\}$ Bloch sphere (Fig. 1(b)). For ideal population inversions, the integrated Berry curvature [58] gives the exact photon recoil momentum $\Delta P = 2\hbar k_c$, with $\hbar$ the reduced Planck constant.
Going beyond Ref. [31], we experimentally measure the recoil momentum transfer $\Delta P$ associated with the D1 chirped pulse pair along $\mathbf{e}_z$ for the spin wave control, using a time-of-flight absorption imaging method (Fig. 6(b)). Since the Doppler effects due to the acceleration negligibly affect the nanosecond control dynamics, we repeat the $\mathbf{k} \to \mathbf{k} - 2\mathbf{k}_c$ control pulse $N_{rep} = 5$ times to enhance the measurement sensitivity. The period $T_{rep} = 440$ ns $\gg 1/\Gamma_{D1}$ is set to ensure independent interactions. We then measure the central position shift of the atomic sample after a $T_{tof} = 400$ µs time of flight, using calibrated absorption images. For the absorption imaging, the atomic sample is illuminated by a $\tau_{exp} = 20$ µs imaging pulse resonant with the $5S_{1/2}\,F = 2$ to $5P_{3/2}\,F = 3$ transition along the $\mathbf{e}_x$ direction. The 2D transmission profile $T(y,z) = I/I_0$ is obtained by processing the imaging beam intensities $I$, $I_0$ recorded on a CCD camera with and without the atomic sample, respectively. We then process $T(y,z)$ to obtain the optical depth profile $\mathrm{OD}(y,z)$ of the samples (also see Appendix A). Next, the central position shifts $\Delta z_0$ are obtained with 2D Gaussian fits for samples with and without the control pulses. Finally, a velocity change is estimated as $\Delta v = \Delta z_0/(T_{tof} + \tau_{exp}/2)$. Typical absorption images are presented in Fig. 6(a) for samples under D1 control pulses with various $\Omega_0$ and $\delta_0$ parameters as defined in Sec. III A. In particular, we expect that the parameters used in image D in Fig. 6(a) lie in an error-resilient region of parameter space, and reflect the nearly ideal momentum change of $\Delta P = 2\hbar k_c$. We show below that the measured momentum kicks agree well with numerical models.
With the knowledge of the recoil velocity $v_r = \hbar k_c/m \approx 5.8$ mm/s, we obtain $\Delta P = \frac{\Delta v}{N_{rep} v_r}\,\hbar k_c$ per D1 control. Typical $\Delta P$ measurements are presented in Fig. 7(a) vs the intensity parameter $\sqrt{s}$, for shaped pulses with different chirping parameters $\delta_0$. For controls with nearly zero chirp ($\delta_0 = 2\pi \times 0.1, 1.0$ GHz), $\Delta P$ displays a damped oscillation, which is due to optical Rabi oscillations with broadened periodicity associated with the intensity inhomogeneity of the focused laser. The oscillation is suppressed at large $\delta_0$, with $\Delta P$ reaching 89(4)% of the $\sim 2\hbar k_c$ limit at large $s$, suggesting a robustness of our coherent control process. Similar measurements are performed for the reversed $\mathbf{k} \to \mathbf{k} + 2\mathbf{k}_c$ controls, which result in opposite momentum shifts.
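The conversion from a measured center-of-mass shift to a momentum kick per control can be sketched numerically. The physical constants, the $^{87}$Rb mass and the D1 wavelength (about 795 nm) are standard values supplied here as assumptions, and the 21 µm position shift is a hypothetical input rather than a measured value.

```python
import math

HBAR = 1.054571817e-34            # reduced Planck constant, J*s
AMU = 1.66053906660e-27           # atomic mass unit, kg
M_RB87 = 86.909180527 * AMU       # assumed 87Rb mass, kg
LAMBDA_D1 = 794.98e-9             # assumed D1 wavelength, m

def recoil_velocity(wavelength_m: float, mass_kg: float) -> float:
    """v_r = hbar * k / m for a single photon recoil."""
    return HBAR * (2.0 * math.pi / wavelength_m) / mass_kg

def kick_in_hbar_k(delta_z0_m, t_tof_s, t_exp_s, n_rep, v_r):
    """Momentum transfer per control in units of hbar*k_c.

    Uses delta_v = delta_z0 / (T_tof + tau_exp / 2) as in the text.
    """
    delta_v = delta_z0_m / (t_tof_s + t_exp_s / 2.0)
    return delta_v / (n_rep * v_r)

v_r = recoil_velocity(LAMBDA_D1, M_RB87)
# Hypothetical 21 um shift after 400 us TOF, 20 us exposure, 5 repetitions
dp = kick_in_hbar_k(21e-6, 400e-6, 20e-6, 5, v_r)
print(f"v_r = {v_r * 1e3:.2f} mm/s, Delta P = {dp:.2f} hbar k_c")
```

The recoil velocity evaluates to about 5.8 mm/s, matching the value quoted above; the example shift corresponds to a kick somewhat below the ideal $2\hbar k_c$.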

E. Control efficiency: calibration and optimization
To quantify the imperfections in implementing the spin-wave control by geometric phase patterning ($U_g$ in Eq. (2)), we need to properly model the dissipative dynamics of collective dipoles. For this purpose, we introduce the coherent dipole control efficiency, $f_d = \langle \mathrm{tr}(\rho_\eta S^+(\mathbf{k}_s)S^-(\mathbf{k}_s)) \rangle_\eta / \mathrm{tr}(\rho_0 S^+(\mathbf{k}_s)S^-(\mathbf{k}_s))$, with $\rho_\eta$, $\rho_0$ the density matrices that describe the weakly D2-excited atomic sample subjected to the non-ideal $\tilde{U}_g$ and the ideal, instantaneous $U_g(\Delta\mathbf{k})$ control of Eq. (2), respectively. Here $\tilde{U}_g(\Delta\mathbf{k}; \Omega_0, \delta_0, \eta)$, implemented by the nanosecond shaped pulse control, is parameterized by the peak Rabi frequency $\Omega_0$ and chirping parameter $\delta_0$, as well as by the factors $\eta_{1,2}$, the normalized laser intensities for the forward and retro-reflected pulses locally seen by the atoms.
The control efficiency f d is averaged over the Gaussian intensity distribution.
To optimize the $\mathbf{k} \to \mathbf{k} - 2\mathbf{k}_c$ shift by the non-ideal $\tilde{U}_g(2\mathbf{k}_c; \Omega_0, \delta_0, \eta)$ control, experimentally we simply scan the control pulse shaping parameters $\sqrt{s} \propto \Omega_0$ and $\delta_0$ to maximize the re-directed superradiant emission that generates a time-dependent signal such as curve (ii) in Fig. 3. The data in Fig. 7(c) are the corresponding total photon counts obtained by integrating the time-dependent signals. By optimizing the total counts, we are able to locate the optimal pulse shaping parameters $\Omega_0 = 2\pi \times 2.7$ GHz and $\delta_0 = 2\pi \times 3.4$ GHz for the $\tau_c = 0.9$ ns chirped-sine pulses in these experiments.
Fig. 7 (caption, partial): The simulation also provides acceleration and dipole control efficiency for an atom starting in the Zeeman state $m_F = 0$ (dashed lines labeled with (i) in the legends). In (a) the "A" to "D" markers give parameters for absorption images presented in Fig. 6.
The magnitude of the optimally re-directed superradiant emission scales quadratically with the total atom number $N$ and increases with both the probe excitation strength $|\Omega_p\tau_p|^2 \ll 1$ and the control efficiency $f_d$ (see Eqs. (4) and (6)). However, without accurate knowledge of the experimental parameters associated with the spin-wave preparation and emission detection, it is difficult to precisely quantify $f_d$ with the photon counting readouts. Instead, we calibrate the spin-wave control efficiency $f_d$ with a numerical modeling strategy. In particular, we perform density matrix calculations of a single atom interacting with the control pulses, including the full hyperfine structure. Both the collective spin wave shifts and the matter-wave acceleration can be evaluated from the single-atom results if atom-atom interactions and rescattering of the control fields are negligible, as we expect to be the case for the low atomic densities and high pulse bandwidths used in our work. For the calibration, we first adjust the experimental parameters in the numerical simulations so as to optimally match the simulated average momentum shift $\langle\Delta P\rangle_\eta$ in Fig. 7(b) with the absolute experimental measurements in Fig. 7(a). The corresponding $f_d$ values under identical experimental conditions are then calculated as in Fig. 7(d). The fairly good match between the superradiance measurements in Fig. 7(c) and the simulation in Fig. 7(d) is achieved by uniformly normalizing the total counts in Fig. 7(c), with no additionally adjusted parameters. Near the optimal control regime, the simulation suggests that we have reached a collective dipole control efficiency $f_d = 72 \pm 4\%$, accompanied by the observed $f_a \equiv \Delta P/2\hbar k_c = 89 \pm 4\%$ acceleration efficiency. Constrained by the absolute acceleration measurements, we find this optimal $f_d$ estimate to be quite robust in numerical modeling when small pulse shaping imperfections are introduced.
On the other hand, for a full "re-direction"-"switch-off"-"recall" sequence as in Fig. 4, we can also estimate the efficiency of the $\mathbf{k} \to \mathbf{k} + 2\mathbf{k}_c$ "recall". We fit the amplitude of the "recalled" superradiant emission in the short $\Delta t_3$ limit, and compare it with the amplitude of the "redirected" emission in the same experimental sequence (see the supplementary material of Ref. [31]). The ratio between the two fluorescence signal amplitudes defines an overall storage-recall efficiency for the controlled dipole spin wave intensity of $\sim 58\%$. By assuming equal efficiency for each of the two operations, the efficiency for a single $\mathbf{k} \to \mathbf{k} \pm 2\mathbf{k}_c$ shift is thus at the 75(5)% ($\sim\sqrt{58\%}$) level. These operations are performed using the second type of delay line (Fig. 2(b)) and re-optimized pulse shaping parameters. The efficiency is also consistent with the prediction of numerical modeling, as discussed in Appendix B.
The optimal $f_d$ as in Fig. 7(c) is limited by $m_F$-dependent hyperfine phase shifts and D1+D2 spontaneous decays during the $\tau_d + \tau_c = 2.26$ ns control. In particular, an $l \sim 10\%$ atom loss due to D1 spontaneous emission and $5P_{1/2}$ population trapping (particularly for $|m_F| = 1$ states) is expected to reduce the number of atoms participating in the D2 collective emission. With atoms prepared in a single $m_F = 0$ state, a spontaneous-emission-limited dipole control efficiency of $f_d \approx 87\%$, accompanied by an acceleration efficiency $f_a \approx 97\%$, should be reachable [Figs. 7(b)(d)] with the same control pulses.

IV. DISCUSSIONS
The error-resilient state-dependent phase patterning technique demonstrated in this work is a general method to precisely control dipole spin waves in atomic gases and the associated highly directional collective spontaneous emission in the time domain [38,[59][60][61]. The control is itself a single-body technique, which can be accurately modeled for dilute atomic gases when the competing resonant dipole-dipole interactions between atoms can be ignored during the pulse duration. With the geometric phase inherited from the optical phases of the control laser beams, it is straightforward to design ϕ G beyond the linear phase used in this work and to manipulate the collective spin excitation in complex ways tailored by the control beam wavefronts.
We note that the atomic motion associated with the control can limit the coherence time of the spin wave order in free gases at finite temperature, and that the limiting effect can be suppressed with optical lattice confinements [31]. In the following we discuss methods for perfecting the spin-wave control in dense gases, and then summarize possible prospects opened by this work.

A. Toward perfect control with pulse shaping
The optical dipole spin wave control (i.e., the implementation of $U_g$ in Eq. (2)) in this work is subject to various imperfections at the single-body level. The pulse shaping errors combined with laser intensity variations lead to imperfect population inversions and reduced operation fidelity. The imbalanced beam pair intensities lead to spatially dependent residual dynamical phase writing and distortion of the collective emission mode profiles. The hyperfine coupling of the electronically excited states may lead to inhomogeneous phase broadening as well as hyperfine Raman couplings, resulting in coherence and population losses as in this work. Finally, the spontaneous decays on both the D1 control and D2 probe channels limit the efficiency of the finite-duration pulse control. However, the imperfections of the control stemming from these single-atom effects are generally manageable with better quantum control techniques [62][63][64][65] that are well developed in other fields, if they can be implemented in the optical domain with a reliable pulse shaping system of sufficient precision, bandwidth and output power.
Beyond single-body effects, we emphasize that with the increased Ω c strength and reduced τ c,d time, it is generally possible to suppress interaction effects so as to maintain the precision enabled by the single-body simplicity, for precise spin-wave control in denser atomic gases.
As discussed in Sec. III E, the pulse shaping system used in this work already supports $f_d \sim 87\%$ efficiency if the atomic $m_F$ states are better prepared, which is then limited by the D1 and D2 spontaneous decay (single-atom limit) during the $\tau_c + \tau_d = 2.26$ ns control time. Instead of imparting geometric phase to the ground state atoms, in future work an $|e\rangle - |a\rangle$ transition with a longer $|a\rangle$ lifetime [66] may be chosen to implement a $U_e(\varphi_G)$ for $|e\rangle$-state phase-patterning. The influence of the D1 decay can thus be eliminated, leading to $f_d \sim 95\%$ limited by the suppressed D2 decay. With an additional 5-fold reduction of $\tau_c$ to bring $\tau_c + \tau_d$ to $\sim 400$ ps, aided by well-developed advanced error-resilient techniques, we expect $f_d$ to reach 99% for high-fidelity dipole spin wave control.
For the error-resilient shaped optical pulse control, ideally the 5-fold reduction of control time from the $\tau_c = 0.9$ ns pulses in this work needs to be supported by a 5-fold increase of laser modulation bandwidth and a 25-fold increase of laser intensity. Starting from the sub-nanosecond pulse shaping technique used in this work and detailed in Ref. [31], the improvement may be achieved with a combined effort of stronger input, wider modulation and tighter laser focus. As a promising alternative, the control pulses may also be generated with mode-locked lasers [35,36,[67][68][69][70] with orders of magnitude higher peak power and pulse bandwidth. Here we note that for the same control operation, the required pulse peak power and energy scale with $1/\tau_c^2$ and $1/\tau_c$, respectively. For controlling macroscopic samples as in this work, the scaling toward ultrafast pulses can become demanding enough to require sophisticated optical pulse amplification, compromising the setup flexibility. In addition, the control strength $\Omega_c \sim 1/\tau_c$ and the modulation bandwidth are also upper-bounded to avoid uncontrolled light shifts and multi-photon excitations. Therefore, for the purpose of precise and flexible control of dipole spin waves with mode-locked lasers, it appears that shaping picosecond pulses is preferable to shaping ultrafast pulses [35,67,71,72] for generating nearly resonant pulses with a suitable duration and modulation bandwidth.
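The scaling argument above follows from keeping the pulse area $\Omega_c\tau_c$ fixed, so that the Rabi frequency scales as $1/\tau_c$ and the intensity as $1/\tau_c^2$. A minimal sketch of this bookkeeping is given below; the function and key names are illustrative, and the fixed-pulse-area assumption is the only physics input.

```python
def scaling_factors(tau_old_s: float, tau_new_s: float) -> dict:
    """Resource scaling when the control time shrinks at fixed pulse area.

    Rabi frequency and bandwidth scale as 1/tau, peak intensity (and peak
    power at fixed focus) as 1/tau^2, and pulse energy as 1/tau.
    """
    s = tau_old_s / tau_new_s
    return {
        "modulation_bandwidth": s,
        "rabi_frequency": s,
        "peak_intensity": s ** 2,
        "pulse_energy": s,
    }

# 5-fold reduction from the 0.9 ns pulses used in this work
factors = scaling_factors(0.9e-9, 0.18e-9)
for name, value in factors.items():
    print(f"{name}: x{value:.0f}")
```

A tighter focus trades peak power for intensity, which is why the text lists it alongside stronger input and wider modulation as a route to the same 25-fold intensity increase.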

B. Summary and outlook
In this work we experimentally demonstrate and systematically study a state-dependent geometric phase patterning technique for control of collective spontaneous emission by precisely shifting the k-vector of dipole spin waves in the time domain. The method involves precisely imparting geometric phases to electric dipoles in a large sample, using a focused laser beam with large intensity inhomogeneities. Similar error-resilient techniques have been widely applied in nuclear magnetic resonance research [45,64,65,73]. Our work represents a first step in exploring such error-resilience toward optical control of dipole spin waves near the unitary limit, and for efficient far-field access to the rarely explored phase-mismatched optical spin-wave states. During the characterization of our method, we also made intriguing and first-time observations related to fundamental properties of spin wave excitations. These include a verification of the $i_N \propto N^2$ scaling law, a qualification of the enhancement relation $\Gamma_N/\Gamma = 1 + \mathrm{OD}/4$, and an observation of matter-wave acceleration accompanying the spin wave control. We have provided a first theoretical analysis of this spin wave and spontaneous emission control. Instead of working with free-space and randomly positioned atoms, our control technique can be readily applied to atomic arrays for efficient access to highly subradiant states. The technique may open the door to related applications envisaged in the field of quantum optics [38,74], help unlock nontrivial physics of long-lived and interacting dipole spin waves in dense atomic gases [25-30, 51, 75-78], and enable nonlinear quantum optics based on subradiance-assisted resonant dipole interaction [79][80][81].
Finally, on the laser technology side, we hope this work motivates additional developments of continuous and ultrafast pulse shaping methods for optimal quantum control of optical electric dipoles.

Appendix A

The absorption imaging setup schematically illustrated in Fig. 6 not only helps us to quantify the optical acceleration effect with the TOF technique, but also to directly measure the optical depth profile $\mathrm{OD}_x(y,z)$ and atom number $N$ as in Sec. III C. To investigate the $\Gamma_N/\Gamma = 1 + \mathrm{OD}/4$ relation, extra care was taken to extract the $\mathrm{OD}_x(y,z)$ images from the resonant absorption images. Here the $\mathrm{OD}_x(y,z)$ to be measured should be that of the unpolarized atoms in the weak excitation limit, with an in situ density distribution $\varrho(\mathbf{r})$ close to that in the quantum optics experiments, and for both low $\mathrm{OD} < 1$ and quite high $\mathrm{OD} \sim 3.5$. To ensure that a consistent $\varrho(\mathbf{r})$ distribution is measured, a short exposure time of 20 µs is chosen. To collect sufficient counts on the camera, we use imaging beams with quite high intensity in the range of $I_0 = 1 \sim 20$ mW/cm$^2$, and thus with a saturation parameter $s = 0.3 \sim 7$ assuming $I_s = 3.05$ mW/cm$^2$ [52] for the $\pi$ transition of $5S_{1/2}\,F = 2$ to $5P_{3/2}\,F = 3$. We reduce the measurement uncertainty related to saturation effects following techniques similar to Refs. [82,83]. In addition, to avoid measurement uncertainty related to low local counts for the highly absorbing samples, we calibrate the peak OD of the in situ samples with time-of-flight (TOF) images at reduced OD. The procedures are detailed as follows.
We start with repeated absorption imaging measurements of nearly identical TOF samples with 2D transmission profile $T(I) = I/I_0 > 75\%$, with the incoming $I_0(y,z)$ and transmitted $I(y,z)$ intensities recorded on the camera. The optical depth profile in the weak excitation limit can be approximated as $\mathrm{OD}_x(y,z) = -\log T(I) + (I_0 - I)/I_s^{\rm eff}$ [82,83]. Here $I_s^{\rm eff}$ is an effective parameter for calibrating our saturation intensity measurements. By globally adjusting $I_s^{\rm eff}$, and thus the $(I_0 - I)/I_s^{\rm eff}$ term, we obtain consistent $\mathrm{OD}_x(y,z)$ from all the measurements with $I_0 = 1 \sim 20$ mW/cm$^2$ with minimal variations. Notice that the radiation pressure during the imaging process does not significantly vary the power-broadened atomic response.
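The saturation-corrected optical depth and the global adjustment of $I_s^{\rm eff}$ can be sketched as below. This is an illustration under stated assumptions: the logarithm is taken as the natural logarithm (consistent with $T = e^{-\mathrm{OD}}$), "minimal variations" is interpreted as minimizing the standard deviation of the extracted OD over measurements at different $I_0$, the function names are hypothetical, and the data are synthetic single-pixel values generated from the same model rather than camera images.

```python
import numpy as np

def optical_depth(I, I0, I_sat_eff):
    """OD = -ln(I/I0) + (I0 - I)/I_sat_eff (intensities in the same units)."""
    I = np.asarray(I, dtype=float)
    I0 = np.asarray(I0, dtype=float)
    return -np.log(I / I0) + (I0 - I) / I_sat_eff

def best_I_sat(I_list, I0_list, candidates):
    """Pick the candidate I_sat_eff giving the most consistent OD across I0."""
    spreads = [
        np.std([optical_depth(I, I0, c) for I, I0 in zip(I_list, I0_list)])
        for c in candidates
    ]
    return candidates[int(np.argmin(spreads))]

def transmitted_intensity(od, I0, I_sat, n_iter=60):
    """Invert the OD formula for I by bisection (used only to make test data)."""
    lo, hi = 1e-12 * I0, I0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if optical_depth(mid, I0, I_sat) > od:
            lo = mid   # too much absorption: the true I is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Synthetic data: true OD = 0.2, true I_sat = 3.05 mW/cm^2, three probe intensities
I0_values = [1.0, 5.0, 20.0]
I_values = [transmitted_intensity(0.2, I0, 3.05) for I0 in I0_values]
candidates = np.linspace(1.0, 6.0, 101)
I_sat_fit = best_I_sat(I_values, I0_values, candidates)
print(f"recovered I_sat_eff = {I_sat_fit:.2f} mW/cm^2")
```

In an actual analysis the same scan would be applied to full 2D images, with the spread evaluated pixel by pixel or on the integrated OD.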
The optimally adjusted $I_s^{\rm eff}$ serves to extract the $\mathrm{OD}_x(y,z)$ spatial profile for atomic samples immediately after their release from the dipole trap, as in Fig. 5a with approximately identical spatial profiles. In addition, under the same atomic sample preparation conditions we also measure the optical depth profile $\mathrm{OD}_x(y,z)$ and total atom number after a 430 µs time-of-flight. The time-of-flight greatly reduces the peak linear absorption for the highest OD sample here from the expected 95% ∼ 99% level down to 15% ∼ 25%, leading to a more accurate estimate of the integrated OD, which serves to calibrate the in situ $\mathrm{OD}_x$ measurements. To account for optical pumping effects that tend to increase the $F = 2 - F = 3$ light-atom coupling strengths, the extracted $\mathrm{OD}_x(y,z)$ is multiplied by a factor of 0.85 [52].
We finally adjust $\mathrm{OD}_x$ by up to 30% to account for the imaging laser frequency noise in this work, according to the measured linewidth broadening of the TOF sample absorption spectrum, and then obtain $\mathrm{OD}_s$ using the sample aspect ratio estimated by the auxiliary imaging optics along $\mathbf{e}_z$. These last two steps introduce the largest uncertainties into our $\mathrm{OD}_s$ estimation. It is worth noting that the laser noise correction tends to reduce the $\nu$-value in Sec. III C. We use $\sigma_r = 1.59 \times 10^{-9}$ cm$^2$ for a linearly polarized probe on the $5S_{1/2}, F = 2$ levels to estimate $N = \frac{1}{\sigma_r}\int \mathrm{OD}_x(y,z)\,dy\,dz$.
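The atom-number estimate $N = \frac{1}{\sigma_r}\int \mathrm{OD}_x\,dy\,dz$ reduces to a pixel sum on a camera image. The sketch below assumes a hypothetical 1 µm effective pixel size and a synthetic Gaussian OD image with a 7 µm rms radius and a peak OD of 3.5; none of the correction factors discussed above (optical pumping, laser noise) are applied.

```python
import numpy as np

SIGMA_R_M2 = 1.59e-9 * 1e-4  # resonant cross-section from the text, cm^2 -> m^2

def atom_number(od_image, pixel_area_m2, sigma_r_m2=SIGMA_R_M2):
    """N = (1/sigma_r) * sum(OD) * pixel area, a Riemann sum of the OD integral."""
    return float(np.sum(od_image)) * pixel_area_m2 / sigma_r_m2

# Synthetic Gaussian OD image on a hypothetical 1 um grid
axis = np.linspace(-50e-6, 50e-6, 101)
y, z = np.meshgrid(axis, axis, indexing="ij")
sigma = 7e-6
od_image = 3.5 * np.exp(-(y**2 + z**2) / (2 * sigma**2))

n_atoms = atom_number(od_image, pixel_area_m2=(axis[1] - axis[0]) ** 2)
print(f"N ~ {n_atoms:.0f}")
```

For a Gaussian profile the result can be cross-checked analytically as $N = 2\pi\sigma^2\,\mathrm{OD}_{\rm peak}/\sigma_r$, which the test values used here reproduce.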

Appendix B: Theoretical model and numerical simulation
This Appendix provides the theoretical background associated with the experimental observations for both this work and Ref. [31]. First, in Sec. B 1 we provide a minimal model to explain the collective spontaneous emission by the controlled spin waves. Next, in light of the short control time $\tau_{c,d}$ with negligible atom-atom interaction effects, we set up a "single-atom" model in Sec. B 2-4 to explain the control of spin waves supported by the nearly non-interacting atoms. Different from Ref. [31], here we outline the method to model the interaction between the atomic sample and the focused laser beam by estimating the most likely experimental parameters, so as to clearly understand the physical limitations behind the inefficiency of our spin-wave control. Finally, in Sec. B 5 we discuss the influence of controlled spin-wave dynamics subject to hyperfine interactions during the superradiant "recall" operation.

Collective spontaneous emission from a dilute gas of two-level atoms
We consider the interaction of $N$ two-level atoms with a resonant electromagnetic field at wavelength $\lambda_p$ and frequency $\omega_{eg}$. With transition matrix element $\mathbf{d}_{eg} = d_{eg}\mathbf{e}_d$, the absorption cross-section is given by $\sigma_r = k_p\alpha_i$ with $\alpha_i = 2|d_{eg}|^2/\hbar\Gamma$, $\Gamma$ being the linewidth of the $|e\rangle - |g\rangle$ transition. The atomic ensemble follows an average spatial density distribution $\varrho(\mathbf{r}) = \langle\sum_i \delta(\mathbf{r} - \mathbf{r}_i)\rangle$ that is assumed to be nearly spherical and smooth; in particular, $\varrho(\mathbf{r})$ does not vary substantially on length scales other than those close to its characteristic radius $\sigma \gg \lambda_p$. We further restrict our discussion to intermediate sample sizes with $\sigma \ll c\tau$, with $c$ the speed of light and $\tau$ the shortest time scale of interest. The transmission of a plane-wave resonant probe beam at the exit of the atomic sample, in the $\mathbf{r} = \{\mathbf{r}_\perp, r_p\}$ coordinates, follows the Beer-Lambert law with transmission $T(\mathbf{r}_\perp) = e^{-\mathrm{OD}(\mathbf{r}_\perp)}$. The 2D optical depth distribution is given by $\mathrm{OD} = N\varrho_c(\mathbf{r}_\perp)\sigma_r$, with $\varrho_c(\mathbf{r}_\perp) = \frac{1}{N}\int\varrho(\mathbf{r})\,dr_p$ being the normalized column density.
To describe both the collective dipole dynamics and its collective radiation, we regard the small atomic sample as the system and the free-space optical modes as the reservoir. The electric-dipole interaction can be effectively described by the many-atom density matrix $\rho$, after the photon degrees of freedom are eliminated by the standard Wigner-Weisskopf procedure. Following the general approach [25,54], the density matrix $\rho$ obeys the master equation $\dot\rho = \frac{1}{i\hbar}(H_{\rm eff}\rho - \rho H_{\rm eff}^\dagger) + \mathcal{L}_c[\rho]$, where $\mathcal{L}_c$ is the "population recycling" super-operator associated with random quantum jumps in the stochastic wavefunction picture. Here we focus on the effective Hamiltonian $H_{\rm eff}$ that governs the deterministic evolution of states and observables. The non-Hermitian effective Hamiltonian can be expressed as in Eq. (B1), with the single-atom Hamiltonian $H_a^i$ for the atom at location $\mathbf{r}_i$, and the effective dipole-dipole interaction operator $\hat{V}_{DD,{\rm eff}} = \sum_{i,j}\hat{V}_{DD}^{i,j}$ that sums over the pairwise resonant dipole interaction of Eq. (B2). Here $\sigma_i^+ = |e_i\rangle\langle g_i|$, $\sigma_i^- = (\sigma_i^+)^\dagger$ are the raising and lowering operators for the $i$th atom and $\varepsilon_0$ is the vacuum permittivity. $\mathbf{G}(\mathbf{r}, \omega_{eg})$ is the free-space Green's tensor of the electric field obeying Eq. (B3). Intuitively, Eq. (B2) allows for the exchange of excitations between atoms, which is mediated by photon emission and re-absorption, and whose amplitude thus naturally depends on $\mathbf{G}(\mathbf{r}, \omega_{eg})$, which describes how light propagates from one atomic position to another. With the spin model description of the atomic dipole degrees of freedom, the electric field operator describing the light emitted by the atoms can be written in terms of the atomic properties as in Eq. (B4). Instead of generally discussing the evolution of atomic states in the $N$-spin space governed by $H_{\rm eff}$, in the following we discuss the timed-Dicke state $|\psi_{TD}(\mathbf{k})\rangle = S^+(\mathbf{k})|g_1, g_2, \cdots, g_N\rangle$ and observables composed of collective linear operators. The results can then be straightforwardly applied to weakly excited gases in the linear optics regime as in this experiment.
We first consider the field amplitude of the spontaneously emitted photons. The emitted single photon from a timed Dicke state has a spatial mode profile $\varepsilon_{\mathbf{k}}(\mathbf{r}) = \langle g_1, g_2, ..., g_N|\hat{E}_s(\mathbf{r})|\psi_{TD}(\mathbf{k})\rangle$, which is readily rewritten after the $\{\mathbf{r}_i\}$-configuration average as Eq. (B5). Writing the spatial coordinate as $\mathbf{r} = \{\mathbf{r}_\perp, r_k\}$, one can first integrate Eq. (B5) over $r_k$ at a fixed perpendicular coordinate, to obtain the emitted field at the end of the sample as in Eq. (3) with $\delta k = |\mathbf{k}| - \omega_{eg}/c$. The approximate integration assumes a slowly varying amplitude along both the $\mathbf{r}_\perp$ and $r_k$ directions. For $\mathbf{k} = \mathbf{k}_p$ with $\delta k = 0$, we then integrate the corresponding intensity over all transverse positions $\mathbf{r}_\perp$, and normalize the radiation power by the energy $\hbar\omega_{eg}$ of a single photon to obtain the collective photon emission rate $i^{(1)}_{k_p} = \frac{2\varepsilon_0 c}{\hbar\omega_{eg}}\int|\varepsilon_p(\mathbf{r})|^2 d^2r_\perp \approx \mathrm{OD}_p\Gamma/4$. For a weakly excited coherent spin wave excitation, this emission rate is multiplied by $N\theta_p^2$ as in Eq. (4).
We now discuss time-dependence of collective spontaneous emission described by Eq. (4) in the main text. The topic is related to collective Lamb shift in a dilute atomic gas, an important and quite subtle effect well studied in previous work [17,84]. In order to apply the general theoretical predictions to this work, we explore the spin model [25] to revisit the decay part of the problem, for the quite dense and small samples here.
To evaluate Eq. (B6), we insert the orthogonal timed-Dicke basis $\{|\psi_{TD}(\mathbf{k}_p)\rangle, |\psi_1(\mathbf{k}_p)\rangle, ..., |\psi_{N-1}(\mathbf{k}_p)\rangle\}$ as in Ref. [17] into the equation. Here $|\psi_n(\mathbf{k}_p)\rangle = S_n^+(\mathbf{k}_p)|g_1, ..., g_N\rangle$ are single-excitation collective states with $S_n^+(\mathbf{k}_p) = \sum_i c_{n,i}\sigma_i^+$, $n = 1, ..., N-1$, and with $c_{n,i}$ properly chosen to ensure the basis orthogonality [17]. We further define the far-field emission amplitudes associated with the $N-1$ states $|\psi_n(\mathbf{k}_p)\rangle$ as $\varepsilon_n(\mathbf{r}, t) = \langle g_1, g_2, ..., g_N|\hat{E}_s(\mathbf{r}, t)|\psi_n(\mathbf{k}_p)\rangle$. We then have Eq. (B7), with $V_{DD}(\mathbf{k}_p, \mathbf{k}_p) = \langle\psi_{TD}(\mathbf{k}_p)|\hat{V}_{DD,{\rm eff}}|\psi_{TD}(\mathbf{k}_p)\rangle$ and similarly $V_{DD}(n, \mathbf{k}_p) = \langle\psi_n(\mathbf{k}_p)|\hat{V}_{DD,{\rm eff}}|\psi_{TD}(\mathbf{k}_p)\rangle$. The second line of Eq. (B7) includes random and collective couplings between the $\mathbf{k}_p$ superradiant excitation and other super- and sub-radiant modes [85], a fact associated with $|\psi_{TD}(\mathbf{k}_p)\rangle$ not being an eigenstate of $\hat{V}_{DD,{\rm eff}}$ [17,84,86]. The $V_{DD}(\mathbf{k}_p, \mathbf{k}_p) \propto \sum_{i,j}\mathbf{d}_{eg}^* \cdot \mathbf{G}(\mathbf{r}_i - \mathbf{r}_j, \omega_{eg}) \cdot \mathbf{d}_{eg}\, e^{i\mathbf{k}_p\cdot(\mathbf{r}_i - \mathbf{r}_j)}$ factor in the first line of Eq. (B7) is carefully evaluated as follows. For $i = j$ we have the divergent $\mathbf{d}_{eg}^* \cdot \mathbf{G}(0^+, \omega_{eg}) \cdot \mathbf{d}_{eg}$, whose real part accounts for the single-atom Lamb shift and is absorbed into a redefinition of $\omega_{eg}$, with the imaginary part equal to $\Gamma/2$ for isolated 2-level atoms. The $i \neq j$ part is evaluated after replacing the sum with the integral $\int \mathbf{d}_{eg}^* \cdot \mathbf{G}(\mathbf{r} - \mathbf{r}') \cdot \mathbf{d}_{eg}\,\varrho(\mathbf{r})\varrho(\mathbf{r}')\,e^{i\mathbf{k}_p\cdot(\mathbf{r} - \mathbf{r}')}\,d^3r\,d^3r'$. Following the same integration trick used to arrive at Eq. (3), we rewrite this integration in terms of $\varepsilon_p(\mathbf{r})$ and $\varrho(\mathbf{r})$ to obtain Eq. (B8), with the normalized column density $\varrho_c(\mathbf{r}_\perp) = \frac{1}{N}\int\varrho(\mathbf{r}_\perp, r_p)\,dr_p$ and with $\mathrm{OD}_p = N\sigma_r\int\varrho_c(\mathbf{r}_\perp)^2 d^2r_\perp$, as in the main text. We finally arrive at Eq. (B9). To obtain the simple expressions for $V$ in Eq. (B8) and $V_{DD}(\mathbf{k}_p, \mathbf{k}_p)$ in Eq. (B9), the SVE and Raman-Nath approximations are applied to evaluate $\varepsilon_p$ inside the sample. The approximations lead to field errors of order $\lambda_p/\sigma$ or higher. The corrections of these errors are associated with density-dependent corrections to Eq. (B9), including the collective Lamb shifts [17].
We come back to Eq. (B7). For the smooth density distribution at moderate densities under consideration, the inter-mode couplings $V_{\rm DD}(n, \mathbf{k}_p)$ are generally expected to be quite weak and $\{\mathbf{r}_i\}$-specific. For the $\{\mathbf{r}_i\}$-averaged fields, at an observation location $\mathbf{r}_o$ in the far field along the $\mathbf{k}_p$ direction, the couplings can be completely ignored initially, since we have $\langle\varepsilon_p(\mathbf{r}_o, 0)\rangle \propto \int\varrho(\mathbf{r})\,d^3r$ while $\langle\varepsilon_n(\mathbf{r}_o, 0)\rangle = 0$. We consider $\varepsilon_p = \langle\varepsilon_p\rangle + \delta\varepsilon_p$, $\varepsilon_n = \langle\varepsilon_n\rangle + \delta\varepsilon_n$, and $V_{\rm DD} = \langle V_{\rm DD}\rangle + \delta V_{\rm DD}$, and apply the $\{\mathbf{r}_i\}$-configuration average to Eq. (B7). By ignoring the $\langle\varepsilon_n\rangle$ terms, we obtain the initial decay of $\langle\varepsilon_p(\mathbf{r}_o, t)\rangle$ as Eq. (B10).

Equations (B9) and (B10) suggest superradiant decay of the directional spontaneous emission power at the rate $\Gamma_N = (1 + {\rm OD}_p/4)\Gamma$ in the exact forward ($\mathbf{k}_p$) direction, for atomic samples at moderate densities ($N < k_p^3\sigma^3$). Apart from predicting the decay rate of the far-field emission, it is worth pointing out that $\Gamma_N = 2\,{\rm Im}\langle\psi_{\rm TD}(\mathbf{k}_p)|H_{\rm eff}|\psi_{\rm TD}(\mathbf{k}_p)\rangle$ associated with Eq. (B9) also applies to the decay of the $|\psi_{\rm TD}(\mathbf{k}_p)\rangle$ population in the Schrödinger picture [10,11,14,17,84,86,87] and, by energy conservation, to the initial rate of photon emission into $4\pi$. In this work we further approximately identify this decay rate with that of the observable $i_{\mathbf{k}_p}(t) \propto \int d^2r_\perp\,|\varepsilon_p(\mathbf{r}_\perp, t)|^2$, leading to Eq. (4) in the main text for the collective emission. The same conclusion can be reached if one simply assumes that the spatial profile $\varepsilon_p(\mathbf{r}, t)$ does not change significantly during the emission, so that the $V_{\rm DD}(n, \mathbf{k}_p)$ couplings can be ignored. However, it is important to note that for $\varepsilon_n$ in Eq. (B7) associated with collective emission near the forward direction (close to $\mathbf{k}_p$), the $V_{\rm DD}(n, \mathbf{k}_p)$ couplings can also be collective, and may strongly affect the $\varepsilon_p(\mathbf{r}, t)$ dynamics at $\mathbf{r}$ along similar directions.
Such couplings are just small-angle diffraction by the averaged sample profile, which generally leads to reshaped emission wavefronts $\varepsilon_p(\mathbf{r}, t)$ over time [55] and, as a consequence, to a deviation of the $i_{\mathbf{k}_p}(t)$ decay rate from that of the $|\psi_{\rm TD}(\mathbf{k}_p)\rangle$ population. The last term in Eq. (B10) is associated with the granularity of the atomic distribution, and we expect that such granularity cannot be ignored at very high densities, or for systems with broken symmetry such as in a lattice.
In future work [50], it would be interesting to better understand the effect of discreteness on collective interactions, and to further investigate the corrections due to the inter-mode coupling in Eq. (B7) and the possible deviation from the dynamics of Eq. (4).
We remark that in all the discussions in this work, the replacement $|d_{eg}|^2 = \Gamma\alpha_i/2$ is general and applicable to atoms with level degeneracy. We therefore expect the conclusions for Eqs. (3)-(6) in the main text to be applicable to the $D2$ line of the $^{87}$Rb atoms in this work.
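As a quick numerical illustration of the rate $\Gamma_N = (1 + {\rm OD}_p/4)\Gamma$ discussed above, the following minimal Python sketch evaluates the collective decay rate and the corresponding $1/e$ decay time of the forward emission power. The function names and the sample optical depths are illustrative assumptions and are not taken from this work; the only physical input is the natural linewidth of the $^{87}$Rb $D2$ line, $\Gamma \approx 2\pi\times 6.07$ MHz.

```python
import math

# Natural linewidth of the 87Rb D2 line in rad/s (2*pi x 6.07 MHz).
GAMMA_D2 = 2 * math.pi * 6.0666e6


def collective_decay_rate(od_p, gamma=GAMMA_D2):
    """Return Gamma_N = (1 + OD_p / 4) * Gamma for average optical depth od_p."""
    return (1.0 + od_p / 4.0) * gamma


def decay_time_ns(od_p, gamma=GAMMA_D2):
    """Return the 1/e decay time of the emitted power, in nanoseconds."""
    return 1e9 / collective_decay_rate(od_p, gamma)


# Hypothetical optical depths, chosen only for illustration.
for od_p in (0.0, 1.0, 4.0):
    ratio = collective_decay_rate(od_p) / GAMMA_D2
    print(f"OD_p = {od_p:3.1f}: Gamma_N/Gamma = {ratio:.2f}, "
          f"decay time = {decay_time_ns(od_p):.1f} ns")
```

In this form, the single-atom limit (${\rm OD}_p = 0$) recovers the natural lifetime of about 26 ns, and the decay rate doubles at ${\rm OD}_p = 4$.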

Simulation of spin-wave dynamics supported by non-interacting atoms
We now ignore the atom-atom interaction in Eq. (B1), and write down the effective non-Hermitian Hamiltonian for the interaction-free model of $N$ three-level atoms as Eq. (B11). Here, $\mathbf{r}_i$ is the position of the $i$th atom, and we have $\sigma^+_{c,i} = |a_i\rangle\langle g_i|$ and $\sigma^+_i = |e_i\rangle\langle g_i|$. The control Rabi frequency is $\eta(\mathbf{r}_i)\Omega_c(t) = |\mathbf{E}_c(\mathbf{r}_i, t)\cdot\mathbf{d}_{ag}|/\hbar$, with a Gaussian beam intensity profile. $\Omega_c(t)$ is the spatial peak value of the control Rabi frequency and $\eta(\mathbf{r}_i) \leq 1$ is a position-dependent factor. $\Omega_p$ is the Rabi frequency of the probe pulse, with $\int_{-\tau_p}^{0}\Omega_p\,dt \ll 1$ as defined before. $\phi_p(\mathbf{r}_i) = -\mathbf{k}_p\cdot\mathbf{r}_i$ is the optical phase of the probe pulse. $\Omega_c(t)$ and $\dot{\varphi}_c(\mathbf{r}_i, t) = \delta_c(\mathbf{r}_i, t)$ are depicted in Fig. 1(a) in the main text for robust state-dependent phase patterning.

By changing the basis into $\mathbf{k}$-space, we can rewrite the Hamiltonian in Eq. (B11). The probe coupling reads

$$\sum_{g,e,\mathbf{k}}\frac{\hbar}{2}\,\Omega_p(t + \tau_p)\,c^y_{eg}\,|e, \mathbf{k} + \mathbf{k}_p\rangle\langle g, \mathbf{k}| + {\rm h.c.},$$

and the coupling by the first control pulse, $H_{c1}$, reads

$$\sum_{g,a,\mathbf{k}}\frac{\hbar}{2}\,\eta_1\Omega_{c1}(t - t_1)\,c^x_{ag}\,|a, \mathbf{k} + \mathbf{k}_c\rangle\langle g, \mathbf{k}| + {\rm h.c.},$$

where $|g, \mathbf{k}\rangle = \frac{1}{\sqrt{N}}\sum_i e^{i\mathbf{k}\cdot\mathbf{r}_i}|g_i\rangle$, and similarly for $|e, \mathbf{k}\rangle$ and $|a, \mathbf{k}\rangle$. Here we have included all the $D1$ and $D2$ hyperfine levels and use $\{g, e, a\}$ as indices to label the $\{5S_{1/2}, 5P_{3/2}, 5P_{1/2}\}$ hyperfine levels, respectively. The $c^x_{ag}$ and $c^y_{eg}$ are coupling coefficients for the $\mathbf{e}_x$- and $\mathbf{e}_y$-polarized $D1$ and $D2$ pulses, derived from the Clebsch-Gordan coefficients, and $\eta_{1,2}$ are factors that account for the laser intensity inhomogeneities. Following the convention in Fig. 1(a), the probe excitation takes place during $-\tau_p < t < 0$ and is followed by the two $D1$ control pulses ($H_{c1}$ and $H_{c2}$) starting at $t_1 = \Delta t_1$ and $t_2 = \Delta t_1 + \tau_d$, respectively. We finally end up with the master equation for the single-atom density matrix $\rho^{(s)}$, Eq. (B15). The collapse operators are defined in Eq. (B16), with $j$ running through the $x$, $y$, and $z$ polarizations.
With $\rho^{(s)}(t)$ it is straightforward to calculate the interaction-free evolution of the many-atom density matrix $\rho(t) = (\rho^{(s)}(t))^{\otimes N}$ and to evaluate collective observables $\langle\hat{O}\rangle = {\rm tr}(\rho(t)\hat{O})$ [31]. By solving the master equation with the initial condition $\rho^{(s)} = \frac{1}{5}\sum_{g=1}^{5}|g, \mathbf{k}\rangle\langle g, \mathbf{k}|$ (with $|g\rangle$ running through the $|F = 2, m_F\rangle$ Zeeman sublevels), we further calculate the dipole coherence $d(\mathbf{k}_s) = {\rm tr}(\rho^{(s)}(t)\,d^-(\mathbf{k}_s))$, and similarly $d(\mathbf{k}_p)$ and $d(\mathbf{k}'_s)$.
Here we define the operators $d^-(\mathbf{k}_s) = \mathbf{e}_y\sum_{g,e}c^y_{eg}|g, \mathbf{k} + 2\mathbf{k}_c\rangle\langle e, \mathbf{k} + \mathbf{k}_p|$, $d^-(\mathbf{k}_p) = \mathbf{e}_y\sum_{g,e}c^y_{eg}|g, \mathbf{k}\rangle\langle e, \mathbf{k} + \mathbf{k}_p|$, and $d^-(\mathbf{k}'_s) = \mathbf{e}_y\sum_{g,e}c^y_{eg}|g, \mathbf{k} + 4\mathbf{k}_c\rangle\langle e, \mathbf{k} + \mathbf{k}_p|$. The superradiant signal $i_{\mathbf{k}_s}$ in the main text is related to the expectation value $\langle S^+(\mathbf{k}_s)S^-(\mathbf{k}_s)\rangle$. In the large-$N$ limit, we approximately have $\langle S^+(\mathbf{k}_s)S^-(\mathbf{k}_s)\rangle \propto |d(\mathbf{k}_s)|^2$. Thus, by calculating the dipole coherence, the simulation can reproduce the $D2$ collective emission dynamics with $D1$ control for non-interacting atoms. With experimental imperfections encoded in parameters like $\eta_{1,2}$ in Eq. (B14), we refer to the numerically evaluated single-atom density matrix according to Eq. (B15) as $\rho^{(s)}_\eta(t)$. For comparison, perfect geometric phase patterning is implemented by replacing the evolution under $H_{c1} + H_{c2}$ in Eq. (B14) with the instantaneous operation $U_g(-2\mathbf{k}_c) = 1 - \sum_g|g, \mathbf{k}\rangle\langle g, \mathbf{k}| + \sum_g|g, \mathbf{k} + 2\mathbf{k}_c\rangle\langle g, \mathbf{k}|$, leading to a "perfectly controlled" density matrix $\rho^{(s)}_0(t)$.

FIG. 8. Dashed arrows represent the "effective" quantum jump operations associated with Eq. (B16). Double-sided arrows represent the coherent laser couplings. The coherence between the wavy-underlined lattice sites $|e, \mathbf{k} + \mathbf{k}_p\rangle$ and $|g, \mathbf{k} + 2\mathbf{k}_c\rangle$ is associated with the re-directed superradiant emission.

FIG. 9. (a) Simulation according to Eq. (B14) with a $\tau_p = 3$ ns $D2$ probe excitation followed by a $\mathbf{k} \to \mathbf{k} - 2\mathbf{k}_c$ control composed of two chirped $D1$ pulses with $\tau_c = 0.5$ ns and $\tau_d = 1.24$ ns at $\Delta t_1 = 0.2$ ns. The $\mathbf{k}_p$ (forward radiation) and $\mathbf{k}_s = \mathbf{k}_p - 2\mathbf{k}_c$ (re-directed radiation) spin-wave components are plotted with dashed and solid lines, respectively. The dash-dotted line corresponds to the mismatched $\mathbf{k}'_s = \mathbf{k}_p - 4\mathbf{k}_c$ excitation. (b) An additional $\mathbf{k} \to \mathbf{k} - 2\mathbf{k}_c$ control is applied with $\Delta t_2 = 4.8$ ns. (c) A $\mathbf{k} \to \mathbf{k} + 2\mathbf{k}_c$ "recall" is simulated at $\Delta t_3 = 5.5$ ns. A recall efficiency similar to that in Fig. 4 is recovered. The $\sim 267$ MHz quantum beat amplitude needs to be taken into account when comparing the spin-wave amplitudes and estimating the control efficiency. Note that $\tau_p$ differs from that in the measurements of Fig. 4, which leads to a different quantum-beat contrast.

$f_d$ and $f_a$ estimation
We relate the experimental observable $i_{\mathbf{k}_s}(t)$ to the ensemble average $\langle i_{s,\eta}(t)\rangle_\eta$ of the simulated signal $i_{s,\eta}(t)$, and calculate the collective dipole control efficiency as $f_d = \langle i_{s,\eta}(\tau_c + \tau_d)\rangle_\eta/i_{s,0}(0)$. We average the emission intensity, rather than the field amplitude, because we experimentally collect $i_{\mathbf{k}_s}(t)$ with a multi-mode fiber, so that the signal is insensitive to slight distortions of the $E_s$-mode profile caused by the dynamic phase writing due to the imbalanced $\eta_{1,2}$.
The simulation of optical acceleration by the $D1$ control pulses follows the same Eqs. (B14) and (B15), but without the probe excitation and with the atomic levels restricted to the $D1$ line only. We evaluate the momentum transfer as

$$\Delta P_\eta = \hbar k_c\left[\sum_{g,n} n\,\langle g, \mathbf{k} + n\mathbf{k}_c|\rho^{(s)}_\eta(t)|g, \mathbf{k} + n\mathbf{k}_c\rangle + \sum_{a,n} n\,\langle a, \mathbf{k} + n\mathbf{k}_c|\rho^{(s)}_\eta(t)|a, \mathbf{k} + n\mathbf{k}_c\rangle\right]$$

for $t = \tau_c + \tau_d$. We then compare the ensemble-averaged acceleration efficiency $f_a = \langle\Delta P_\eta\rangle_\eta/(2\hbar k_c)$ with the experimental measurements.
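The momentum-transfer bookkeeping above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the populations are hypothetical numbers rather than simulation output, the helper names `momentum_transfer` and `acceleration_efficiency` are introduced here for illustration, and the weights stand in for whatever $\eta_{1,2}$ distribution is used in the ensemble average.

```python
def momentum_transfer(populations):
    """Return Delta P in units of hbar * k_c.

    `populations` maps the momentum order n to the total population
    (summed over all g and a levels) in the states |., k + n k_c>.
    """
    return sum(n * p for n, p in populations.items())


def acceleration_efficiency(weighted_populations):
    """Return f_a = <Delta P>_eta / (2 hbar k_c).

    `weighted_populations` is a list of (weight, populations) pairs,
    one pair per sampled value of the intensity factors eta_{1,2}.
    """
    total_weight = sum(w for w, _ in weighted_populations)
    mean_dp = sum(w * momentum_transfer(p) for w, p in weighted_populations)
    return mean_dp / total_weight / 2.0


# Hypothetical populations, for illustration only.
ideal = {2: 1.0}                      # every atom receives 2 hbar k_c
imperfect = {0: 0.1, 1: 0.1, 2: 0.8}  # part of the population is left behind
f_a = acceleration_efficiency([(0.5, ideal), (0.5, imperfect)])
print(f"f_a = {f_a:.3f}")
```

A perfect control cycle places all population at $n = 2$ and gives $f_a = 1$; any population remaining at $n = 0$ or $n = 1$ lowers the efficiency proportionally.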
The $\eta_{1,2}$ average in both calculations follows the spatial distribution of the control laser intensity seen by the atomic sample. As the final results are quite insensitive to the details of the distribution, we assume that both the laser beam and the atomic sample have Gaussian profiles, with waists $w = 13\ \mu$m and $\sigma = 7\ \mu$m obtained by fitting the imaging measurements and from optics simulations. We adjust the retro-reflected beam waist $w_r$ and the intensity factor $\eta_2 \propto 1/w_r$ accordingly in the simulation, together with an overall intensity calibration factor $\kappa$ that multiplies the $s$ parameter from the beat-note measurements of the control pulses [31]. The ensemble-averaged $f_a$ is compared with the experimentally measured $\Delta P/2\hbar k_c$, and we adjust $\kappa$ and $w_r$ to globally match the single-atom simulation with all the measurement results for optical acceleration in Fig. 7. We then estimate both $f_a$ and $f_d$ as discussed in Sec. III E.

Since the second type of delay line (Fig. 2(b)) involves more optics than the first one (Fig. 2(a)), we expect more power loss and wavefront distortion for the retro-reflected pulses. However, with the second-type delay line, we are able to experimentally adjust the amplitude of the pre-programmed pulses (marked with "p " in Fig. 2(b)) to approximately re-balance the intensity of the counter-propagating control pulses despite the power loss. In the corresponding simulation, we accordingly readjust the beam waist $w_r$ for the retro-reflection together with the intensity factor $\eta_2 \propto \alpha/w_r$. Within reasonable adjustments, the simulation results (see Fig. 9) suggest that the efficiency $f_d$ of the single $\mathbf{k}$-shift operation reaches 75%, in agreement with the estimate based on the retrieval efficiency [31].

FIG. 10. We scan $\Delta t_3$ with all the other parameters in the full "re-direction"-"switch-off"-"recall" sequence unchanged and numerically calculate the relative "recall" efficiency. Blue dashed line: reference of single decay dynamics. (c) The Fourier transform of the $\Delta t_3$-dependent "recall" efficiency.
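The $\eta$ ensemble average used for $f_a$ and $f_d$ can be illustrated with a short Python sketch. It rests on assumptions that are not spelled out in the text: $w$ is taken as the $1/e^2$ intensity radius of the control beam, so that the field-amplitude factor is $\eta(r) = e^{-r^2/w^2}$, and $\sigma$ is taken as the rms width of a two-dimensional Gaussian atomic density centered on the beam. The function `sample_average` is a hypothetical helper; in an actual calculation its first argument would be a simulated quantity such as $\Delta P_\eta$ or $i_{s,\eta}$ as a function of $\eta$.

```python
import math


def eta(r, w):
    """Field-amplitude factor of a Gaussian beam with 1/e^2 intensity radius w."""
    return math.exp(-(r / w) ** 2)


def sample_average(func, w, sigma, r_max_sigmas=8.0, steps=4000):
    """Average func(eta) over a 2D Gaussian atomic density of rms width sigma.

    The radial weight is (r / sigma^2) exp(-r^2 / (2 sigma^2)) dr, integrated
    with the midpoint rule out to r_max_sigmas * sigma.
    """
    dr = r_max_sigmas * sigma / steps
    total = 0.0
    norm = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        weight = (r / sigma**2) * math.exp(-r**2 / (2 * sigma**2)) * dr
        total += weight * func(eta(r, w))
        norm += weight
    return total / norm


w_um, sigma_um = 13.0, 7.0  # beam and sample sizes quoted in the text
mean_eta = sample_average(lambda x: x, w_um, sigma_um)
print(f"<eta> = {mean_eta:.4f}")
```

Under these assumptions the mean amplitude factor has the closed form $\langle\eta\rangle = 1/(1 + 2\sigma^2/w^2)$, which provides a simple check of the quadrature.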

Reconstructing the controlled spin wave dynamics
With the simulation parameters optimally matching the experiment, we further simulate the experimental sequence in Fig. 3 and calculate $|d(\mathbf{k})|^2 = \langle|{\rm tr}(\rho^{(s)}_\eta(t)\,d^-(\mathbf{k}))|^2\rangle_\eta$ associated with the collective dipole excitation with $\mathbf{k} = \{\mathbf{k}_p, \mathbf{k}_s = \mathbf{k}_p - 2\mathbf{k}_c, \mathbf{k}'_s = \mathbf{k}_p - 4\mathbf{k}_c\}$ for the forward, re-directed, and "subradiantly stored" collective radiation, respectively. We not only reproduce features of the experimental observable $i_{\mathbf{k}_s}(t) \propto |d(\mathbf{k}_s)|^2$, but also unveil the time-dependent dynamics of the unmonitored forward emission $|d(\mathbf{k}_p)|^2$ and of the "subradiantly stored", superradiance-free excitation $|d(\mathbf{k}'_s)|^2$. Typical results are given in Fig. 9.

Consistent "recall" at selected $\Delta t_3$ delay

The "redirection"-"switch-off"-"recall" sequence in Fig. 4 involves multiple spin-wave controls $U_g(\pm 2\mathbf{k}_c)$ implemented with cyclic $D1$ transitions, each driven by a pair of counter-propagating control pulses. The controls are not perfect. In particular, after each pair of control pulses, residual population is left in the $5P_{1/2}$ state, with a dipole coherence determined by the optical phases as well as by details of the control dynamics. For two successive pulses of the same type, the coherently added residual $D1$ coherences may be enhanced or canceled, depending on the relative phase between the two. Such interference effects are in fact exploited for robust quantum control with composite pulse techniques [65,88].
In this experiment the interference effect emerges in particular between the second pulse of the "switch-off" control and the first pulse of the "recall" control, both driven by a $-\mathbf{k}_c$ chirped pulse (Fig. 10(a)). The relative phase between the dipole coherences evolves according to the laser detuning from the dipole transitions. Here, with the center frequency of the control pulse at the midpoint of the hyperfine transitions $5S_{1/2}\,F = 2$ - $5P_{1/2}\,F = 1, 2$, the interference leads to an oscillation of the control efficiency, expected at the frequency $\Delta_{D1,{\rm hfse}}/4\pi = 814.5/2$ MHz, where $\Delta_{D1,{\rm hfse}}$ is the $5P_{1/2}$ hyperfine splitting. The oscillations have been observed experimentally. To confirm this picture, we perform a single-body simulation including all the hyperfine levels in the $D1$ and $D2$ lines of $^{87}$Rb, as modeled in Sec. B 2, by setting the Hamiltonian $H_{c1} + H_{c2}$ to match the sequence in Fig. 1(a) of the main text, with control parameters consistent with the experimental conditions. All the parameters are fixed except the "recall" delay $\Delta t_3$, which is scanned in the simulations to numerically evaluate a relative "recall" efficiency defined as $f_{\rm recall} \propto |d(\mathbf{k}_s)|^2$ at $t = 3\tau_c + 3\tau_d + \Delta t_1 + \Delta t_2 + \Delta t_3$. The simulation results are shown in Fig. 10. As expected, the "recall" efficiency oscillates with $\Delta t_3$ at the angular frequency $\Delta_{D1,{\rm hfse}}/2$. In addition, there is an oscillation at $\sim 6.8$ GHz with a much smaller amplitude (Fig. 10(b)), which is associated with the hyperfine splitting of the ground state. This additional modulation of the recall efficiency arises because the dipole coherence on the hyperfine transitions $5S_{1/2}\,F = 1$ - $5P_{1/2}\,F = 1, 2$ is also excited during the control, though inefficiently since the control pulse is far detuned, and it has a negligible influence on our measurements.
To make sure that the residual dipole coherence interference effect is the same for all the "recall" delays, we carefully select the "recall" delay time to be $\Delta t_3 = 4\pi m/\Delta_{D1,{\rm hfse}} + t_{\rm off}$, with integer $m$ and a constant offset $t_{\rm off}$. This selection of recall times allows us to study the decay dynamics of the phase-mismatched spin waves with suppressed systematic errors induced by the imperfect $D1$ control.
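The delay selection rule can be checked numerically with the short Python sketch below. It uses the $5P_{1/2}$ hyperfine splitting of 814.5 MHz quoted above; the offset `t_off` and the range of $m$ are hypothetical values chosen only for illustration, and the function names are introduced here rather than taken from this work.

```python
import math

# 5P1/2 hyperfine splitting Delta_D1,hfse / (2 pi) quoted in the text, in Hz.
D1_HFS_HZ = 814.5e6


def beat_period_s(splitting_hz=D1_HFS_HZ):
    """Period 4*pi / Delta of the control-efficiency oscillation, in seconds."""
    return 2.0 / splitting_hz


def recall_delays_s(m_values, t_off_s, splitting_hz=D1_HFS_HZ):
    """Return Delta t_3 = 4*pi*m / Delta + t_off for each integer m."""
    period = beat_period_s(splitting_hz)
    return [m * period + t_off_s for m in m_values]


def beat_phase(delay_s, splitting_hz=D1_HFS_HZ):
    """Phase (Delta / 2) * delay of the oscillation, folded into [0, 2*pi)."""
    return (math.pi * splitting_hz * delay_s) % (2 * math.pi)


t_off = 0.5e-9  # hypothetical offset, for illustration only
delays = recall_delays_s(range(1, 5), t_off)
for d in delays:
    print(f"delay = {d * 1e9:.3f} ns, beat phase = {beat_phase(d):.3f} rad")
```

The allowed delays are spaced by about 2.46 ns, and every selected delay lands on the same phase of the $\Delta_{D1,{\rm hfse}}/2$ oscillation, so the residual interference enters all data points in the same way.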