Coherent Dynamics in Quantum Emitters under Dichromatic Excitation

We characterize the coherent dynamics of a two-level quantum emitter driven by a pair of symmetrically detuned, phase-locked pulses. The promise of dichromatic excitation is to spectrally isolate the excitation laser from the quantum emission, enabling background-free photon extraction from the emitter. Paradoxically, we find that for ideal two-level systems excitation is not possible without spectral overlap between the exciting pulse and the quantum emitter transition, owing to cancellation of the accumulated pulse area. However, any additional interaction that interferes with this cancellation may lead to a finite stationary population inversion. Our spectroscopic results on a solid-state two-level system show that, while coupling to lattice vibrations helps to improve the inversion efficiency up to 50% under symmetric driving, coherent population control and a larger inversion are possible using asymmetric dichromatic excitation, which we achieve by adjusting the ratio of the intensities of the red- and blue-detuned pulses. Our measured results, supported by simulations using a real-time path-integral method, offer a new perspective towards realising efficient, background-free photon generation and extraction.

Solid-state quantum emitters, in particular semiconductor quantum dots (QDs), offer a promising platform for generating quantum states that can facilitate dephasing-free information transfer between nodes within an optical quantum network [1][2][3][4]. On-demand indistinguishable photon streams for this purpose can be made using coherent excitation of QDs [5]. While resonance fluorescence of QDs suppresses detrimental environmental charge noise [6] and timing jitter [7] in the photon emission, the excitation laser must be filtered from the single photon stream. Typically, this is achieved with polarization filtering of the resonant laser. However, unless a special microcavity design is employed [8][9][10], polarization filtering inherently reduces the collection efficiency by at least 50%. This motivates the consideration of alternative off-resonant excitation techniques that allow spectral filtering [11]. In particular, off-resonant phonon-assisted excitation [12,13], resonant two-photon excitation [14] and resonant Raman excitation [15] schemes benefit from being able to spectrally isolate the zero-phonon line from the laser spectrum, enabling efficient single photon generation.
The argument is that this would make it possible to efficiently excite quantum emitters using pulses that are spectrally separated from the fundamental transition. However, a different picture unfolds when the system dynamics is considered in more detail: the Hamiltonian of an ideal two-level system (2LS) driven by a dichromatic pulse f(t) = R(t) e^{−iΔt} + B(t) e^{iΔt}, with real envelopes R(t) and B(t) of the red- and blue-detuned pulses (in the rotating frame with respect to the 2LS), is given by where we have expressed the two-level state as a pseudospin Bloch vector s precessing about the time-dependent precession axis Ω(t).
In the limit t → ∞ the integral in Eq. (4) becomes the Fourier transform F[f](ω = 0) of f(t), evaluated at the two-level transition frequency. This proves analytically that, irrespective of the driving strength, no excited state population exists after the pulse unless there is overlap between the dichromatic excitation spectrum and the fundamental transition of the quantum emitter.
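The cancellation of the accumulated pulse area can be checked numerically. The following is a minimal sketch, assuming symmetric Gaussian envelopes R(t) = B(t) so that f(t) = 2R(t) cos(Δt) is real; the function names, the 5 ps envelope width and the use of a 0.6 meV detuning are illustrative assumptions, not values taken from the measurements.

```python
import math

HBAR_MEV_PS = 0.6582119569  # reduced Planck constant in meV*ps

def dichromatic_area(sigma_ps, delta_rad_per_ps, amplitude=1.0,
                     n_steps=20001, span=10.0):
    """Trapezoidal integral of f(t) = 2*A*exp(-t^2/(2*sigma^2))*cos(delta*t).

    Assumes symmetric Gaussian envelopes (an illustrative choice).
    """
    t_max = span * sigma_ps
    dt = 2.0 * t_max / (n_steps - 1)
    total = 0.0
    for k in range(n_steps):
        t = -t_max + k * dt
        weight = 0.5 if k in (0, n_steps - 1) else 1.0
        envelope = amplitude * math.exp(-t * t / (2.0 * sigma_ps ** 2))
        total += weight * 2.0 * envelope * math.cos(delta_rad_per_ps * t)
    return total * dt

def analytic_area(sigma_ps, delta_rad_per_ps, amplitude=1.0):
    """Closed form of the same integral: the spectrum of f at zero frequency."""
    return (2.0 * amplitude * sigma_ps * math.sqrt(2.0 * math.pi)
            * math.exp(-0.5 * (sigma_ps * delta_rad_per_ps) ** 2))

delta = 0.6 / HBAR_MEV_PS   # 0.6 meV detuning as an angular frequency (rad/ps)
sigma = 5.0                 # hypothetical envelope width in ps
detuned = dichromatic_area(sigma, delta)
resonant = dichromatic_area(sigma, 0.0)
print("accumulated area relative to a resonant pulse:", detuned / resonant)
```

The printed ratio is the Gaussian spectral weight remaining at the transition frequency, which is why the residual occupation shrinks rapidly as the product of envelope width and detuning grows.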
To illustrate this, Figure 1(a) presents the dynamics of an ideal 2LS under dichromatic excitation with Gaussian pulses. This shows transient excited state population which, however, vanishes again towards the end of the pulse, i.e. the overall accumulated pulse area does indeed cancel almost completely. The small but finite residual occupation can be explained by the nonzero overlap between the tails of the Gaussians and the fundamental transition. Consequently, coherent Rabi-like oscillations with unity population inversion can still be obtained, albeit at much larger intensities, as depicted in Figure 1(b). However, this obviously defeats the purpose of employing the dichromatic excitation scheme.
The transient excited state occupation is key to understanding how significant population inversion can still be obtained even if the exciting pulse has no spectral overlap with the transition of the emitter: Any additional interaction or dissipation can interfere with the complete cancellation of the pulse area, and thus lead to a finite population inversion after the pulse. For example, in a laser-driven QD, the interaction with phonons induces incoherent thermalization dynamics in the instantaneous laser-dressed state basis, unlocking excited state population up to ∼ 50%.
In this Letter, we propose and experimentally demonstrate an alternative, externally controllable approach to dichromatic pulsed excitation (DPE). To obtain large stationary occupations of the excited state we employ asymmetric dichromatic excitation with red- and blue-detuned pulses of different intensities. For B(t) ≠ R(t), the Bloch sphere precession axis Ω(t) then has a finite time-dependent y-component, adding more degrees of freedom to the coherent dynamics. For suitable parameter choices, such asymmetric DPE can result in Bloch sphere trajectories that coherently evolve towards the excited state at long times.
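To make the origin of the y-component concrete, the sketch below evaluates the dichromatic field f(t) = R(t) e^{−iΔt} + B(t) e^{iΔt} for Gaussian envelopes. It assumes, as a convention not confirmed by the text, that the x and y components of the precession axis are the real and imaginary parts of f(t); the function names and numerical values are hypothetical.

```python
import cmath
import math

def drive(t, red_amp, blue_amp, delta, sigma):
    """Dichromatic field f(t) with a common Gaussian envelope (assumed shape)."""
    envelope = math.exp(-t * t / (2.0 * sigma ** 2))
    return envelope * (red_amp * cmath.exp(-1j * delta * t)
                       + blue_amp * cmath.exp(1j * delta * t))

def precession_axis(t, red_amp, blue_amp, delta, sigma):
    """Assumed convention: Omega = (Re f, Im f, 0)."""
    f = drive(t, red_amp, blue_amp, delta, sigma)
    return (f.real, f.imag, 0.0)

# Symmetric pulses: the axis stays along x at all times.
print(precession_axis(1.3, 1.0, 1.0, delta=0.9, sigma=5.0))
# Asymmetric pulses: a y-component proportional to (B - R) sin(delta * t) appears.
print(precession_axis(1.3, 1.0, 0.4, delta=0.9, sigma=5.0))
```

Under this convention the x-component scales with R + B and the y-component with B − R, so the y-component vanishes identically only for balanced pulses.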
We experimentally verify our insights by characterizing the dynamics and quality of the scattered photons under DPE of a solid-state 2LS. As discussed in detail in the following, our results confirm a maximal population transfer fidelity of approximately 50% under symmetric DPE, owing to incoherent phonon-induced dynamics. Further, we show that coherent dynamics with a population inversion of 80% are achievable through an asymmetric weighting of the red and blue components of the dichromatic pulse. We conclude our study by analyzing the quality of the resulting photons in terms of the degree of multi-photon suppression and Hong-Ou-Mandel (HOM) visibility.
As a solid-state 2LS for the dichromatic excitation experiments, we use the negatively charged exciton transition, X1−, of a charge-tunable, planar cavity InGaAs QD sample [18,23]. Figure 2(a) shows the experimental setup used to generate the dichromatic pulses for excitation. A mode-locked laser with 80.3 MHz repetition rate and 160 fs pulse width is sent to a folded 4f setup, which consists of lenses, beam expanders (BE), a grating, a set of two motorized razor blades (RB) and a beam block. The RB control the overall spectral width of the diffracted beam, while the beam block placed between them removes the undesired frequency component resonant with the zero-phonon line (ZPL), simultaneously ensuring phase-locking. After back reflection on a mirror, the remaining light recombines on the same grating and is coupled into an optical fibre before exciting the QD. Figure 2(b) depicts an example of the spectra of the excitation laser and the absorption profile of the QD, detuned from the ZPL at ω_0 = 1.280 eV (968.8 nm), measured using a spectrometer with ∼ 30 µeV resolution. The spectrum of the QD shows an atomic-like ZPL, along with a broad, asymmetric phonon sideband (PSB) arising primarily from interaction with longitudinal acoustic (LA) phonons [23,24]. In the excitation laser spectrum, the spectral width and the separation of the red and blue sidebands are 0.5 meV and 1.2 meV, respectively. We define the pulse contrast C of the dichromatic pulse as a function of the integrated intensities of the red (I_R) and blue (I_B) sidebands, as C = (I_B − I_R)/(I_B + I_R). Finally, after filtering on the ZPL, the scattered photons are detected on a superconducting nanowire single photon detector with ∼ 90% detection efficiency at ∼ 950 nm. We first compare the experimental results for symmetric dichromatic driving (C = 0), blue-detuned excitation (C = 1), and red-detuned excitation (C = −1) with those obtained via pulsed resonance fluorescence (RF).
These results are depicted in Figure 2(c). While we observe the expected Rabi oscillation under RF, we record much higher emission intensities at C = 1 than at C = −1, consistent with findings from previous studies and corresponding to phonon-assisted excitation [25,26]. Contrary to the expected minimal state occupation for an ideal 2LS under symmetric dichromatic excitation at C = 0 (cf. Figure 1), we observe a population inversion fidelity of ≈ 50% at a pulse area of ∼ 20 π. We attribute this to the unavoidable electron-phonon interaction: as discussed, phonon-induced thermalization allows occupations of 50%, compared to only vanishingly small levels for a dissipation-less 2LS. We now proceed to characterize the dynamics of the system under asymmetric DPE. To achieve this, an additional beam block mounted on a motorized translation stage is added in front of the RB to allow independent control of the width of the red or blue sideband. The excitation pulse is split via a 99/1 fibre beam splitter, with the low-power channel sent to the spectrometer to estimate the pulse contrast and the higher-power channel used to excite the QD. Figure 2(d) shows the experimentally measured emission count rate as a function of pulse contrast and excitation power. We compare the experimental data with simulations using a numerically exact real-time path-integral formalism [27] with parameters typical of GaAs QDs [28] and employing a pair of rectangular driving pulses. We refer to Section I in the Supplementary Materials [29] (SM) for full simulation parameters. The simulation taking into account the exciton-phonon coupling, shown in Figure 3(a), is in close qualitative agreement with the experimental data.
For comparison, the dynamics obtained in the absence of exciton-phonon coupling is depicted in Figure 3 To better illustrate the coherent dynamics of the asymmetric pulses, in Figure 3(d) we present simulated 2LS population dynamics on the Bloch sphere for a pulse area up to the first coherent oscillation in Figure 3(c) at C = −0.65. In the absence of dissipation (top), the state of the 2LS remains pure and is constrained to the surface of the Bloch sphere. The nontrivial spiralling trajectory is a consequence of the time-dependent x and y components of the effective electric field associated with the asymmetric pulse. In this particular instance, the trajectory evolves towards the excited state located at the north pole. When the interaction with phonons is accounted for (bottom), the system features mixed-state dynamics that are no longer restricted to the surface of the Bloch sphere. Qualitatively, the spiralling trajectory still looks similar to that of the phonon-free case. However, now the excited state is no longer reached. Rather, the projection onto the z-axis gives a final excited state occupation of ≈ 60%. Note that this value is lower than the measured 80% inversion fidelity, likely due to a slight mismatch in the pulse shape between simulation and experiment. In any case, our combined results indicate that for C ≈ −0.65 phonons certainly quantitatively affect the dynamics but dominating coherent oscillations nonetheless survive. In contrast, for positive pulse contrast, the higher maxima of the coherent oscillations are strongly suppressed by the interaction with phonons. This qualitative difference in behaviour between positive and negative pulse contrast is attributable to the differing spectral overlap of the dichromatic pulse pair with the QD's phonon side band (cf. Figure 2).
Richer and even more complex dynamics emerge when moving beyond the case of a 2LS. In Sections VII and VIII of the SM [29] we present a range of spectroscopic results from more complex multi-level solid-state systems; however, a full exploration of those systems under DPE is beyond the scope of the present study.
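As a concrete illustration of the pulse contrast C = (I_B − I_R)/(I_B + I_R) used above, the following sketch computes C from two integrated sideband intensities and inverts the relation to give the red-to-blue intensity ratio for a target contrast. The function names are hypothetical, and the intensities are placeholders rather than measured values.

```python
def pulse_contrast(intensity_red, intensity_blue):
    """Pulse contrast C = (I_B - I_R) / (I_B + I_R)."""
    total = intensity_red + intensity_blue
    if total <= 0:
        raise ValueError("total integrated intensity must be positive")
    return (intensity_blue - intensity_red) / total

def red_to_blue_ratio(contrast):
    """Invert the definition: I_R / I_B = (1 - C) / (1 + C), valid for C > -1."""
    if not -1.0 < contrast <= 1.0:
        raise ValueError("contrast must lie in (-1, 1]")
    return (1.0 - contrast) / (1.0 + contrast)

print(pulse_contrast(1.0, 1.0))   # balanced sidebands
print(red_to_blue_ratio(-0.65))   # ratio needed for the contrast discussed in the text
```

The limiting cases C = 1 and C = −1 correspond to purely blue-detuned and purely red-detuned excitation, respectively.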
Having identified the pulse contrast and excitation power that optimize the emission count rate, we proceed to characterize the single-photon performance of our QD under DPE. By sending the photons into a Hanbury Brown and Twiss interferometer, we observe multi-photon suppression of g(2)(0) = 0.016 (1), indicating near-perfect single photon emission, as shown in Figure 4(a). We then measure the indistinguishability of the scattered photons via HOM interference between two consecutively emitted photons at a time delay of 12.5 ns. The figure of merit here is the two-photon interference visibility V_HOM, determined by sending the photons into an unbalanced Mach-Zehnder interferometer with an interferometric delay of 12.5 ns to temporally match the arrival times of subsequently emitted photons on the beam splitter. Figure 4(b) and (c) show the normalized HOM histogram as a function of the time delay τ between detection events for photons prepared in cross (g(2)⊥) and parallel (g(2)∥) polarizations, within a 60 ns window and in a 6 ns wide zoom into the central peak, respectively. This close-up of the co-polarized g(2)∥ peak near zero delay illustrates the characteristic dip. We fit the experimental data with the function [30][31][32][33], convolved with a Gaussian instrument response function (bandwidth of 0.168 ns), where the independently measured lifetime is T_1 = 687 (3) ps. This yields a de-convolved visibility of V_HOM^deconv = 0.95 (1) and a 1/e width of τ_C = 0.33 (2) ns. The signature dip around zero delay, usually present under non-resonant pumping and resonant two-photon excitation schemes, indicates a deviation from the transform limit and thus imperfect photon wave packet coherence. The width of the dip corresponds to the characteristic time T_2* = 2τ_C = 0.66 (4) ns for the inhomogeneous broadening of the emitter due to pure dephasing or timing jitter in the emission [30,34]. We speculate that this may be dominated by phonon-induced dephasing, as we observe a narrower dip under phonon-assisted excitation while noting its absence under strictly monochromatic resonant excitation. See Sections III and IV of the SM [29] for the corresponding experimental evidence and discussion. Figure 4(d) shows V_HOM as a function of the integration window around τ = 0 used for temporal filtering of the detection events. Temporal post-selection [31] increases the raw visibility V_HOM from 0.29 (2) to 0.81 (12) when narrowing the integration time window from 10 ns to 0.1 ns. Integrating the fit function to g(2)∥(⊥) (solid lines) gives a maximum convolved (de-convolved) visibility of V_HOM = 0.81 (0.95). The presence of residual coincidences around zero delay in the histogram for scattered photons under DPE indicates the effect of finite timing jitter and dephasing on the photon coherence, rendering the scheme partially coherent.

[Figure 4, caption partially recovered: (b) HOM histograms for cross (g(2)⊥) and parallel (g(2)∥) polarizations. (c) Close-up of the zero-delay peak for g(2)∥ reveals a dip, due to temporal filtering from our detectors. Dashed (green) and solid (orange) lines represent the convolved and the de-convolved fit to the experimental data (solid circles), respectively. (d) Two-photon interference visibility V_HOM as a function of the integration time window for g(2)∥(⊥) around τ = 0. The solid (dashed) line is obtained from integrating the convolved (de-convolved) fit in (c).]
In summary, we have shown that, counter-intuitively, symmetric dichromatic excitation is unsuitable for achieving coherent population control of quantum emitters. Specifically, it suffers from inefficient excitation due to cancellation of the accumulated pulse area, and the inversion efficiency scales with the spectral overlap of the driving pulses with the emitter resonance. This nullifies the purported advantage of separating the spectrum of the driving field from the emitter zero-phonon line for background-free photon extraction. Recognizing this problem, we demonstrate that a simple adjustment of the relative weighting of the red- and blue-detuned pulses is sufficient to improve the population inversion efficiency whilst maintaining minimal spectral overlap. Unity population inversion is then possible for an ideal 2LS, and we have measured 80% inversion efficiency with our QD sample. The presence of intensity oscillations under asymmetric driving demonstrates the coherent nature of the observed dynamics, yet those dynamics deviate from canonical Rabi oscillations and intrinsically feature non-trivial and complex Bloch-sphere trajectories. Our work has further experimentally demonstrated near-perfect multi-photon suppression and high levels of photon indistinguishability (via temporal filtering) for such an asymmetric dichromatic excitation approach. This provides a new route to coherently excite quantum emitters, opening the prospect of background-free single photon extraction with suitably optimized cavity-coupled photonic solid-state devices [35][36][37].
where |g⟩ and |e⟩ are the ground and excited states of the quantum dot (QD), respectively, b†_q is the creation operator of a phonon in mode q, ω_q is the energy of mode q, and γ_q describes the strength of the coupling between phonon mode q and the excited QD state.
Using a real-time path integral method [1], the dynamics induced by the total Hamiltonian H_tot is solved numerically exactly, i.e., without any approximation other than a finite time discretization. The influence of the phonons, which are assumed to be initially in thermal equilibrium at temperature T = 4 K, is uniquely determined by the phonon spectral density J(ω). For deformation potential coupling to longitudinal acoustic phonons We use standard parameters [2] for a GaAs-based QD with electron radius a_e = 3.0 nm, hole radius a_h = a_e/1.15, speed of sound c_s = 5110 m/s, density ρ = 5370 kg/m³ and electron and hole deformation potential constants D_e = 7.0 eV and D_h = −3.5 eV, respectively.
To approximate the pulses used in the experiment, we assume a rectangular shape in the frequency domain. The red- and blue-detuned pulses each have a full width at half maximum (FWHM) of Γ = 0.4 meV and are detuned by ∆ = 0.6 meV from the transition energy of the two-level system. As in the experiment, different overall intensities for the red- and blue-detuned pulses are implemented via different spectral widths W_R and W_B of the rectangles, while the heights of the rectangles are chosen to be the same. In the time domain these pulses take the form where t_0 is the time corresponding to the center of the pulse and A is the pulse area for a single resonant pulse in the absence of QD-phonon interactions.

Figure S1(a) shows the emission spectrum of the QD under resonant continuous wave (CW) excitation, showing a narrow zero-phonon line (ZPL, shaded) and a broad phonon sideband (PSB) originating from relaxation from the phonon-dressed states. The fit (orange solid line) is obtained from the polaron model using previously cited parameters [3], which gives a ZPL fraction of ≈ 92%. The schematic of the energy levels of the X1− transition in zero magnetic field (inset of Figure S1(a)) shows the two transitions |↑⟩ ⇔ |↑↓, ⇑⟩ and |↓⟩ ⇔ |↑↓, ⇓⟩. Here, the single (double) arrows refer to the electron (heavy-hole) spin state. Each transition has a well-defined optical selection rule such that it can be optically coupled with right (σ+) or left (σ−) circularly polarized light. Keeping the frequency of the excitation laser fixed at ω_0 = 1.280 eV, we scan through the resonance of the QD via d.c. Stark tuning to measure the linewidth of the scattered QD photons. A Lorentzian fit to the detuning spectra of the QD in Figure S1(b) under weak excitation gives a FWHM of Γ = 2.43 (6) µeV. Figure S1(c) shows the time-resolved lifetime measurement of the emission under pulsed monochromatic resonant excitation at π-pulse. A single-sided exponential decay fit to the data (convolved with the instrument response function with a FWHM of 160 ps) reveals an excited state lifetime of T_1 = 0.687 (3) ns. This corresponds to a transform-limited linewidth of ∼ 1 µeV. The deviation of the measured linewidth Γ from the transform-limited linewidth indicates the existence of pure dephasing from the solid-state environment, possibly originating from charge and spin noise of the QD device [4][5][6].
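The quoted transform-limited linewidth follows from the lifetime via Γ = ħ/T_1. The short sketch below reproduces that arithmetic for T_1 = 0.687 ns and compares it with the measured 2.43 µeV; the function name is hypothetical and the relation assumes a purely lifetime-limited Lorentzian line.

```python
HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s

def transform_limited_linewidth_uev(lifetime_ns):
    """Lifetime-limited Lorentzian FWHM, hbar / T_1, returned in micro-eV."""
    lifetime_s = lifetime_ns * 1e-9
    return HBAR_EV_S / lifetime_s * 1e6

limit_uev = transform_limited_linewidth_uev(0.687)
measured_uev = 2.43
print(f"transform limit: {limit_uev:.3f} ueV")
print(f"measured / limit: {measured_uev / limit_uev:.2f}")
```

The ratio between the measured linewidth and this limit gives a rough sense of how much broadening must be attributed to mechanisms other than radiative decay.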

III. PULSED MONOCHROMATIC RESONANCE FLUORESCENCE (RF)
To benchmark the performance of the dichromatic pulse excitation (DPE) scheme, we perform pulsed monochromatic resonant excitation (resonance fluorescence, RF) on the same transition of the same QD. We optically excite the QD using a ≈ 14 ps-width pulse (spectral bandwidth of ≈ 80 µeV), and isolate the QD signal via polarization and spectral filtering to suppress the excitation laser spectrum. Figure S2(a) shows the normalized intensity of the emission as a function of the square root of the excitation power. We fit the data using the time-dependent excited state population function, derived from the pure dephasing model [7,8], showing coherent Rabi oscillation as a function of pulse area. Fixing the excitation power to a π-pulse, we perform intensity correlation and Hong-Ou-Mandel (HOM)-type two-photon interference measurements on the scattered photons. Due to the imperfect excitation laser rejection (signal-to-background of ∼ 20), we obtain a multi-photon suppression of g(2)(0) = 0.080 (2) for the scattered photons, as shown in Figure S2(b). In Figure S2(c, d), we observe a post-selected HOM visibility V_HOM of 0.84 (15) and 0.49 (3) at 100 ps and 10 ns integration windows, respectively. The HOM visibility is computed from the ratio of the two-photon interference of consecutive photons prepared in parallel polarization, g(2)∥, and in perpendicular polarization, g(2)⊥, as V_HOM = 1 − g(2)∥/g(2)⊥.
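The dependence of the post-selected visibility on the integration window can be sketched as follows, assuming the common definition V_HOM = 1 − g(2)∥/g(2)⊥ with both histograms integrated over the same window centred on zero delay. The function names and the synthetic histograms are hypothetical; they are not the measured data.

```python
def integrate_window(delays_ns, counts, window_ns):
    """Sum the coincidences whose delay lies within +/- window_ns / 2 of zero."""
    half = window_ns / 2.0
    return sum(c for tau, c in zip(delays_ns, counts) if abs(tau) <= half)

def hom_visibility(delays_ns, parallel, perpendicular, window_ns):
    """Assumed definition: V = 1 - (integrated parallel) / (integrated perpendicular)."""
    area_par = integrate_window(delays_ns, parallel, window_ns)
    area_perp = integrate_window(delays_ns, perpendicular, window_ns)
    if area_perp == 0:
        raise ValueError("no perpendicular coincidences in the window")
    return 1.0 - area_par / area_perp

# Synthetic central peak: the co-polarized histogram has a dip at zero delay.
delays = [-0.2, -0.1, 0.0, 0.1, 0.2]
g_par = [5, 2, 1, 2, 5]
g_perp = [5, 5, 5, 5, 5]
print(hom_visibility(delays, g_par, g_perp, window_ns=0.25))  # narrow window
print(hom_visibility(delays, g_par, g_perp, window_ns=0.5))   # wide window
```

Narrowing the window keeps only the coincidences near the dip, which raises the visibility at the cost of discarding events.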
With monochromatic resonant excitation, despite the higher g(2)(0) due to the imperfect suppression of the excitation laser, we observe a higher two-photon interference visibility (V_HOM = 0.58 (3) (1), see Figure S3 in Section IV). This implies that, while the DPE scheme benefits from the fact that polarization filtering is not needed for background-free single photon collection, the RF excitation technique is still preferred as a means to generate single photons with higher indistinguishability.

Figure S3 demonstrates the performance of the QD (for the same transition) under phonon-assisted excitation. The excitation laser pulse has a pulse width of 7 ps and is detuned by ≈ 0.8 meV from the ZPL. The excitation pulse area is ≈ 20 π, corresponding to the saturation count rate. The scattered photons are then spectrally filtered with the same 120 µeV-bandwidth grating filter to suppress the scattered laser. Despite strong multi-photon suppression, g(2)(0) = 0.025 (1), the emission timing jitter that arises from the absorption of phonons assisting the population of the excited state leads to a HOM visibility slightly lower than that for the DPE and RF schemes: we obtain post-selected HOM visibilities of V_HOM = 0.64 (14) and V_HOM = 0.19 (1) at 100 ps and 10 ns integration windows, respectively. In addition, we observe a narrower (and shallower) dip (giving a 1/e width of 158 ps) around zero delay in g(2)∥, as indicated in Figure S3 (1), suggesting that dephasing due to the phonon bath operates on a time scale much shorter than the width of our detection instrument response function (FWHM = 160 ps), as predicted in Ref. [9].

V. TWO-PHOTON INTERFERENCE VISIBILITY: COMPARISON BETWEEN RF AND DPE SCHEMES
In this section, we report on the HOM visibility as a function of the temporal delay δ between the arrival times of the two input photons on the beam splitter. We render both input paths of the beam splitter indistinguishable in polarization (g(2)∥), and measure the detection time delay τ between "click" events on the photon detectors for each δ. We perform this measurement on the same transition and QD under both RF and DPE schemes. Figure S4 shows the comparison of the HOM visibilities between the RF (a-c) and DPE (d-f) schemes. Figure S4(a) and (d) show the normalized coincidence around zero delay, g(2)(0), integrated over a window of 10 ns, as a function of δ. When the two input photons perfectly overlap with each other on the beam splitter (δ = 0), we observe a minimum in g(2)(0). The data is fitted with a simple exponential function, g(2)(τ = 0, δ) = 0.5 × (1 − V exp(−|δ|/T_2)), to extract the coherence time T_2 and the visibility of the HOM dip, V, of the scattered photons. We obtain T_2 = 0.457 (27) ns (T_2 = 0.548 (13) ns) and V = 0.49 (2) (V = 0.35 (1)) for scattered photons under RF (DPE) excitation. The lower HOM visibility V for the DPE scheme, despite the much higher signal-to-background ratio, is due to the presence of the dip in the detection time histogram. This is evident in Figure S4(b, c, e, f). Figure S4(b) and (e) show the 2D plot of the normalized coincidence as a function of both δ and τ. We observe a pattern similar to that reported in Refs. [10,11], in which the presence of a non-vanishing dip around zero detection time delay, τ = 0, is due to either timing jitter or a pure dephasing mechanism. Figure S4(c) and (f) show the coincidence histogram g(2)(τ, δ) for δ = −0.9, −0.5, 0 and 0.5 ns. The appearance of the dip even at perfect overlap (δ = 0) in the DPE case, with negligible background, is a signature of pure dephasing or timing jitter in the emission, which originates from phonon-induced dephasing [9,12]. In Section IV, we observe a similar signature (a narrow dip around zero time delay τ) in g(2)(τ, δ = 0), which further supports our claim that the HOM visibility suffers from the same phonon-induced dephasing mechanism in the DPE scheme. For an emission that is dephasing- and jitter-free, we expect the dip around τ = 0 to disappear [13]. We interpret the absence of the dip in the RF case as a signature of jitter- or dephasing-free performance, with the non-vanishing coincidence g(2)(τ = 0, δ = 0) solely due to the imperfect filtering of the background laser scattering in the collection. With proper filtering to improve the signal-to-background ratio (ideally > 100), we should be able to minimize these coincidences, giving close to unity indistinguishability [14].

[Figure S2, caption partially recovered: The fit (solid line) to the experimental data (circles) is derived from the pure dephasing model [7,8]. (b) Intensity-correlation histogram of the scattered photons at π-pulse shows a multi-photon suppression of g(2)(0) = 0.080 (2). (c) Two-photon interference histogram of the consecutively emitted QD photons at π-pulse, prepared in parallel (g(2)∥) and perpendicular (g(2)⊥) polarization.]
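The exponential model used for the fit in this section, g(2)(τ = 0, δ) = 0.5 × (1 − V exp(−|δ|/T_2)), can be evaluated directly with the fitted values quoted above. This is a minimal sketch of the model only, not of the fitting procedure; the function name is hypothetical.

```python
import math

def g2_zero_delay(delta_ns, visibility, t2_ns):
    """Model from the text: 0.5 * (1 - V * exp(-|delta| / T_2))."""
    return 0.5 * (1.0 - visibility * math.exp(-abs(delta_ns) / t2_ns))

# Fitted parameters quoted in the text for the two schemes.
fits = {"RF": (0.49, 0.457), "DPE": (0.35, 0.548)}

for scheme, (visibility, t2_ns) in fits.items():
    values = [g2_zero_delay(d, visibility, t2_ns) for d in (-0.9, -0.5, 0.0, 0.5)]
    print(scheme, [round(v, 3) for v in values])
```

At δ = 0 the model reduces to 0.5 × (1 − V), and for |δ| much larger than T_2 it approaches the distinguishable-photon value of 0.5.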

VI. DICHROMATIC PULSES WITH DIFFERENT PULSE PARAMETERS
This section explores the population inversion efficiency of a solid-state two-level system for different pulse parameters under DPE. Here, we address the negatively charged exciton (X1−) transition of a different QD. Two pulse parameters, the pulse width ∆ω and the pulse detuning ∆, are used to characterize the pulse shapes. They are defined as the spectral width of the red/blue-detuned pulse and the detuning between the red- and blue-detuned pulses, respectively. To reduce experimental complexity, we vary the pulse width and detuning symmetrically, keeping the red- and blue-detuned components of the dichromatic pulses the same throughout. Figure S5 shows the emission spectra and the detected count rates from pulsed RF and DPE at various pulse parameters. Here, we vary the thickness of the beam block and the separation of the razor blades in the pulse stretcher (see Figure 2(a) in the main text) to remove particular spectral components from the original 160 fs laser pulse (corresponding to a spectral bandwidth of ∼ 11 meV). The excitation laser spectra used for dichromatic excitation are illustrated in Figure S5(a), along with the resonantly driven emission spectrum from the X1− transition under pulsed RF at π-pulse, in order to highlight the spectral overlap between the laser pulses and the broad phonon sideband. Figure S5(b) shows the emission count rate as a function of the square root of the average excitation power for various DPE pulse parameters. The pulse area is normalized to the optical power needed for a π-pulse under pulsed monochromatic RF (pulse width of ∼ 35 ps and bandwidth of ∆ω = 0.05 meV). Under pulsed RF, we observe Rabi oscillation in the emission intensity as the excitation power increases beyond the π-pulse. The fit (solid blue line) to the experimental data (RF, blue circles) is derived from the same model used for the fit in Figure S2(a). Unlike the monochromatic RF case, under DPE we observe a sigmoid-like saturation curve in the emission intensity.
We observe a reduction of the saturation intensities, to below 0.5 times the intensity at π-pulse under RF, and an increase in the excitation power needed to reach saturation as the pulse detuning ∆ increases beyond 1.5 meV.
The observed reduction in the saturation intensity with ∆ is consistent with the reduction in population inversion efficiency under monochromatic phonon-assisted excitation at large detuning [15], confirming the impact of phonon-mediated preparation and excitation-induced dephasing [12,16]. The anomaly at ∆ = 4.4 meV can be attributed either to phonon-assisted driving dominating over the dichromatic driving or to experimental imperfections in the excitation. For instance, any imbalance between the red- and blue-detuned components of the pulses, a slight detuning of the dichromatic pulses from the ZPL of the QD emission, or chirp in the femtosecond pulses introduced by the dispersion of the fibre would cause deviations from the theoretical behaviour. Nevertheless, the experimental evidence shows that the dynamics of the emission is sensitive to the details of the dichromatic excitation pulse (frequency detuning, pulse contrast, pulse width and pulse detuning). Hence, extra care has to be taken when selecting the parameters of the dichromatic pulses, ideally avoiding femtosecond pulses (with pulse detuning ∆ 2 meV) if the excitation pulses are made to propagate in optical fibres, to minimize any possible pulse chirping effects.
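The correspondence between pulse duration and spectral bandwidth quoted in this section can be estimated from the time-bandwidth product. The sketch below assumes transform-limited Gaussian pulses, for which the product of the intensity FWHM in time and in frequency is 2 ln 2/π ≈ 0.441; the actual pulse shape in the experiment is not stated, so this is an order-of-magnitude check only, and the function name is hypothetical.

```python
import math

PLANCK_MEV_PS = 4.135667696  # Planck constant h in meV*ps
GAUSSIAN_TBP = 2.0 * math.log(2.0) / math.pi  # FWHM time-bandwidth product, ~0.441

def gaussian_bandwidth_mev(pulse_fwhm_ps):
    """Spectral FWHM (meV) of a transform-limited Gaussian pulse (assumed shape)."""
    return GAUSSIAN_TBP * PLANCK_MEV_PS / pulse_fwhm_ps

print(f"160 fs pulse: {gaussian_bandwidth_mev(0.160):.1f} meV")
print(f"35 ps pulse:  {gaussian_bandwidth_mev(35.0):.3f} meV")
```

Under this assumption the two estimates are close to the ∼ 11 meV and 0.05 meV values quoted for the femtosecond laser and the monochromatic RF pulse, respectively.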

VII. DICHROMATIC EXCITATION ON THE NEUTRAL EXCITON
In this section, we perform DPE on the neutral exciton X transition. Here, we consider balanced dichromatic pulses, with (dichromatic) pulse contrast C = (I_B − I_R)/(I_B + I_R) ≈ 0, where I_R and I_B are the integrated intensities of the red- and blue-detuned (from the resonance of the X transition) components of the dichromatic pulses, respectively. Figure S6(a) shows the energy level schematic of the four-level biexciton-exciton (XX-X) cascade system. Upon excitation into the biexciton state |XX⟩, a cascaded radiative decay from |XX⟩ to the vacuum ground state |0⟩ is initiated via either of the intermediate neutral exciton states |X_H(V)⟩. This generates polarization-entangled, orthogonally polarized photon pairs consisting of emission from both the biexciton-exciton (XX) and the exciton-vacuum (X) transitions, distinguished via polarization (in the horizontal (H) or vertical (V) linearly polarized basis) and by a difference in emission energy equal to the biexciton binding energy, E_B. Figure S6(b) illustrates the laser spectrum for both the DPE (at pulse detunings of ∆ = 1.2 and 2.5 meV) and the resonant two-photon excitation (TPE). The two-photon resonance lies at half the energy difference between the X and XX transitions, as indicated by the dashed lines, which gives a biexciton binding energy of E_B = 1.95 (1) meV. The exciton fine structure splitting, independently measured via time-resolved lifetime measurement, is δ = 19.6 (1) µeV. We resolve one of the exciton fine structure states |X_H(V)⟩ by aligning the linear polarizer in the collection to the polarization axis of the desired transition, while keeping maximal suppression of the excitation laser by calibrating the orientation of the linear polarizer in the excitation accordingly. We spectrally filter either transition before detecting the photons on an SNSPD.
Figure S6(c) shows the emission from the two transitions, observed simultaneously under TPE, as a function of the excitation power. As demonstrated in previous literature [17], we observe Rabi oscillations in both the X and XX emission, which enable coherent manipulation of the occupation of the excitonic states. Surprisingly, when employing dichromatic driving on the same transitions with the red-detuned pulses overlapping with the two-photon resonance (∆ = 2.5 meV), we observe similar Rabi oscillations, shown as purple circles in Figure S6(d).
Here, we speculate that, unlike in the solid-state two-level system (negatively-charged exciton, X^{1−}), the contribution of two-photon resonant driving (red-detuned) to the population inversion dominates over that of phonon-assisted driving (blue-detuned). We validate this by showing that the Rabi oscillation observed at C = 0 is similar to that observed under TPE when we drive the X transition solely with the red-detuned pulses (C = −1). Additionally, we observe lower emission intensities when the excitation laser consists only of the blue-detuned component (C = 1) of the dichromatic pulses. This evidence supports our hypothesis and indicates a deviation from the behaviour expected for the solid-state two-level system (cf. Figures 2 and 3 in the main text) when dealing with a multi-level system.
As we decrease the dichromatic pulse detuning to ∆ = 1.2 meV, such that there is minimal overlap between the red-detuned pulses and the two-photon resonance, the Rabi oscillation disappears when both the red and blue-detuned components of the dichromatic pulses are present. In addition, we observe a higher emission intensity under driving with the red-detuned pulses than under phonon-assisted driving with the blue-detuned pulses. These results are illustrated in Figure S6(e). It is interesting to note that the blue-detuned driving (C = 1), while still giving a lower emission intensity than the red-detuned driving (C = −1), shows signs of saturating at excitation powers beyond 25 µW. This indicates that even when there is no overlap between the excitation pulses and the two-photon resonance, for a biexciton-exciton cascade system the contribution from two-photon resonant driving (C = −1) dominates over the phonon-assisted driving (C = 1) in affecting the state dynamics. This adds further complexity to exploiting the DPE technique to coherently drive the neutral exciton (X) transition. Further modelling would be beneficial to understand the physics behind this phenomenon and to potentially utilize it as a tool for coherent single photon generation in multi-level atom-like systems.
The architectures of the two samples, labelled sample A and sample B, are illustrated in Figure S7(a) and (d), respectively. While both have the same heterostructure, which consists of 1L-WSe2 encapsulated by few-layer hexagonal boron nitride (h-BN), they differ in the planar cavity design. For sample A, the heterostructure is placed on top of a flat distributed Bragg reflector (DBR) with a 140 nm stopband centred at ≈ 710 nm (1.7463 eV), with a 6 nm thick bottom h-BN flake acting as a spacer, forming a λ/4 planar cavity at λ = 780 nm (1.5895 eV). In contrast, for sample B, the heterostructure is placed on top of a gold mirror with a bottom h-BN flake of 59.3 nm, creating a λ/4 planar cavity at λ = 780 nm (1.5895 eV). The photoluminescence emission from SPEs in both samples (grey, shaded), excited using the same non-resonant continuous wave source at 532 nm (2.33 eV), is shown in Figure S7(b) and (e). Their emission profiles are detuned from the zero-phonon line (ZPL, green, shaded) at 1.6025 and 1.5946 eV, respectively. Upon close inspection of the emission spectra in Figure S7(b), we observe several emission peaks, which correspond to the ZPLs of multiple emitters. The two peaks (the ZPL and the peak beside it) in Figure S7(e) belong to the exciton fine structure of the same transition. The dichromatic laser spectra are displayed alongside the SPE emission spectra, with the red and blue-detuned (from the ZPL) laser components given by the red and blue shaded regions, respectively. The broad phonon-sideband (PSB, orange, shaded), detuned ≈ −0. and PSB, we obtain a ZPL fraction of ≈ 65 % for both samples, typical for SPEs in these materials at cryogenic temperature. Subsequently, we select the ZPL using a grating-based spectral filter (FWHM = 0.296(1) meV) to suppress the laser sideband before the signal is detected on a spectrometer.
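The wavelength and photon-energy pairs quoted above can be cross-checked with the relation E = hc/λ. The snippet below is a small sketch of that conversion; the constant name and function name are hypothetical, and hc ≈ 1239.841984 eV·nm is the standard value derived from the SI-defined constants.

```python
# h*c expressed in eV·nm (from the exact SI values of h, c and e).
H_C_EV_NM = 1239.841984


def wavelength_nm_to_ev(wavelength_nm: float) -> float:
    """Convert a vacuum wavelength in nm to a photon energy in eV."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return H_C_EV_NM / wavelength_nm


if __name__ == "__main__":
    # Wavelengths quoted in the text: DBR stopband centre, cavity
    # resonance, and non-resonant continuous wave excitation source.
    for wavelength in (710.0, 780.0, 532.0):
        print(f"{wavelength:.0f} nm -> {wavelength_nm_to_ev(wavelength):.4f} eV")
```

The converted values agree with the energies given in parentheses in the text to the quoted precision.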
The emission intensity of the ZPL as a function of excitation power is shown for the two samples in Figure S7(c) and (f), respectively. While the pulse parameters for the two dichromatic excitations differ (e.g. the dichromatic pulse detuning for the excitation on sample A and B is ∆ = 2.0 and 6.7 meV, respectively), we observe some form of oscillation in both cases. Figure S7(g) and (h) demonstrate a suppressed multi-photon emission probability, g^(2)(0) ∼ 0, from the spectrally filtered ZPL signal in sample A, measured using a fibre-based Hanbury-Brown and Twiss interferometer under continuous wave and pulsed non-resonant 750 nm excitation, respectively. These results confirm the single-photon nature of the emission from these emitters.
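For the pulsed measurement, one common way to estimate g^(2)(0) from a Hanbury-Brown and Twiss coincidence histogram is to divide the integrated area of the zero-delay peak by the mean area of the side peaks. The sketch below illustrates this estimator only; it is not necessarily the analysis used for Figure S7(h), the function name is hypothetical, and the peak areas in the usage example are made-up illustrative numbers rather than measured data.

```python
from typing import Sequence


def g2_zero_pulsed(peak_areas: Sequence[float], zero_index: int) -> float:
    """Estimate g2(0) under pulsed excitation.

    peak_areas: integrated coincidence counts of each histogram peak,
                ordered by delay (one entry per laser repetition period).
    zero_index: position of the zero-delay peak within peak_areas.

    Returns the zero-delay area normalised to the mean side-peak area.
    """
    if not 0 <= zero_index < len(peak_areas):
        raise IndexError("zero_index is outside peak_areas")
    side = [a for i, a in enumerate(peak_areas) if i != zero_index]
    if not side:
        raise ValueError("at least one side peak is required")
    mean_side = sum(side) / len(side)
    if mean_side <= 0:
        raise ValueError("side peaks must contain counts")
    return peak_areas[zero_index] / mean_side


if __name__ == "__main__":
    # Hypothetical peak areas, for illustration only.
    areas = [1000.0, 1020.0, 30.0, 980.0, 1000.0]
    print(g2_zero_pulsed(areas, zero_index=2))
```

A value well below 0.5 is the usual criterion for emission dominated by a single emitter.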
While an accurate interpretation of the experimental data is currently unavailable due to the lack of a clear quantum-optical picture for these emitters, these results demonstrate coherent population driving of SPEs in 1L-WSe2 under DPE, as an alternative to monochromatic resonant excitation [23,24].