Purifying electron spectra from noisy pulses with machine learning using synthetic Hamilton matrices

Photo-electron spectra obtained with intense pulses generated by free-electron lasers through self-amplified spontaneous emission are intrinsically noisy and vary from shot to shot. We extract the purified spectrum, corresponding to a Fourier-limited pulse, with the help of a deep neural network. It is trained on a huge number of spectra, each of them calculated by an extremely efficient propagation of the Schrödinger equation with synthetic Hamilton matrices and random realizations of fluctuating pulses. Since this training input does not explicitly address the dimensionality of the electron dynamics, the trained network can purify spectra for realistic 3D dynamics. We demonstrate our approach with resonant two-photon ionization, a non-linear process which is particularly sensitive to pulse fluctuations.

Our goal is to train a deep neural network with sufficiently many noisy spectra and their pure counterparts, such that the trained network is able to purify a "new" spectrum which is not contained in the training data and which could be an experimental one. By purification we mean that, when fed with a noisy spectrum, the network returns the reference spectrum that would be obtained if the target system were illuminated by an ideal Gaussian laser pulse, which we call the reference pulse, cf. Fig. 1. This may seem straightforward. Yet, it is anything but trivial to generate a sufficient amount of suitable training data with acceptable effort. Since this is generally the bottleneck for machine-learning applications in theoretical work, it requires new ways of thinking. In this vein, we introduce synthetic Hamilton matrices (SHMs) to speed up the generation of training data. The use of SHMs has another advantage: we are free to vary the matrix elements (here randomly) about base values obtained from explicitly solving 1D electron dynamics. If the variation is broad enough, the resulting SHMs should not be limited to 1D dynamics, which is indeed the case, as we will see.
Setting up networks with SHMs. To put our approach to a credible test, we need (i) a physical process that is sensitive to the pulse profile and (ii) a realistic way to model fluctuating pulses; furthermore, we need to prepare a large set of spectra suitable for training the network. This involves (iii) a scheme to efficiently propagate on the order of $10^7$ time-dependent Schrödinger equations, (iv) a homogeneous sampling of the resulting spectra, and (v) a trainable parametrization.
(i) As a physical process that is non-linear in the light coupling, and therefore very sensitive to the intensity of the light pulse and hence to its temporal profile, we have chosen (quasi-)resonant two-photon ionization. It can lead to multi-peak structures in the photo-electron spectrum [10][11][12] due to an Autler-Townes splitting [13]. In the time domain, the multi-peak structure can be understood in terms of a dynamic interference [12,14] of electron wavepackets released at different instants, determined by dynamic Stark shifts that follow the pulse envelope in time.
(ii) Fluctuating pulses, in particular those from SASE FELs, can be modeled by the so-called partial-coherence method [15], which creates ensembles of pulses $f_l(t)$ whose fluctuations differ from shot to shot but which on average have a well-defined spectral representation. In the time domain, these pulses have a characteristic duration $T$ and a coherence time $\tau$. We use $T = 3$ fs and $\tau = 1/2$ fs in this study. Apart from the intrinsic noise, the pulses additionally jitter in their pulse energy. We normalize all pulses $f_l(t)$ to unit pulse energy. This is also possible for experimental pulses, since pulse energies can easily be measured shot to shot with gas monitor detectors [16] (in contrast to the temporal profile of the pulse). For technical details of the pulse creation, see the supplemental material [17].
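A minimal sketch of how such an ensemble can be generated, assuming the commonly used formulation of the partial-coherence method: a Gaussian spectrum with uniformly random spectral phases, followed by a Gaussian temporal gate. The function name `partial_coherence_pulse`, the Gaussian shapes, and the way $T$ and $\tau$ are mapped onto widths are our assumptions, not the exact recipe of Ref. [15] or the supplement [17]. Times are in fs and frequencies in rad/fs.

```python
import numpy as np


def partial_coherence_pulse(t, T, tau, omega0, rng):
    """Return one complex noisy pulse f_l(t) on the uniform grid t (sketch).

    Assumed recipe: Gaussian spectral amplitude of width ~1/tau around the
    carrier omega0, uniformly random spectral phases, then a Gaussian gate
    whose intensity FWHM is T. The result has unit pulse energy.
    """
    dt = t[1] - t[0]
    omega = 2.0 * np.pi * np.fft.fftfreq(t.size, d=dt)
    # Spectral amplitude: its width sets the coherence time (~tau)
    amplitude = np.exp(-0.5 * ((omega - omega0) * tau) ** 2)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=t.size)
    # Random phases give a stationary random signal with correlation time ~tau
    signal = np.fft.ifft(amplitude * np.exp(1j * phases))
    # Gaussian gate: the intensity exp(-4 ln2 t^2 / T^2) has FWHM T
    gate = np.exp(-2.0 * np.log(2.0) * (t / T) ** 2)
    f = signal * gate
    # Normalize to unit pulse energy: sum |f|^2 dt = 1
    return f / np.sqrt(np.sum(np.abs(f) ** 2) * dt)


# Usage: T = 3 fs, tau = 0.5 fs, carrier near 20.95 eV (about 31.8 rad/fs)
rng = np.random.default_rng(1)
t = np.linspace(-10.0, 10.0, 1001)
pulses = [partial_coherence_pulse(t, 3.0, 0.5, 31.8, rng) for _ in range(5)]
```

Each call draws new phases, so every pulse differs while the ensemble-averaged spectrum stays Gaussian; normalizing to unit energy mimics dividing experimental shots by their measured pulse energy.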
(iii) The propagation of the time-dependent Schrödinger equation (TDSE) for one active electron in a strong laser field and the calculation of the resulting photo-electron spectrum $P(E)$ are by now standard, and numerical codes are available [18][19][20]. Moreover, we are interested in short wavelengths, which reduces the computational effort since fewer photons are involved. Nevertheless, creating a training data set from millions of pulses is prohibitively expensive, yet essential for successful deep learning.
To overcome this obstacle, we work in a representation of the dynamics with Hamilton matrices, whose construction follows a standard procedure described in the supplement [17]. The new element, formulated specifically for the present context, is that we create the necessary sample size of training systems by generating $n_\mathrm{mat}$ Hamilton matrices with random energies $E^k_\alpha$, coupling matrix elements $V^k_{\alpha\beta}$ from field-free dynamics, and field strengths $A_k$, corresponding to intensities (referring to the Fourier-limited pulse) in the range $5\times10^{15}$ to $5\times10^{16}$ W/cm$^2$. Furthermore, the coupling to the light is augmented by $n_\mathrm{pul}$ noise realizations $f_l(t)$ or by the reference pulse $f_\mathrm{ref}(t)$ to arrive at

$$\mathbf{H}_{kl}(t) = \mathbf{E}_k + A_k\, f_l(t)\, \mathbf{V}_k, \qquad (1a)$$
$$\mathbf{H}^{\mathrm{ref}}_{k}(t) = \mathbf{E}_k + A_k\, f_{\mathrm{ref}}(t)\, \mathbf{V}_k, \qquad (1b)$$

where $k = 1,\dots,n_\mathrm{mat}$ and $l = 1,\dots,n_\mathrm{pul}$. Boldface symbols in Eqs. (1) denote matrices in the basis of field-free states. The matrices have been derived from a 1D Hamilton operator, but since the energies $\mathbf{E}_k$ and the coupling matrix elements $\mathbf{V}_k$ are chosen randomly, these SHMs can describe dynamics not restricted to 1D, as we will see below.
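To make Eqs. (1) concrete, here is a minimal sketch assuming the structure $\mathbf{H}_{kl}(t) = \mathbf{E}_k + A_k f_l(t)\mathbf{V}_k$ with diagonal $\mathbf{E}_k$ and real symmetric $\mathbf{V}_k$ (atomic units). The helpers `random_shm`, `prediagonalize`, and `propagate`, the uniform relative variation `spread`, and the symmetric split-step scheme are illustrative assumptions; the actual propagation scheme is described in the supplement [17]. The coupling is diagonalized once per SHM, so that its eigenvectors can be reused for every pulse realization, which is one way to exploit the pre-diagonalization mentioned in step (iv).

```python
import numpy as np


def random_shm(E0, V0, A_range, rng, spread=0.2):
    """Hypothetical synthetic Hamilton matrix: randomly vary base values.

    E0: field-free energies (1D array), V0: real symmetric coupling matrix.
    Each element is scaled by a random factor in [1 - spread, 1 + spread].
    """
    E = E0 * (1.0 + spread * rng.uniform(-1.0, 1.0, size=E0.shape))
    factors = 1.0 + spread * rng.uniform(-1.0, 1.0, size=V0.shape)
    V = V0 * factors
    V = 0.5 * (V + V.T)  # restore symmetry (Hermiticity) after the variation
    A = rng.uniform(*A_range)  # random field strength
    return E, V, A


def prediagonalize(V):
    """Diagonalize the coupling once: V = U diag(lam) U^T."""
    lam, U = np.linalg.eigh(V)
    return lam, U


def propagate(E, lam, U, A, f, dt, psi0):
    """Symmetric split-step propagation of i dpsi/dt = [diag(E) + A f(t) V] psi.

    f holds real field values at the midpoints of the time steps; each step
    needs only diagonal exponentials and two matrix-vector products.
    """
    half = np.exp(-0.5j * E * dt)
    psi = psi0.astype(complex)
    for fn in f:
        psi = half * psi
        psi = U @ (np.exp(-1j * A * fn * lam * dt) * (U.T @ psi))
        psi = half * psi
    return psi


# Usage with hypothetical base values for a 5-state system
rng = np.random.default_rng(0)
E0 = np.array([-0.9, -0.1, 0.3, 0.6, 0.9])
V0 = rng.normal(size=(5, 5))
V0 = 0.5 * (V0 + V0.T)
E, V, A = random_shm(E0, V0, (0.05, 0.2), rng)
lam, U = prediagonalize(V)  # once per SHM ...
dt, nt = 0.05, 400
tm = (np.arange(nt) + 0.5) * dt
psi0 = np.eye(5)[0]
for _ in range(3):  # ... reused for every (toy) pulse realization
    f = np.cos(0.8 * tm + rng.uniform(0, 2 * np.pi)) * np.exp(-((tm - 10) / 4) ** 2)
    psi = propagate(E, lam, U, A, f, dt, psi0)
```

The toy pulses in the usage are simple stand-ins for the real part of $f_l(t)$; computing the photo-electron spectrum from the final state is not shown.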
(iv) Eventually, we have to create a preferably homogeneous set of spectra that can be used for training, validating, and testing the network. This step is crucial and numerically the most expensive, particularly when compared to the (modest) resources needed to set up and train the network. To obtain a homogeneous set of spectra, we first calculate $4\times10^4$ reference spectra [21]. Among those, we select the $n_\mathrm{mat} = 10^4$ spectra with the largest mutual difference

$$\varepsilon[P,P'] \equiv \int \mathrm{d}E\, \big|P(E)-P'(E)\big|. \qquad (2)$$

For each member of this subset of reference spectra, we calculate $n_\mathrm{pul} = 10^3$ fluctuating spectra from noisy pulses generated with the partial-coherence method [15] mentioned above. We use different noise realizations for each (synthetic) Hamilton matrix. Note that this procedure amounts to the propagation of about $10^7$ TDSEs, where a single TDSE takes only a few seconds thanks to a highly optimized propagation scheme [17], which includes pre-diagonalization of the Hamilton matrices. The latter is useful since one and the same system is propagated for different pulse realizations.
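The text does not specify how the subset with the largest mutual difference is found. One plausible way, assumed here, is greedy farthest-point (max-min) selection with the distance $\varepsilon[P,P'] = \int \mathrm{d}E\,|P(E)-P'(E)|$, evaluated as a Riemann sum on a common energy grid. The helpers `l1_distance` and `select_most_different` are hypothetical.

```python
import numpy as np


def l1_distance(P, Q, dE):
    """epsilon[P, Q] = integral |P(E) - Q(E)| dE on a common uniform grid."""
    return np.sum(np.abs(P - Q), axis=-1) * dE


def select_most_different(spectra, n_select, dE, start=0):
    """Greedy farthest-point (max-min) selection of n_select spectra.

    Starting from spectra[start], repeatedly add the candidate whose distance
    to its nearest already-selected spectrum is largest.
    """
    chosen = [start]
    d_min = l1_distance(spectra, spectra[start], dE)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(d_min))
        chosen.append(nxt)
        d_min = np.minimum(d_min, l1_distance(spectra, spectra[nxt], dE))
    return chosen


# Usage: 400 random Gaussian "reference spectra", keep the 100 most different
rng = np.random.default_rng(0)
E = np.linspace(0.0, 10.0, 200)
dE = E[1] - E[0]
centers = rng.uniform(2.0, 8.0, size=400)
widths = rng.uniform(0.2, 1.0, size=400)
cand = np.exp(-0.5 * ((E - centers[:, None]) / widths[:, None]) ** 2)
cand /= cand.sum(axis=1, keepdims=True) * dE  # normalize: sum P dE = 1
subset = select_most_different(cand, 100, dE)
```

Each step costs one distance evaluation against all candidates, so selecting $10^4$ out of $4\times10^4$ spectra remains cheap compared with the TDSE propagation.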
Finally, we obtain for each Hamilton matrix of Eqs. (1) one reference spectrum and $n_\mathrm{pul}$ fluctuating spectra, i.e., in total $n_\mathrm{mat}\times(n_\mathrm{pul}+1)$ spectra. For efficient training, we use averages over $m = 200$ noisy photo-electron spectra,

$$\bar P_{kj}(E) = \frac{1}{m}\sum_{l\in j} P_{kl}(E),$$

instead of the individual fluctuating spectra $P_{kl}(E)$, where $l\in j$ runs over the $m$ pulses of the $j$th random subset. For our application, $m = 200$ pulses is a good compromise between rugged spectra for smaller $m$ and increasing numerical effort for larger $m$. For each $k$, we calculate 10 such averages ($j = 1,\dots,10$) by randomly picking $m$ spectra from the $n_\mathrm{pul}$ available ones.
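A minimal sketch of this averaging step, assuming that each of the 10 averages draws its $m$ spectra without replacement and independently of the other averages (the source only says "randomly picking"). Since $10\times200 = 2000$ exceeds $n_\mathrm{pul} = 10^3$, the subsets necessarily overlap. The helper `averaged_spectra` is hypothetical.

```python
import numpy as np


def averaged_spectra(P_single, m=200, n_avg=10, rng=None):
    """Return n_avg averages of m randomly picked single-shot spectra.

    P_single: array of shape (n_pul, nE) with the spectra P_kl(E) of one SHM.
    Each average draws its m spectra without replacement; different averages
    are drawn independently and may share spectra (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_pul, n_E = P_single.shape
    out = np.empty((n_avg, n_E))
    for j in range(n_avg):
        idx = rng.choice(n_pul, size=m, replace=False)
        out[j] = P_single[idx].mean(axis=0)
    return out


# Usage with placeholder data standing in for 1000 single-shot spectra
rng = np.random.default_rng(0)
P_single = rng.random((1000, 50))
P_bar = averaged_spectra(P_single, rng=rng)  # shape (10, 50)
```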
(v) To complete the final step, the parametrization of the spectra for training, we expand the averaged spectra $\bar P_{kj}(E)$ in harmonic-oscillator eigenfunctions $\chi_\kappa(E)$ with the set $C \equiv \{C_1,\dots,C_{n_\mathrm{bas}}\}$ of coefficients. A basis size of $n_\mathrm{bas} = 60$ was necessary for the averaged fluctuating spectra, while for the noise-free spectra, expanded analogously, $n_\mathrm{bas} = 40$ was sufficient [17]. The network maps the coefficients $\bar C_{kj}$ of an averaged noisy spectrum onto the coefficients $\tilde C_{kj}$ of the predicted noise-free spectrum $\tilde P_{kj}(E)$. Training aims at minimizing the difference between the predicted spectrum and the expected reference spectrum $P^{\mathrm{ref}}_k(E)$. Mathematically, this corresponds to minimizing a cost function, which is given below in Eq. (4b) with $\Omega = \mathrm{train}$. The connection between Hamilton matrices, pulses, and spectra is summarized schematically in Fig. 2.
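A sketch of this parametrization, assuming a plain linear expansion $\bar P_{kj}(E) \approx \sum_{\kappa} C_\kappa \chi_\kappa(E)$ with harmonic-oscillator eigenfunctions rescaled to the energy axis, and coefficients obtained by projection. The centre `E_c`, the scale `s`, the projection by numerical quadrature, and the helper names are our assumptions; the actual choices are given in the supplement [17].

```python
import numpy as np


def hermite_functions(x, n):
    """Harmonic-oscillator eigenfunctions psi_0 ... psi_{n-1} at points x,
    computed with the numerically stable three-term recurrence."""
    psi = np.zeros((n, x.size))
    psi[0] = np.pi ** -0.25 * np.exp(-0.5 * x ** 2)
    if n > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for k in range(1, n - 1):
        psi[k + 1] = np.sqrt(2.0 / (k + 1)) * x * psi[k] - np.sqrt(k / (k + 1)) * psi[k - 1]
    return psi


def ho_basis(E, E_c, s, n_bas):
    """Basis chi_kappa(E) = s^(-1/2) psi_kappa((E - E_c)/s), orthonormal in E."""
    return hermite_functions((E - E_c) / s, n_bas) / np.sqrt(s)


def ho_coefficients(P, chi, dE):
    """Project spectra P (shape (nE,) or (N, nE)) onto the basis: C = int chi P dE."""
    return P @ chi.T * dE


def ho_reconstruct(C, chi):
    """Rebuild spectra from coefficients: P(E) ~ sum_kappa C_kappa chi_kappa(E)."""
    return C @ chi


# Usage: a single-peak test spectrum described by 40 coefficients
E = np.linspace(-10.0, 20.0, 3001)
dE = E[1] - E[0]
chi = ho_basis(E, E_c=5.0, s=1.0, n_bas=40)
P = np.exp(-0.5 * ((E - 4.5) / 0.8) ** 2)
P /= P.sum() * dE
C = ho_coefficients(P, chi, dE)  # 40 numbers instead of 3001 grid values
P_fit = ho_reconstruct(C, chi)
```

Compressing each spectrum into a few dozen coefficients keeps the network input and output small; more structured (noisy) spectra need more basis functions, consistent with $n_\mathrm{bas} = 60$ versus $40$.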
Building and training the network. With $n_\mathrm{mat} = 10{,}000$ reference spectra and 10 averaged noisy "copies" of each reference spectrum, we have $n \equiv 10\,n_\mathrm{mat} = 10^5$ pairs available for building the network model. Each pair consists of an averaged noisy spectrum and its respective reference spectrum.
The full data set of $n$ pairs is split into training (80 %), validation (10 %), and test (10 %) data. Implemented with the deep-learning library Keras [22], a fully connected feed-forward neural network is used to establish the mapping, cf. Fig. 2. To keep the presentation self-contained, we summarize technical details of the training, i.e., the optimization of the network parameters, in the supplemental material [17]. The training success and the resulting performance of the network as a function of the size of the training data are quantified with two error functions: an absolute error $\varepsilon_\Omega$ (Eq. (4a)), which averages the distance $\varepsilon$ of Eq. (2) between predicted and reference spectra, and a squared error $\delta_\Omega$ (Eq. (4b)), where $\Omega$ indicates the set (of size $n_\Omega$) over which the sum is carried out. The errors $\varepsilon_\mathrm{val}$ and $\delta_\mathrm{val}$ (here $\Omega = \mathrm{val}$ refers to the validation data set) are shown in Fig. 3 as a function of the SHM data size. The error $\varepsilon$, as defined in Eq. (2), gives an intuitive measure of the "distance" between two spectra $\tilde P_{kj}(E)$ and $P^{\mathrm{ref}}_k(E)$, with an upper limit $\varepsilon \le 2$, since both spectra are normalized. The squared error $\delta_\mathrm{train}$ is used as the cost function in the network training. Both errors decay logarithmically with the data size.
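A minimal sketch of the 80/10/10 split and of the two error measures, assuming that $\varepsilon_\Omega$ is the mean of the distance of Eq. (2) and that $\delta_\Omega$ is the mean integrated squared deviation on the energy grid (the exact form of Eqs. (4) is not reproduced here). The helpers `split_indices` and `errors` are hypothetical; the network itself (built with Keras in the source) is not shown.

```python
import numpy as np


def split_indices(n, rng, fractions=(0.8, 0.1, 0.1)):
    """Randomly split n pair indices into training, validation and test sets."""
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


def errors(P_pred, P_ref, dE):
    """Errors over a set Omega of spectra, each row one spectrum on a grid.

    eps_Omega:   mean L1 distance, integral |P_pred - P_ref| dE (at most 2).
    delta_Omega: mean integrated squared deviation (assumed form of Eq. (4b)).
    """
    diff = P_pred - P_ref
    eps = np.mean(np.sum(np.abs(diff), axis=1) * dE)
    delta = np.mean(np.sum(diff ** 2, axis=1) * dE)
    return eps, delta
```

With a purely random split over pairs, the 10 averaged copies belonging to one SHM can end up in different subsets; grouping the split by the SHM index $k$ would be an alternative if strict separation of training and test systems is desired.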
Purified spectra from SHMs. We are finally in a position to purify noisy spectra, typical single-shot examples of which are shown in Fig. 4d. To get a realistic picture, we have selected three spectra, cf. Fig. 4(a-c), which have been purified with increasing residual errors: only 1% of the spectra have a smaller purification error than the one shown in Fig. 4a ($\varepsilon = \varepsilon_{1\%}$); the prediction in Fig. 4b has the median error $\varepsilon = \varepsilon_{50\%}$, such that half of the spectra have a smaller and half of them a larger prediction error; finally, only 1% of the purified spectra have a larger error than the one shown in Fig. 4c ($\varepsilon = \varepsilon_{99\%}$). The gray-shaded curves show the reference spectrum $P^{\mathrm{ref}}_k(E)$ in all three panels of Fig. 4. The simple average over all single shots is shown as a dashed line. All spectra are normalized, i.e., $\int \mathrm{d}E\, P(E) = 1$.
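Representative examples such as those in Fig. 4(a-c) can be picked by ranking the test errors. A small sketch, assuming that "$\varepsilon_p$" denotes the sample at rank $p\%$ of the sorted errors; the helper `percentile_examples` is hypothetical.

```python
import numpy as np


def percentile_examples(eps, percents=(1, 50, 99)):
    """Map each percentage p to the index of the sample at rank p% of the
    sorted errors, i.e., about p% of all samples have a smaller error."""
    order = np.argsort(eps)
    n = len(eps)
    return {p: int(order[int(round(p / 100 * (n - 1)))]) for p in percents}


# Usage with placeholder errors for a test set of 10,000 spectra
eps_test = np.abs(np.random.default_rng(0).normal(scale=0.1, size=10_000))
examples = percentile_examples(eps_test)  # {1: ..., 50: ..., 99: ...}
```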
Overall, the purification works quite well, even for a typical "worst case" such as Fig. 4c, where all peaks, including the fine structure, appear at the correct energies, although none of these features is visible in the averaged spectra. Indeed, the complete failure of the averaged spectra $\bar P_k(E) = \frac{1}{10}\sum_j \bar P_{kj}(E)$ to reveal the reference spectrum is striking. We also note that spectra of rather different shapes and structural details can be successfully purified, from a smooth single peak (Fig. 4a) through a double peak (Fig. 4b) to a fine-structured multi-peak shape (Fig. 4c). As mentioned above, these structures arise from Stark shifts or an Autler-Townes splitting. Owing to the strong sensitivity to intensity, the spectra $P_{kl}(E)$ from single fluctuating pulses $f_l(t)$ are rather diverse (see Fig. 4d). Note that when the fluctuating pulses are created with the partial-coherence method [15], the spectrum of the reference pulse is given by the average over the spectra of the fluctuating pulses. However, if the underlying light-matter coupling is non-linear, the corresponding reference photo-electron spectrum is in general not obtained by averaging the fluctuating photo-electron spectra. This leads to a rather intricate mapping between individual fluctuating spectra and the reference spectrum, which is constructed with the deep neural network.
Purification for physical systems not known to the network. This is our true goal. To this end, we have calculated photo-electron spectra of helium atoms, dominated by two-photon absorption, within a single-active-electron approximation [23]. Technical details of the calculation are given in the supplemental material [17]. Since the initial state (in the single-active-electron approximation) is a 1s state, two-photon absorption leads to ionization into s and d states. In the following, we consider only the d manifold, as transitions into it dominate by far (cf. Fig. 5d, where the s component is negligible in the energy range shown). To create fluctuating pulses $f_l(t)$, we use exactly the same method and parameters as before, but different random realizations. The central photon energy $\hbar\omega^* = 0.77$ a.u. $= 20.95$ eV is close to the 1s-2p transition energy (21 eV) in helium, rendering resonant two-photon ionization the dominant process. This is, however, not a requirement for successful purification, since the network was trained with SHMs representing resonant, quasi-resonant, and non-resonant processes and is therefore applicable to generic systems.
Figure 5 shows that the purification for the helium atom works very well. As in the training procedure, we have created 10 averages, each composed of 200 fluctuating spectra. The 10 resulting predictions are averaged to provide the final spectrum of interest. We show results for three different intensities in the range where two-photon ionization is non-perturbative. Therefore, and in accordance with the spectra from the SHMs shown in Fig. 4, the averaged spectra do not provide sensible information about the reference spectra, in contrast to the network mapping, which reveals the respective peak structure of the photo-electron spectra.
Note that the network was not trained on the 3D helium atom, whose spectra were nevertheless purified successfully in Fig. 5. The network was trained with data derived from "1D" photo-ionization dynamics only, in order to keep the Hamilton matrices small enough to compute the $10^7$ TDSEs needed for a sufficient amount of training data. However, the SHMs, although generated from 1D-derived matrices, represent dynamical systems that are sufficiently generic for realistic 3D spectra to be purified as well. This is an additional, synergistic benefit of the formulation with SHMs.
To summarize, we have devised a strategy to purify noisy photo-electron spectra, typical for SASE FELs, with the help of a deep neural network. While this example was deliberately chosen to be specific, our approach is by design far more general. Firstly, other noise models [24,25] can be used. Secondly, purification could be conditioned on any reference pulse. Thirdly, and most importantly, the systematic introduction of synthetic Hamilton matrices makes it possible to generate a training data set of sufficient size with reasonable computational effort and renders the trained network applicable to scenarios it was not trained on. In the present example, we applied the network trained on synthetic dynamics to purify realistic 3D spectra. For future work, we would like to point out that noisy pulses driving non-linear processes are actually advantageous, since they allow one to obtain the target response over a wide spectral and dynamic range in a single shot, provided one has tools to analyze the resulting spectra.

FIG. 3. Prediction performance for validation data, measured with the mean-square error $\delta_\mathrm{val}$ and the absolute error $\varepsilon_\mathrm{val}$, see Eqs. (4), as a function of the number of spectra contained in the complete SHM data set.

FIG. 4. Photo-electron spectra from the SHM test-data set. The average of fluctuating spectra (green dashed line) and the prediction from the network (blue) are compared to the reference (gray, shaded). (a-c) Examples with three prediction errors $\varepsilon = \varepsilon_p$, with $p$ indicating the percentage of spectra having a smaller error; e.g., 99% of all spectra from the test-data set have a smaller prediction error than the one shown in panel c. (d) Five single-shot spectra for the Hamilton matrix used in one of panels (a-c).

FIG. 2. The scheme to create training, validation and test data for the network, exemplified with the first ($\mathbf{H}_1$) and the $k$th ($\mathbf{H}_k$) synthetic Hamilton matrix, extended to $\mathbf{H}_{1l}$ and $\mathbf{H}_{kl}$, see Eqs. (1), with noisy pulses $f_l(t)$ or the reference pulse $f_\mathrm{ref}(t)$. The noisy spectra calculated with the $\mathbf{H}_{kl}$ are averaged over 200 realizations to give $\bar C_{kj}$, $j = 1,\dots,10$, for each of the $k = 1,\dots,n_\mathrm{mat}$ synthetic Hamilton matrices. The $\bar C_{kj}$, together with the reference coefficients $C^\mathrm{ref}_k$, are used for training the network.