Preparing for high-repetition rate hard x-ray self-seeding at the European X-ray Free Electron Laser: Challenges and opportunities

A hard x-ray self-seeding (HXRSS) setup will soon be available at the European X-ray Free Electron Laser (XFEL). The availability of high-repetition-rate x-ray pulses poses novel challenges in the setup development compared to the choices made at other facilities, mainly crystal heat-load and radiation-damage issues. However, the high repetition rate is expected to allow for unprecedented output characteristics. A two-chicane HXRSS setup is found to be optimal for the European XFEL. In this paper we discuss the physical choices peculiar to that facility and the simulations performed, which allow us to fix the parameters for the setup design.


I. INTRODUCTION
Hard x-ray self-seeding (HXRSS) based on single crystal monochromators [1,2] is an active filtering technique allowing for an increased spectral density and longitudinal coherence at hard x-ray FELs that usually exploit the self-amplified spontaneous emission (SASE) mechanism as a baseline mode of operation.
A HXRSS setup based on single crystal monochromators will soon be installed at the SASE2 FEL line of the European X-ray Free Electron Laser (XFEL). One of the main challenges related to its design is the compatibility with the high repetition rate of the European XFEL in burst mode, which is a unique feature and a first for free-electron laser facilities, but can easily lead to detrimental effects in a HXRSS setup, mainly due to the heat load on the crystals. It should also be remembered that, at high repetition rate, electron beam halo scattering from the crystal and the subsequent interactions of the produced bremsstrahlung photons with the downstream undulator magnets can cause radiation damage to the undulators. This potentially limits the minimal distance between the electron beam and the crystal, and therefore the minimum electron delay, thus resulting in a reduction of the seed. This issue has been studied in [3]. In this paper we will focus our attention on heat load, and discuss how this issue can be mitigated with the introduction of a two-chicane HXRSS setup. Recent studies [4][5][6] suggest that the use of a two-chicane scheme brings no advantage over a single chicane. In contrast to those claims, in this paper we show, both theoretically and by means of simulations, that a two-chicane solution allows for an increased signal-to-noise ratio, the signal being the seeded FEL pulse, and the noise being, in this case, the underlying shot-noise amplification. One possible exploitation of this increased ratio is to decrease the crystal heat load, while keeping the seed signal large enough for the setup to work.
Detailed simulations of the expected HXRSS performance are needed not only to study detrimental effects like the above-mentioned crystal heat load, but also to define important design parameters like, for example, the position of the monochromators within the FEL line and the crystal geometry. After discussing, in the next section, HXRSS principles and challenges that are relevant to the European XFEL and, more generally, to high-repetition-rate facilities, we will introduce (in Sec. III) the software toolkit OCELOT, which was extensively used to perform our HXRSS simulations. Simulation results will then be presented in Sec. IV, starting from the entrance of the SASE2 line of the European XFEL, and following the HXRSS setup up to the exit of the SASE2 undulator. Our results rely on start-to-end simulations of the electron beam that drives the FEL process. We stress that a comparison between the simulated electron beam properties and those achieved in reality is important: first, to precisely foresee the actual performance of the HXRSS setup at the European XFEL and, second, to optimize it by tuning the characteristics of the electron beam as closely as possible towards an optimum. This, however, would require an ad hoc experimental measurement campaign and is beyond the scope of this work.
The HXRSS setup designed for the European XFEL is expected to operate from below 5 keV (theoretically down to 3 keV, with probable limitations due to heat load) up to 14.4 keV (and above, if one considers the option of tuning part of the radiator to a harmonic of the fundamental [9]), with different nominal electron charges of 20, 100, 250, and 500 pC and 1 nC, and working energies of 8.5, 12, 14, and 17.5 GeV. Sampling the entire configuration space, aside from being a very demanding task from a computational viewpoint, is not necessary. We focused, instead, on the simulation of specific operation points as described in Sec. IV, from which we can extract information about the setup performance, and details necessary to fix the HXRSS design, under the assumption of a good agreement between nominal SASE simulations and reality.
Finally, in Sec. V, we come to discussions and conclusions.

A. Principles
The idea behind a self-seeding setup is well known [10], and will be reviewed here only briefly. Since an FEL can be used to amplify an initially given electromagnetic wave, one can think of generating an initial, longitudinally coherent signal to feed into the FEL, thus avoiding the amplification of a noisy signal (the initial electron beam density fluctuations) that is typical of the SASE mechanism. The issue is then to generate a coherent signal in the x-ray region, which is powerful enough to outclass the equivalent shot-noise power level, and to superimpose it on the electron bunch at the entrance of the FEL. For this purpose, one can use a first FEL part in SASE mode and in the linear regime, filter the radiation spectrally, thus obtaining a seed signal, and finally arrange the delays involved in order to superimpose electrons and radiation in a second part of the FEL undulator, see Fig. 1. A magnetic chicane is used to introduce a transverse offset of the electrons with respect to the undulator axis, in such a way that a monochromator can be inserted in the x-ray path without interfering with the electron beam. The chicane also serves as a dispersive medium, which "washes out" the electron beam microbunching accumulated in the first SASE part of the setup. In order to achieve this in a small setup, the monochromator should be designed so that the optical path delay necessarily introduced in it is small enough to be compensated by delaying the electrons with the help of a compact magnetic chicane. It was therefore proposed, and later realized at the Linac Coherent Light Source (LCLS), to use a single diamond crystal in transmission geometry as a monochromator [1,2]. After the filter, in the frequency domain, the SASE radiation spectrum presents a dip at the Bragg frequency, but in the time domain the FEL pulse is characterized by a tail of monochromatic radiation that can be used as a seed, see Fig. 2 of this paper and Fig. 3 of [11].

Compared to similar setups, HXRSS at the European XFEL will cope, for the first time, with high-repetition-rate x-ray pulses within a bunch train. For this reason it will be based on a two-chicane setup, see Fig. 3. Such an arrangement was considered for the first time in [12], and is heuristically based on the observation that the x-ray pulse at the position of the second crystal is almost Fourier limited. As a result, for a given input power level, the flux incident onto the second crystal in the spectral range of interest is increased by a factor about equal to the ratio between the SASE bandwidth and the seeded bandwidth. Typically, this is about an order of magnitude. This means that the signal-to-noise ratio (SNR), the signal being in this case the seed power level and the noise being the underlying SASE power level, is likewise increased by about an order of magnitude in a two-chicane scheme.
The usefulness of this concept has been recently challenged in [4][5][6]. We therefore consider it of great importance to demonstrate the increase of SNR by quantitative arguments, both relying on analytical studies and on simulation. In this section we propose an analytical explanation of how the SNR increase takes place.

B. Analysis of the signal-to-noise increase
We start with a given equivalent shot-noise spectrum $P_n(\omega)$ at the entrance of the setup. The amplification process in the FEL undulator before the first crystal happens in the linear regime, so we can define a transfer function $G(\omega)$ for the FEL process. Moreover we consider an effective transfer function $T_s(\omega)$ yielding the seed signal, and an effective transfer function $T_n(\omega)$ giving the almost unperturbed pulse that is not used for seeding. For completeness we remark that the effective transfer function $T_s(\omega)$ also includes a temporal windowing operation, which is automatically undertaken when the delayed electron bunch is superimposed onto the seed signal, see Fig. 2 (right plot). The operation amounts to a multiplication of the electric field in the time domain by a temporal window function and therefore to a convolution in the frequency domain, which is absorbed in the definition of $T_s(\omega)$. It is possible to show that this approximation becomes precise in the limit of a seeding tail that is long compared to the bunch. These transfer functions are illustrated in Fig. 4.

SHAN LIU et al., PHYS. REV. ACCEL. BEAMS 22, 060704 (2019)
The spectra just before and after the first crystal (see Fig. 5), $P_{i1}(\omega)$ and $P_{o1}(\omega)$, can then be respectively written as

$$P_{i1} = P_n G_1, \qquad P_{o1} = P_n G_1 T_s + P_n G_1 T_n \equiv S_{o1} + N_{o1}. \tag{1}$$

FIG. 2. Schematics of the single-crystal HXRSS principle. After filtering, in the time domain, the FEL pulse is characterized by a tail of monochromatic radiation. This can be used as a seed by tuning the chicane delay such that the electron bunch is superimposed on it. In the right plot, this position is indicated by a black rectangle. Grey, black and red lines follow the same convention as in Fig. 1.
Note that in Eq. (1), all quantities depend on $\omega$. However, here and in the following we leave this dependence implicit in the equations for simplicity of notation. The first term of $P_{o1}(\omega)$ is the seed signal $S_{o1}(\omega)$, and we will call its bandwidth $\sigma_{\mathrm{seeded}}$. The second term $N_{o1}(\omega)$, which includes $T_n(\omega)$, refers instead to a signal that is shifted in time with respect to the electron beam and is not amplified. It will be present as a prepulse, and its effect may need to be considered depending on the experiment; in the following, however, the term will eventually be dropped. In any case, in order for the HXRSS to work properly, the final stage 5 in Fig. 3 must be chosen in such a way as to provide a much stronger amplification than stage 1 or stage 3 (see also Fig. 9). Hence the nonamplified terms will make only a small addition to the final signal.
Starting from the following undulator part, however, shot noise $P_n(\omega)$ is present. This assumes that the microbunching due to the previous SASE process is completely washed out by the passage through the first magnetic chicane, due to dispersion: a justification of this assumption will be given in the next section. This noise, of bandwidth $\sigma_{\mathrm{SASE}}$, is then amplified together with the seed in the second stage, with $G_2(\omega) \simeq G_1(\omega) = G_l(\omega)$ being the FEL Green's function for the second part of the undulator, where lasing still takes place in the linear regime. Here we assume that $|G_1(\omega)|$ and $|G_2(\omega)|$ are comparable. This is possible because lasing is still in the linear regime and we can assume, at least to a first approximation, that the electron beam does not change its characteristics. In this way, the magnitude of the amplification depends only on the length of the two undulator parts, giving us a way to easily balance the heat load on the two crystals.
The radiation spectrum at the second crystal is composed of several contributions. The electron beam shot noise develops into a signal $P_n(\omega) G_2(\omega)$, while the seeded signal $P_n(\omega) G_1(\omega) T_s(\omega)$ is amplified, giving $P_n(\omega) G_1(\omega) T_s(\omega) G_2(\omega)$. These signals are filtered by the second crystal, which is identical to the first. Also, the crystal transmission $T_n$ is typically close to unity except for a narrow frequency band where the Bragg/Laue condition is satisfied. This means that the spectrum and the SNR just before the second crystal can be written as

$$P_{i2} = P_n G_1 T_s G_2 + P_n G_2, \qquad \frac{S_{i2}}{N_{i2}} = G_1 T_s. \tag{2}$$

Note that the SNR of the photon spectral density is given around the frequency of the seed signal, while the crystal transmission function $T_n(\omega)$ is about zero for $\omega \simeq \omega_{\mathrm{seed}}$.
Here we defined the SNR after the first crystal as the ratio between the spectra of signal and noise. This definition depends again on the frequency ω. In the simulations presented in the following sections we will actually extract a single number as a figure of merit, defined as the ratio in Eq. (2), at the frequency where it assumes its maximum value.
Taking into account that the transfer functions of the seed signal and of the rest of the radiation do not overlap significantly, i.e., $T_s(\omega) T_n(\omega) \simeq 0$ (see Fig. 4), the spectrum after the second crystal becomes

$$P_{o2} = P_n G_1 T_s G_2 T_s + P_n G_2 T_s + P_n G_1 T_s G_2 T_n + P_n G_2 T_n. \tag{3}$$

As we can see from Eq. (3), there are four different output terms, arising from multiplying the two input terms by $T_s(\omega) + T_n(\omega)$. The second input term $P_n(\omega) G_2(\omega)$, once multiplied by $T_s(\omega)$, yields a seed contribution that is small with respect to the first input term multiplied by $T_s(\omega)$, which is the new seed signal $P_n(\omega) G_1(\omega) T_s(\omega) G_2(\omega) T_s(\omega)$. This fact will become clearer later. The other terms, multiplied by $T_n(\omega)$, are again not amplified further, because of the time-windowing operation inherent to the single-crystal monochromator technique. After the last, much longer amplification stage with $|G_3(\omega)| \gg |G_l(\omega)|$, the final radiation spectrum and the SNR are as follows:

$$P_3 = \left(P_n G_2 T_s + P_n G_1 T_s G_2 T_s\right) G_3 + P_n G_3, \qquad \frac{S_3}{N_3} = G_2 T_s + G_1 T_s G_2 T_s. \tag{4}$$

It is reasonable to require that the FEL amplification is performed in such a way that the energy in the SASE pulse after the first undulator part, whose spectrum is $P_{i1}(\omega) = P_n(\omega) G_1(\omega)$, is roughly equal to the energy in the seeded pulse after the second undulator part, whose spectrum is $P_{i2}(\omega) \simeq P_n(\omega) G_1(\omega) T_s(\omega) G_2(\omega)$. This simply means that we share the heat load evenly between the two crystals. Then, we can roughly estimate the energy related to these two terms and equate them as $\bar{P}_n \bar{G}_l^2 \bar{T}_s \sigma_{\mathrm{seeded}} \sim \bar{P}_n \bar{G}_l \sigma_{\mathrm{SASE}}$, where $\bar{G}_l$, $\bar{T}_s$ and $\bar{P}_n$ are now appropriately chosen numbers, related to the average values of the functions $G_l(\omega)$, $T_s(\omega)$ and $P_n(\omega)$. The previous equality can also be written as

$$\bar{G}_l \bar{T}_s \sim \frac{\sigma_{\mathrm{SASE}}}{\sigma_{\mathrm{seeded}}}. \tag{5}$$

This equation can be used to estimate the optimal gain in the first two FEL amplification stages.
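The heat-load balance expressed by Eq. (5) can be checked with a few lines of arithmetic. The sketch below is only illustrative: the values of `sigma_ratio` and `t_s`, and the helper name `balanced_gain`, are assumptions chosen for the example and are not taken from the simulations in this paper.

```python
def balanced_gain(sigma_ratio, t_s):
    """Average linear gain G_l for which both crystals see similar pulse energy.

    Uses the estimate G_l * T_s ~ sigma_SASE / sigma_seeded.
    """
    return sigma_ratio / t_s


# Hypothetical inputs (assumptions for illustration only).
sigma_ratio = 10.0  # sigma_SASE / sigma_seeded, about an order of magnitude
t_s = 0.1           # average seed transfer factor T_s

g_l = balanced_gain(sigma_ratio, t_s)

# Pulse-energy proxies, both in units of P_n * sigma_seeded.
energy_first_crystal = g_l * sigma_ratio  # ~ P_n G_l sigma_SASE
energy_second_crystal = g_l**2 * t_s      # ~ P_n G_l^2 T_s sigma_seeded

print(g_l, energy_first_crystal, energy_second_crystal)
```

With the gain chosen this way, the two energy proxies coincide, which is the even sharing of the heat load described above.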
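Note that the estimate in Eq. (5) allows one to drop the first term in the signal-to-noise ratio in Eq. (4), so that

$$\frac{S_3}{N_3} \simeq G_1 T_s G_2 T_s \simeq G_l^2 T_s^2.$$

In the simulations presented in the next sections, we will rely on the previous definition of signal-to-noise ratio in order to extract a figure of merit and to compare numerical with analytical results. However, we remark here that one can also introduce an alternative notion of signal-to-noise ratio $\bar{S}/\bar{N}$ in terms of the total number of photons within the respective signal and noise bandwidths. After the second and the third amplification stages, signal and noise have bandwidths $\sigma_{\mathrm{seeded}}$ and $\sigma_{\mathrm{SASE}}$, respectively. Keeping in mind Eq. (5) we have

$$\frac{\bar{S}_{i2}}{\bar{N}_{i2}} \sim \bar{G}_l \bar{T}_s \frac{\sigma_{\mathrm{seeded}}}{\sigma_{\mathrm{SASE}}} \sim 1, \qquad \frac{\bar{S}_3}{\bar{N}_3} \sim \bar{G}_l^2 \bar{T}_s^2 \frac{\sigma_{\mathrm{seeded}}}{\sigma_{\mathrm{SASE}}} \sim \frac{\sigma_{\mathrm{SASE}}}{\sigma_{\mathrm{seeded}}}.$$

Note that the SNR in terms of peak photon spectral densities is higher, since the photons corresponding to noise are distributed over the significantly larger bandwidth $\sigma_{\mathrm{SASE}}$. The fact that $\bar{S}_{i2}/\bar{N}_{i2} \sim 1$ means that the seed and the SASE pulse have roughly the same energy before the second crystal. However, the ratio between $\bar{S}_3/\bar{N}_3$ and $\bar{S}_{i2}/\bar{N}_{i2}$ remains $\sigma_{\mathrm{SASE}}/\sigma_{\mathrm{seeded}}$.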
The previous reasoning shows that the addition of a second chicane increases the SNR by a factor given, roughly speaking, by $\sigma_{\mathrm{SASE}}/\sigma_{\mathrm{seeded}}$. It should be remarked here that repeating the process a third time would not increase the SNR further. At variance with the previous results, we would now start with narrow-bandwidth signals, $\sigma_{\mathrm{seeded}} \ll \sigma_{\mathrm{SASE}}$, which brings no SNR increase, while the radiation power impinging on the third crystal would be approximately $\bar{G}_3 \gg 1$ times higher than on the second one.
Note that if, keeping all other parameters equal, one chooses to retract the first self-seeding crystal, then the spectrum of the signal impinging on the second crystal would be $P'_{i2}(\omega) = G_1(\omega) G_2(\omega) P_n(\omega)$, with the total number of photons being higher approximately by a factor $\sigma_{\mathrm{SASE}}/\sigma_{\mathrm{seeded}}$. The total spectrum, and the peak and integrated photon spectral density SNR after the final seed amplification, would be

$$P'_3 = P_n G_1 G_2 T_s G_3 + P_n G_3, \qquad \frac{S'_3}{N'_3} = G_1 G_2 T_s \simeq G_l^2 T_s, \qquad \frac{\bar{S}'_3}{\bar{N}'_3} \sim \bar{G}_l^2 \bar{T}_s \frac{\sigma_{\mathrm{seeded}}}{\sigma_{\mathrm{SASE}}}.$$

Intuitively, one can increase the SNR by impinging with a larger SASE pulse onto the crystal. However, one cannot indefinitely lengthen the first part of the undulator. Although the signal level is higher compared to the previous case [note the factor $T_s(\omega)$], it becomes evident that the ratio between the FEL gain in the now-first amplification stage, $G_l^2(\omega)$, and that in the final stage, $G_3(\omega)$, should be carefully chosen for the desired trade-off between good SNR and high flux of the signal.
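As a rough numerical illustration of the comparison above, the sketch below evaluates the peak and photon-number SNR for the two-crystal case and for the case with the first crystal retracted. The closed-form expressions are assumptions that follow from the transfer-function bookkeeping of this section (seed terms divided by the fresh shot-noise term), and all numerical values are hypothetical.

```python
def peak_snr_two_chicane(g_l, t_s):
    """Peak spectral SNR after the final stage with both crystals inserted."""
    x = g_l * t_s
    return x + x**2  # once-filtered term plus twice-filtered seed


def peak_snr_first_crystal_retracted(g_l, t_s):
    """Peak spectral SNR when only the second crystal filters the pulse."""
    return g_l**2 * t_s


def integrated_snr(peak_snr, sigma_ratio):
    """Photon-number SNR: the noise spreads over sigma_SASE / sigma_seeded."""
    return peak_snr / sigma_ratio


# Hypothetical values (assumptions for illustration only).
g_l, t_s, sigma_ratio = 100.0, 0.1, 10.0

two = peak_snr_two_chicane(g_l, t_s)
single = peak_snr_first_crystal_retracted(g_l, t_s)
print(two, integrated_snr(two, sigma_ratio))
print(single, integrated_snr(single, sigma_ratio))
```

In this toy setting the retracted configuration reaches a higher SNR by a factor of about 1/T_s, but only because a much larger broadband pulse impinges on the remaining crystal, which is the trade-off discussed in the text.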
It is important to remark that the increase in the SNR does not yield an increase in the final output power. The importance of having a large SNR unfolds in the linear regime, where a large contrast between seed signal and noise is needed for the seeded pulse to prevail against the competing SASE process, yielding a clean seeded output.
In the following sections we will discuss how the SNR increase can be exploited at the European XFEL to achieve a decrease in the incident power level at the crystals, contrary to the conclusions in [4][5][6] for the LCLS. However, it should be remarked here that, in general, a better signal-to-noise ratio may also be exploited to the benefit of the final characteristics of the radiation pulse because, again, it is related to the generation of a clean seeded output.

C. Heat-load issues
There are two sources of heat load on the crystals: one is related to radiation around the undulator resonant frequency (both SASE and seeded signals) and the other to the broadband spontaneous emission. One of the benefits of the higher SNR of two-stage self-seeding is that it allows one to decrease the power level needed at the crystal position, and therefore the crystal heat load caused by FEL radiation. However, one still needs to contend with the spontaneous radiation heat load, which is always present and is nearly independent of the fundamental tune. Since the heat load due to both SASE and seed depends heavily on the wavelength, the relative importance of the two contributions also depends on the wavelength.
Partial absorption of x-ray pulses during their propagation through, or reflection from, an x-ray optical component introduces thermal strain fields [13]. The thermal strain may even have wavelike properties if the thermal stress is generated quickly. A thorough theoretical analysis of the x-ray matter interaction lies beyond the scope of this article. For silicon-based x-ray optics and a soft-energy FEL pulse one can find discussions of the governing physics in, e.g., [14,15] and references therein. As a starting point, we have modeled the case numerically. We have solved the heat conduction equation coupled with the elasticity equations for the case of a 3 μJ, $(40 \times 40)$ μm$^2$ FWHM Gaussian pulse passing through a 100 μm-thick diamond plate with a diameter of 3 mm for different photon energies: 3.5, 5, 8.3, and 12 keV. We used two models: quasicontinuous heating (results are shown in Fig. 6) and pulsed heating (results are shown in Fig. 7). In both models the elastic waves are accounted for via time-dependent thermoelastic temperature-stress coupling, see [16,17]. In the quasicontinuous heating model the FEL train is modeled as a constant heat source that lasts 220 μs, while in the transient model it is assumed that each SASE pulse introduces an instantaneous temperature field jump that acts as a source of thermal stress that, in turn, decays on the timescale of heat dissipation. To be able to compare the results, we used a rate of energy deposition of 3 μJ in 220 ns for the cw case, corresponding to a 3 μJ pulse energy for the pulsed heating case. In both cases the heat energy distribution in the crystal volume was calculated according to the Lambert-Beer light attenuation law with energy-dependent attenuation lengths of diamond. A detailed analysis of the numerical computation and the justification of the models used will be discussed in a separate article.
Here we will limit ourselves to considering the effect of the peak thermal strain extracted from our calculation on the central frequency of the crystal reflection. Considering a 100 μm-thick diamond crystal and the C400 reflection at 8.3 keV, one finds a Darwin width of 11.4 μrad, corresponding to a relative bandwidth of $1.36 \times 10^{-5}$. Considering a thermal expansion coefficient of $1.2 \times 10^{-6}$ K$^{-1}$ for room-temperature diamond, we find that a temperature increase of about 11 K will result in a frequency shift of one Darwin width. Therefore, as also confirmed by simulations (see Fig. 6), 1000 radiation pulses at 8.3 keV and 4.5 MHz (the maximum intratrain repetition rate of the European XFEL), each carrying 3 μJ of energy, will shift the central frequency of the crystal reflection by about a Darwin width. At a rate of 24% deposited energy, this corresponds to about 0.7 μJ of deposited energy per pulse, which goes into heat. Since typical seeded bandwidths are at least several times wider than the Darwin width, it follows that one may tolerate several microjoules of deposited energy, i.e., 3 μJ incident energy per pulse is a very conservative estimate. However, at 4 keV, about 73% of the energy is deposited, and the amount of tolerable energy is reduced by a factor of 3. At the minimum possible seed energy, 3.3 keV, the situation is much worse, with about 90% of the incident energy actually deposited. A possible solution would be to work at the lower temperature of liquid nitrogen, at which the thermal expansion of diamond is a factor of 10 lower, thus reducing the magnitude of the strain fields likewise.
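The arithmetic behind these estimates can be retraced as follows. This is a minimal sketch using only the numbers quoted in this section; the function names are hypothetical, and the linear relation between relative frequency shift and temperature rise (relative shift equal to the expansion coefficient times the temperature rise) is an assumption based on the Bragg condition.

```python
DARWIN_REL_BANDWIDTH = 1.36e-5  # relative bandwidth, C400 at 8.3 keV
THERMAL_EXPANSION = 1.2e-6      # 1/K, room-temperature diamond
REP_RATE_HZ = 4.5e6             # maximum intratrain repetition rate


def temperature_rise_for_shift(rel_shift, alpha):
    """Temperature rise (K) producing a given relative Bragg-frequency shift."""
    return rel_shift / alpha


def tolerable_incident_energy(deposited_limit_uj, deposited_fraction):
    """Incident pulse energy (uJ) that deposits at most deposited_limit_uj."""
    return deposited_limit_uj / deposited_fraction


delta_t_kelvin = temperature_rise_for_shift(DARWIN_REL_BANDWIDTH, THERMAL_EXPANSION)
deposited_uj = 0.24 * 3.0                  # 24% of a 3 uJ pulse at 8.3 keV
pulse_spacing_ns = 1e9 / REP_RATE_HZ       # spacing between pulses in the train
train_length_us = 1000 * pulse_spacing_ns / 1e3  # 1000 pulses

print(delta_t_kelvin, deposited_uj, pulse_spacing_ns, train_length_us)

# Incident energy depositing the same heat at lower photon energies.
for label, fraction in [("8.3 keV", 0.24), ("4 keV", 0.73), ("3.3 keV", 0.90)]:
    print(label, tolerable_incident_energy(deposited_uj, fraction))
```

The loop makes the factor-of-3 reduction at 4 keV explicit: the same deposited energy is reached with roughly one third of the incident energy allowed at 8.3 keV.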
The exploitation of a two-chicane scheme allows one to reduce the contribution of the FEL signal to the crystal heat load by about 1 order of magnitude. For comparison, it was previously calculated by some of us that, for bunches with 100 pC charge at 17.5 GeV, eight undulator segments would yield a spontaneous radiation contribution of about 6 μJ deposited energy (albeit over a much wider area), which cannot be reduced with the two-chicane method. While the effects of the spontaneous contribution to heat load are comparable to those due to the seeded contribution at 8 keV, the seeded contribution becomes the main one at lower energies, justifying our choice of a two-chicane scheme just for heat-load relief.
In the following parts of this paper, aside from confirming the effect of SNR increase by simulations, we will focus our attention on two main questions: first, where is the optimal location for the HXRSS monochromators along the SASE2 undulator and, second, what performance we can expect from a two-chicane HXRSS setup, where we allow a certain maximum pulse energy to impinge on the crystals. We studied these questions assuming, for our start-to-end simulations, that the European XFEL performs according to the nominal specifications. Advanced FEL techniques are actually relying on this assumption and while the European XFEL is still at the early stages of operation, its performance, in time, will approach the nominal one. As a tool for our investigations we mainly used the software toolkit OCELOT.

III. SOFTWARE DEVELOPMENT
OCELOT [18] is a PYTHON-based multiphysics software package for design and simulations of storage rings and FEL sources developed at the European XFEL and DESY, and freely available to the scientific community. OCELOT can be freely downloaded at [19], where the interested reader can also find a guide to the installation, and application examples. Its architecture is based on the use of several modules, in particular to interface to GENESIS code [20], to calculate the dynamics of charged-particle beam, and dynamical diffraction in diamond crystals. These modules provide a convenient framework for simulating advanced FEL schemes like self-seeding. The input needed by GENESIS is defined in a simple way using PYTHON. The preprocessor then generates the correct input deck, and simulations are automatically launched. The GENESIS output is loaded into PYTHON objects, which can be manipulated at will. In our case, one first simulates SASE, and after that the OCELOT optics module is used to compute the modulus and phase of the transmission function through given crystal planes. The GENESIS output files are filtered and a new input for GENESIS is generated. The impacts of the various FEL processes on the electron beam are taken into account in the energy and energy-spread profiles fed at the entrance of each undulator segment. Resistive undulator wakes are also accounted for.
The electron beam is not tracked directly through the chicanes, under the assumption that the microbunching is fully washed out by dispersion. We remind the reader that this assumption is satisfied in all cases of practical interest, and has been broadly used since the inception of the HXRSS technique with single-crystal monochromators [1]. To illustrate this point, consider for example the longest wavelength under consideration, $\lambda = 0.38$ nm, corresponding to about 3.3 keV and to the largest separation between diamond crystal planes. We recall that the relation between the $R_{56}$ factor of the chicane and the path delay $\delta L$ introduced by the chicane is about $R_{56} \simeq 2\delta L$. For the European XFEL, the relative rms energy spread in the electron beam (with the laser heater off) is of the order of $\sigma_{\Delta\gamma/\gamma} \simeq 10^{-4}$. Even assuming a delay of only $\delta L = 5$ μm, the microbunching level is multiplied by a suppression factor, due to energy spread, which is estimated as [21] $\exp[-\tfrac{1}{2}(2\pi R_{56}\,\sigma_{\Delta\gamma/\gamma}/\lambda)^2]$; this gives a clearly negligible surviving microbunching. Based on the previous assumptions, a simple PYTHON script was created, which simulates the propagation of electrons and radiation through a complex HXRSS setup. At each simulation stage the output can be presented in graphical form. The use of OCELOT combined with a high-performance cluster allows for statistical runs with different shot-noise conditions. During simulations six nodes (64 processors each) were used on the Maxwell high-performance computing cluster [22]. A typical single-shot HXRSS simulation, with a transverse mesh of 201 by 201 points and about 2500 longitudinal slices (at 3.5 keV), takes a few tens of minutes to run. It is composed of five stages, consisting of three FEL runs and two filtering stages, followed by data postprocessing.
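The size of the suppression factor for the parameters quoted above can be estimated with a short script. This is a sketch under a stated assumption: it uses the standard Gaussian energy-spread smearing expression, exp[-(1/2)(2π R56 σ/λ)²], which may differ in form from the expression given in [21]; the function name is hypothetical.

```python
import math


def microbunching_suppression(wavelength_m, r56_m, sigma_delta):
    """Gaussian energy-spread smearing factor for microbunching at wavelength_m.

    Assumed form: exp(-0.5 * (2*pi * R56 * sigma_delta / wavelength)**2).
    """
    exponent = 0.5 * (2.0 * math.pi * r56_m * sigma_delta / wavelength_m) ** 2
    return math.exp(-exponent)


wavelength = 0.38e-9   # m, longest wavelength considered
delta_l = 5e-6         # m, path delay introduced by the chicane
r56 = 2.0 * delta_l    # m, from R56 ~ 2 * delta_L
sigma_delta = 1e-4     # relative rms energy spread

factor = microbunching_suppression(wavelength, r56, sigma_delta)

# Cross-check of the photon energy using hc ~ 1.23984 keV nm.
photon_energy_kev = 1.23984 / 0.38

print(factor, photon_energy_kev)
```

Because this is the longest wavelength considered, shorter wavelengths give an even smaller factor for the same delay and energy spread.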

IV. SIMULATIONS
In this section we focus on the simulation of specific operation points in order to answer questions concerning performance and configurations of the setup.
First, we fixed an upper bound to the energy at 14.4 keV. This limit is set somewhat arbitrarily, because the performance of the HXRSS technique degrades at higher energies, but there is no sharp limit. Our upper bound obviously refers to the Mössbauer line, which is halfway to the maximum nominal saturation point of the European XFEL (around 25 keV). If the target energy is higher than 14.4 keV one can resort to a hybrid technique where seeding and amplification are performed at a subharmonic of the target frequency, and the electron bunching at the target frequency is exploited in a final part of the undulator, owing to the tunable K parameter at the European XFEL. This technique was previously proposed and examined in [9] and more recently, for soft x rays, in [23]. We refer the interested reader to these publications for more details.
In contrast to the upper bound, the lower possible bound is fixed by the structure of the diamond crystal, which has reflection lines down to about 3 keV, corresponding also to the minimum energy achievable at SASE2. We then fixed a photon energy of 3.5 keV for a low-energy study.
The HXRSS technique is naturally more sensitive to the electron beam phase space than the usual SASE. In particular, nonlinear energy chirps lead to deviations from the Fourier limit. Therefore, as a first step, we used electron beam distributions from start-to-end simulations, optimized for HXRSS performance [8].
In Fig. 8 (see [8]) we plot the optimized electron beam distribution for a charge of 100 pC before the undulator. In the present work we will fix the electron beam distribution as in Fig. 8. Different choices for the electron bunch charge are possible, assuming the same peak current. Clear advantages of low charges are: smaller emittance, smaller resistive wakefields, lower heat load related to the spontaneous radiation (which scales linearly with the charge), as well as a smaller spatiotemporal coupling [24,25] (assuming the same transverse size of the electron beams). Disadvantages include a larger Fourier-limited bandwidth and a higher resolution needed to diagnose the longitudinal phase space of the electron beam, if such diagnostics are available.
The capability of reproducing and controlling the longitudinal phase space distribution of the electron is of fundamental importance in order to achieve efficient seeded operation. To this end, dedicated measurements are being performed at the European XFEL and will be reported elsewhere.

A. High-energy analysis
We began our simulation studies by considering the upper photon energy of 14.4 keV, to be generated at the SASE2 undulator at the European XFEL. The undulator consists of 35 segments of 5 m magnetic length each, with a period of 40 mm, and we assume the electron beam distribution described in Fig. 8, left. The 14.4 keV energy point is the most demanding, in our selection, in terms of electron beam quality, and at higher energies one needs to have longer undulator parts between the chicanes. Therefore, optimizing the HXRSS in the case of 14.4 keV automatically answers the question of the chicane placement in the overall setup.
We proceeded by optimizing the output of a double-chicane HXRSS setup at saturation in the cases where each of the two undulator stages preceding the magnetic chicanes is formed by seven undulator segments (we call this case "7 + 7"), by eight undulator segments (we call this case "8 + 8") and by nine undulator segments (we call this case "9 + 9").
For these simulations we used 100 μm thick diamond crystals and a symmetric C400 reflection.
Concerning the heat load due to the FEL pulses, we estimated before that several microjoules of deposited energy per pulse should be acceptable at the high repetition rates of the European XFEL. Given that only 3% of the energy is actually deposited on the crystals at 14.4 keV, we see that these accepted energies correspond to impinging energies per pulse of about 100 μJ. This large value would suit both a single-chicane HXRSS setup and a double-chicane HXRSS setup. For example, in the 7 + 7 case, the energy impinging on the first crystal is about 2 μJ; in the 8 + 8 case it rises to about 8 μJ and in the 9 + 9 case it rises only to 20 μJ. However, here we consider a double-chicane setup. We discuss the advantages of our choice in the following sections. We should also note that a fully symmetric scheme does not distribute the impinging pulse energies evenly on the two crystals. For example, in the case of a 7 + 7 setup, the impinging energy on the second crystal is about 9 μJ per pulse. We checked separately that an even distribution of the impinging energy (on a "7 + 6" setup, in this case) leads to similar output results, but in the following we discuss the symmetric case. The choice of a symmetric setup will enable further applications, e.g., pulse autocorrelation measurements [26,27]. We also considered the case of a 6 + 6 setup. In this case the seed signal turns out to be too low ($10^4$-$10^5$ W, 1 order of magnitude lower than in the 7 + 7 setup) to be successfully amplified, and after the two chicanes the signal level is negligible, i.e., one obtains only SASE. Therefore, we dismissed this possibility.
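The conversion between deposited and impinging energy at 14.4 keV can be retraced with the short sketch below. It uses the 3% deposited fraction and the first-crystal energies quoted above; taking 3 μJ as a representative value for "several microjoules" of tolerable deposited energy is an assumption, and the function names are hypothetical.

```python
DEPOSITED_FRACTION_14P4_KEV = 0.03  # fraction of pulse energy deposited at 14.4 keV


def incident_from_deposited(deposited_uj, fraction):
    """Incident pulse energy (uJ) corresponding to a given deposited energy."""
    return deposited_uj / fraction


def deposited_from_incident(incident_uj, fraction):
    """Deposited energy (uJ) for a given incident pulse energy."""
    return incident_uj * fraction


# Assumed representative tolerance of 3 uJ deposited per pulse.
tolerable_incident_uj = incident_from_deposited(3.0, DEPOSITED_FRACTION_14P4_KEV)

# Energies impinging on the first crystal, as quoted in the text (uJ).
first_crystal_incident_uj = {"7+7": 2.0, "8+8": 8.0, "9+9": 20.0}
deposited = {
    case: deposited_from_incident(energy, DEPOSITED_FRACTION_14P4_KEV)
    for case, energy in first_crystal_incident_uj.items()
}

print(tolerable_incident_uj)
for case, energy in deposited.items():
    print(case, energy)
```

All three configurations deposit well below one microjoule per pulse on the first crystal, consistent with the statement that heat load from the FEL pulses is not the limiting factor at this photon energy.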
In Fig. 9 we plot the maximum on-axis spectral density, the radiation power and the averaged spectral density at the exit of the self-seeding setup at saturation, i.e., after the two seeding chicanes followed by ten undulator segments, for the three cases 7 + 7, 8 + 8 and 9 + 9. All plots refer to an ensemble average over ten events. Note that in this and all the following figures, the spectral data indicated with the denomination "on-axis" are analyzed using the on-axis spectrum from GENESIS. In this case, normalization is chosen in such a way that the integral over energy gives back the total number of photons in the pulse. This normalization is useful under the assumption, typically verified, of a good transverse coherence of the FEL pulses.

FIG. 9. Maximum on-axis spectral density versus position inside the last radiator with prepulse contributions removed (left). Radiation power (middle) and averaged spectral density (right) at saturation after 10 undulators, for different configurations 7 + 7, 8 + 8 and 9 + 9. All plots are the results of an ensemble average over ten events.
As one can see, the 7 + 7 option is characterized by the highest radiation power, and in the 8 + 8 and 9 + 9 scenarios the prepulses from stage 3, which are separated from the main pulses from stage 5 by the chicane delay, are large compared to the final twice-filtered pulse. Moreover, looking at the spatially averaged spectral density one sees that the maximum value is comparable in all cases. The conclusion from this analysis is that the 7 + 7 case is the best performing one. These results can be interpreted on the basis of our considerations in Sec. III B, where (albeit dealing with a single chicane) we made the point that by increasing the length of the first undulator part the signal can be increased at the expense of a larger incident power on the crystal and of a stronger impact on the electron beam quality. Figures 8 and 9 show that, as we increase the length of the setups from 7 + 7 to 9 + 9, the SNR increases but the electron beam quality deteriorates, and the output flux decreases (while a large prepulse appears). In the 9 + 9 case the beam quality is spoiled more than in the 7 + 7 case, hence at the entrance of the last radiator we start with a much larger energy spread. Due to the poor electron beam quality, we can extract less energy from the last radiator in the 9 + 9 case, as is evident from Fig. 9 (middle plot). So the beam is heated up more [Fig. 8 (right), red curve compared to blue curve] with respect to the 7 + 7 case [Fig. 8 (middle), red curve compared to blue curve], but the FEL process is less efficient. No tapering profile is implemented; however, the K parameter is adjusted in the last radiator (stage 5) to account for the different average electron energy loss in the 9 + 9 case.
With our setup choice fixed, we analyzed emittance and energy spread tolerances for the 7 + 7 configuration. Figure 10 shows our results, again for the maximum on-axis spectral density, for the radiation power and for the averaged spectral density at the exit of the self-seeding setup, around saturation, with different emittances. In contrast to Fig. 9, all plots in Fig. 10 (and in the following Fig. 11) refer to a single-shot event, so that the same initial noise conditions can be used to enable a fair comparison between the different cases. This also explains the different saturation length with respect to Fig. 9. The main conclusion is that, while the saturation point shifts when the emittance is increased, the output peak power remains the same, even though the pulse becomes less clean and a decrease in the maximum spectral density is observed. However, this decrease is mild, meaning that a degradation of the electron beam emittance by a few tens of percent can still be accepted.
We then moved on to consider the effects of an increased energy spread, compared to the nominal value, on the HXRSS performance. Again, we fixed the 7 + 7 configuration and, as before, we plot the maximum on-axis spectral density, the radiation power and the averaged spectral density around saturation, with different values of the energy spread (see Fig. 11). Also in this case, all plots refer to a single-shot event. Qualitatively one can see that the saturation point shifts. However, up to an increase of the energy spread to about 500%, no significant changes are observed in the radiation power or spectral density.
Our study indicates that the best placement of the two self-seeding chicanes is according to the 7 + 7 scheme, which is the best performing. However, in order to allow for deviations from the nominal FEL performance, we suggest adding one extra segment in each undulator part, and configuring the double-chicane HXRSS setup at the European XFEL as in the 8 + 8 configuration, at least during the commissioning period. This solution may also be helpful in case actual deviations of emittance and energy spread from the nominal values require a longer first and third stage.

B. Low-energy analysis
The analysis of the 14.4 keV high-energy case allowed us to optimize a double-chicane HXRSS setup for the European XFEL, fixing the 8 + 8 configuration. At lower photon energies fewer undulator segments are needed to seed successfully. This means that some of the undulator segments in the 8 + 8 configuration can be opened, since SASE2 consists of variable-gap undulator segments. A detailed study of a case at intermediate energies around 9 keV can be found, e.g., in [28]. There, a different charge (250 pC) was used, and it was shown that a 5 + 5 setup can be used to obtain nearly Fourier-limited TW-class beams with 0.94 eV FWHM bandwidth, using a C400 reflection from a 100 μm crystal.
In this section we discuss the expected performance of our HXRSS setup near the lowest possible photon energy that can be obtained with the available crystal reflections. We therefore fix 3.5 keV as photon energy, and we simulate the self-seeding process for a C111 symmetric reflection, using a 50 μm thick crystal. Four undulator segments are found to be sufficient for seeding. We therefore consider a 4 + 4 configuration, which can be effectively obtained from the 8 + 8 configuration by opening undulator segments as needed. The electron beam charge is, as before, 100 pC, while its energy is reduced to 8 GeV. The final output is depicted in Fig. 12.
At 3.5 keV, an average FEL pulse hitting the second crystal at stage 4 in the 4 + 4 case would have an energy of about 2.2 μJ, corresponding to a deposited energy of 2 μJ, which is about 90% of the incident energy. A lower incident energy, about 1 μJ, must be accounted for at the position of the first crystal. We remark that it is not possible, in this case, to distribute the heat load more evenly, because at these low energies the power gain length is shorter than a single undulator segment. It should also be noted that the possibility to seed with such a low energy, while keeping a high SNR, is due to the exploitation of the two-chicane scheme: in fact, as we will confirm later in this section, the double-chicane scheme allows for an increase in the signal-to-noise ratio of about a factor of 10. The reason for this increase was discussed in detail in the previous sections. Here we only recall that this effect is due to the fact that at the second crystal one deals with an almost Fourier-limited pulse. Finally, we underline the fact that closing more undulator segments (e.g., five) in the first two stages would not be acceptable due to heat-load limitations.

In order to get a precise quantification of the SNR increase related to the two-chicane scheme, we studied the ratio between the peak spectral densities in the seeded configuration and in the usual SASE mode. In other words, we define the SNR as the ratio of the peak spectral density of the seeded pulse to the peak spectral density of the SASE pulse. We compared the SNR in the case of the two-chicane setup (4 + 4 segments) and in the case of a single-chicane setup (four segments) just in front of the radiator undulator.
In Fig. 13 we show an example of the on-axis spectral density for one SASE pulse realization and the corresponding seeded case for the two-chicane setup and for the single-chicane setup just before the final radiator. For the two-chicane setup, we account for SASE emission starting after the first crystal, i.e., consistently with our previous analysis, we do not consider the prepulse due to the first four segments. The peak SASE spectral density amounts to about $4.5 \times 10^{8}$ ph/eV, while the seeded case yields about $2.2 \times 10^{11}$ ph/eV. The ratio between the latter and the former gives SNR ≃ 500. For the single-chicane setup, the peak SASE spectral density amounts to about $1.3 \times 10^{8}$ ph/eV, while the seeded case yields about $5.5 \times 10^{9}$ ph/eV, giving SNR ≃ 50. In this particular illustration, the two-chicane scheme results in about a factor of 10 increase in SNR.
When calculating the SNR we take the ratio of the seeded pulse spectral density to the SASE pulse spectral density, which is nearly independent of the particular shot taken.
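As a quick numerical check of the values quoted for Fig. 13, the following minimal sketch evaluates the SNR as the ratio of peak seeded to peak SASE spectral density. The input numbers are those given in the text; the function and variable names are illustrative.

```python
def snr(peak_seeded_ph_per_ev: float, peak_sase_ph_per_ev: float) -> float:
    """SNR as the ratio of peak seeded to peak SASE spectral density."""
    return peak_seeded_ph_per_ev / peak_sase_ph_per_ev


# Peak on-axis spectral densities quoted in the text (ph/eV).
snr_two_chicane = snr(2.2e11, 4.5e8)     # about 490, quoted as ~500
snr_single_chicane = snr(5.5e9, 1.3e8)   # about 42, quoted as ~50
snr_gain = snr_two_chicane / snr_single_chicane  # about 11.6, i.e. a factor ~10

if __name__ == "__main__":
    print(f"two-chicane SNR:    {snr_two_chicane:.0f}")
    print(f"single-chicane SNR: {snr_single_chicane:.0f}")
    print(f"gain:               {snr_gain:.1f}")
```

The rounded values in the text (500 and 50) therefore correspond to a gain slightly above a factor of 10 for this particular pair of shots.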
Finally, we studied the evolution of the SNR of the two-chicane setup (A) and of the single-chicane setup (C), and of their ratio (A/C), along the radiator. Our result is plotted in Fig. 14, where the SNR is calculated from an ensemble of ten pulses. The SNR ratio (A/C) remains almost constant (about a factor of 10) up to saturation. Note that the large values around $10^4$ for the A and B curves at the beginning of the amplification process in Fig. 14 are due to a numerical effect: at the beginning of the radiator, when one starts from shot noise, there is no radiation at the very beginning of the amplification process. Therefore, when calculating the ratio between signal and noise in curve A or B at the very beginning of the undulator one effectively divides by a small number. In contrast, in Fig. 13 one deals with SASE and seeded signals at the maximum SNR point (around 25 m, see Fig. 14), and in the analytical estimations the noise is already translated into an equivalent power.

FIG. 13. Comparison between SNR in the two-chicane and in the one-chicane case. On the left we show the on-axis spectral density of one SASE pulse realization (left top) and the corresponding seeded case (left bottom) for the two-chicane setup just before the final radiator. In this case SNR ≃ 500. On the right we show the on-axis spectral density of another SASE pulse realization (right top) and the corresponding seeded case (right bottom) for the one-chicane setup at the maximum SNR point (around 25 m, see Fig. 14). In this case SNR ≃ 50. The gain in SNR is about a factor of 10.
We can further compare the results obtained in Figs. 13 and 14 with the analytical estimates in Sec. II B. In that section we actually introduced two definitions of SNR, one based on the total number of photons within the respective signal and noise bandwidths, and another based on the (frequency-dependent) ratio between signal and noise, calculated around the seed frequency. It is this last definition that is to be compared to our numerical results, where one takes the maximum of the frequency-dependent SNR ratio. Then one expects from Eq. (4) that the SNR along the last radiator is given by $G_l T_s (1 + G_l T_s)$. It can be estimated with the help of Eq. (5), which estimates $G_l T_s$ as the ratio of the SASE to the seeded bandwidth. Such a ratio is typically about a factor of 10, which leads to an analytical estimate of the SNR of order one hundred for the two-chicane scheme and of about ten in the case of a single chicane, see Eq. (2), where the SNR is estimated as $G_l T_s$. These numbers can be compared with the single-shot plot in Fig. 13, or with Fig. 14 curves A and B (in the linear regime), all in agreement with the analytic estimates.
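The analytical estimates above can be evaluated with a minimal sketch. It assumes, as stated in the text, that the product of gain and transmission equals the ratio of the SASE to the seeded bandwidth, taken here as the typical value of 10. The function names are illustrative, and the two expressions are simply those attributed to Eqs. (2) and (4).

```python
def snr_single_chicane(gain_times_transmission: float) -> float:
    """Single-chicane estimate, Eq. (2) in the text: SNR ~ G*T."""
    return gain_times_transmission


def snr_double_chicane(gain_times_transmission: float) -> float:
    """Two-chicane estimate, Eq. (4) in the text: SNR ~ G*T * (1 + G*T)."""
    return gain_times_transmission * (1.0 + gain_times_transmission)


# ASSUMPTION: G*T ~ ratio of SASE to seeded bandwidth, "typically about 10".
GT_TYPICAL = 10.0

snr_single = snr_single_chicane(GT_TYPICAL)   # 10
snr_double = snr_double_chicane(GT_TYPICAL)   # 110, i.e. of order one hundred
snr_improvement = snr_double / snr_single     # 1 + G*T = 11

if __name__ == "__main__":
    print(snr_single, snr_double, snr_improvement)
```

In this estimate the improvement of the two-chicane scheme over the single-chicane scheme is $1 + G_l T_s$, which for a bandwidth ratio of about 10 reproduces the factor of about 10 seen in the simulations.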
After saturation, the SNR drops due to the fact that the single-chicane setup saturates further downstream compared to the two-chicane setup, due to the lower seed level. One can, of course, change the value of the K undulator parameter in the two-chicane setup to keep up with the change of electron energy (B). Even a simple reduction of the value of K (without an optimized tapering profile, whose study is outside of the scope of this manuscript) is enough to keep the SNR almost constant beyond the saturation point, see Fig. 14 (B and B/C).

V. DISCUSSIONS AND CONCLUSIONS
In this paper we investigated and optimized the overall performance of the forthcoming HXRSS setup at the European XFEL. Our conclusions indicate that a double-chicane HXRSS setup is a convenient choice for the SASE2 FEL at the European XFEL, which is made of a total of 35 five-meter-long modules$^{2}$ with a period length of 40 mm. The optimized configuration consists of a first eight-module-long undulator part, a chicane with a single crystal monochromator, a second eight-module-long undulator part, a second chicane with a single crystal monochromator, and a final radiator.
Our indications rely on the study of one high and one low photon-energy case (14.4 and 3.5 keV, respectively), leaving postsaturation studies and investigations of microbunching instabilities as future work.
In the 3.5 keV case we explicitly showed the advantages of a double-chicane scheme, compared to a single-chicane scheme, in terms of a larger signal-to-noise ratio (SNR). The notion that a double-chicane setup yields advantages due to an increased SNR was criticized in [4][5][6]. Here we demonstrated both by theoretical reasoning and simulations (see Fig. 14) that a double-chicane scheme actually yields a larger SNR as expected in those papers. We should stress that independently of the SNR, the power extracted at saturation is always related to the value of the FEL parameter. In other words, in first approximation,$^{3}$ given a certain electron beam energy and a certain value of the FEL parameter, the power extracted at saturation is independent of the SNR. However, in the case of a large SNR the power at saturation goes into the seeded signal, while for a small SNR it mainly goes into SASE.
It is possible to keep the SNR stable after saturation by properly changing the undulator K parameter, see Fig. 14. The single-chicane setup, instead, saturates further downstream compared to the two-chicane setup, due to a lower seed level and SASE starts growing later on. Still, the final output quality is superior in the case of a double chicane, because it starts with a larger SNR. We have seen that a larger SNR allows for smaller impinging powers on the crystals, allowing for high-repetition rate due to decreased heat load.
In the low photon-energy case we concluded that an optimum setup would be made out of a 4 + 4 configuration. Since the European XFEL is equipped with undulators with variable gaps, it is possible to install the chicanes further downstream and still enable a 4 + 4 configuration. With this in mind, we optimized a double-chicane setup in the high-energy case as well, arriving at an optimum configuration of 7 + 7, which we then suggested changing to 8 + 8 in order to cope with possible discrepancies of the actual beam with respect to simulations.
In our analysis of the high-energy case we pointed out that a single-chicane setup with nine segments would still correspond, at 14.4 keV, to a very modest increase of the energy deposited, per pulse, onto the crystal. One might consider, therefore, a scheme where the two chicanes are physically placed in a 4 + 4 or 5 + 5 configuration. This choice would enable a double-chicane setup at lower energies, where a large SNR is of fundamental importance for decreasing the heat load on the crystals, and at the same time it would be compatible with a single-chicane scheme at higher energies. However, its much greater flexibility leads us to rely on the 8 + 8 choice for the HXRSS configuration at the European XFEL.