Advances in numerical methods under wiggler period averaging for free electron laser simulation

Advances in numerical methods for free-electron-laser (FEL) simulation under wiggler period averaging (WPA) are presented. First, WPA is generalized using the perturbative Lie map method. The conventional WPA is identified as the leading order contribution. Next, the shot-noise model under WPA is improved along with a particle migration scheme across the numerical mesh. The artificial shot noise arising from particle migration is suppressed. The improved model also allows using arbitrary mesh size, slippage resolution, and integration step size. These advances will improve modeling of longitudinal beam profile evolution for fast FEL simulation.


I. INTRODUCTION
The FEL simulation codes under the standard approximation that includes slowly varying envelope approximation (SVEA) and wiggler period averaging (WPA) have been extensively used in past decades and still are widely used, and considered to be effective and efficient [1]. In efforts to improve FEL simulations, reliability issues of standard approximation were raised and most newly developed codes were built without such approximations at the cost of heavy computational loads [2][3][4]. However, the design optimization of FEL requires extensive start-toend simulations. As a result, there are still high demands on codes under the standard approximation. Indeed, most of the start-to-end design codes choose to incorporate the FEL simulation code with the standard approximation [5][6][7][8][9]. Therefore, efforts toward improving codes under the standard approximation are desired. In this paper, we present several advances in numerical methods under the WPA and the SVEA.
The SVEA assumes slow field envelope variation temporally and longitudinally compared with the resonant frequency ω r and wavelength λ r . These conditions are well met in most typical FEL parameters [1], and was shown to remain valid even in the case when the seed violates these conditions [10]. The WPA presumes stronger condition because it asserts small change in the field envelope and particle bunching factor over wiggler period length scale longitudinally. In this paper, we generalize the WPA using the perturbation Lie map method. The conventional WPA is identified as the leading order contribution. The next order corrections are coupling between betatron and wiggling motion, longitudinal field envelope variation, and transverse field gradient effect for even harmonics.
Many implementations of the standard approximation limit the macroparticle migration across the numerical mesh, especially the temporal mesh called slice [11]. This can reduce the simulation data size as well as the computation time. However, such limitation may fail to model the evolution of temporal electron bunch profile, self-consistent wakefield and space charge [12]. An intuitive and straightforward implementation of particle migration can lead to large artificial shot noise due to the nature of the particle loading methods used in many FEL codes [13,14]. In this paper, we also present an improved particle loading method for shot-noise modeling and a migration scheme compatible with the loading method.
The proposed particle migration scheme allows using arbitrary weight and shape functions. Here, the weight function refers to the integral kernel used for particle deposition on numerical mesh points. The shape function refers to the shape of the field representation at numerical mesh points used for field interpolation from mesh points to particles' coordinates. The proposed particle migration scheme also allows using arbitrary mesh size, slippage length and thus arbitrary integration step size independent of the temporal mesh size called slice length. The arbitrary slippage resolution is important in modeling correct slippage especially for nonresonant transport line other than wiggler. The arbitrary slippage resolution further allows applying the operator split-composition method [15] on a field solver to increase numerical accuracy.
The particle loading method can naturally accept the particle data from upstream simulation enabling start-toend simulation seamless. In addition, it enables coherent spontaneous emission (CSE) to be simulated within the resonant band.
The organization of this paper is as follows: After the introduction, we present the WPA generalization in Sec. II, particle loading and compatible migration scheme in Sec. III, implementations of arbitrary slippage resolution and split of slippage operator in Secs. IV, and V respectively, and an illustration of CSE modeling in Sec. VI followed by the conclusion in Sec. VII.

II. GENERALIZATION OF WIGGLER PERIOD AVERAGING
A. Review: Perturbative Lie map The Lie map of a Hamiltonian system is a time evolution operator acting on canonical variables of the Hamiltonian. The Hamiltonian equation can be written as, where the square bracket is the Poisson commutator, f is an arbitrary function of canonical variables, and z is the independent variable. It can be shown that the time evolution operator HðzÞ from z ¼ 0 to z ¼ z is GðzÞ ¼ − Here, G is the generator serving as a Poisson operator. The expansion in the generator is called Magnus' series [16]. However, in general, Magnus' series does not converge when the bare Hamiltonian HðzÞ is the integrand. Consider HðzÞ ¼ S þ VðzÞ, where S is an autonomous Hamiltonian and VðzÞ is a small potential. Then, it can be shown that the map can be factorized into the unperturbed map S and the perturbed map V such that HðzÞ ¼ SðzÞVðzÞ where the generator of the perturbed map is Note that the integrand is propagated by the unperturbed map S. The propagated potential SV is called the interaction potential [16].

B. Integration over one wiggler period
In general, a perturbative Lie map is built in order of small parameters for convergence. However, in wiggler, if we build a map over one wiggler period, the wiggling motion is integrated out leaving small coupling effects between the fast wiggling motion and slow motions like betatron motion. This idea enables us to generalize WPA with perturbative Lie map formalism.
We split the Hamiltonian into H ¼ S þ FðzÞ þ VðzÞ where S is the wiggler period averaged Hamiltonian representing slow motion, V is the radiation field potential, F is the rest of the Hamiltonian which represents the fast wiggling motion. Accordingly, the Lie map is factorized into slow map S, fast map F and field potential map V such that where λ u is the wiggler period and with the interaction potentials KILEAN HWANG and JI QIANG PHYS. REV. ACCEL. BEAMS 21, 120702 (2018) 120702-2 We truncate the field potential generator G V at the first sequence of the Magnus's series because the radiation field strength is much weaker than the external magnetic field strength. For example, using LCLS-II like parameters of energy 4 GeV, wave length 4 Å, radiation size 0.23 μm and normalized wiggler strength K ¼ 1.44, the ratio between the normalized radiation strength and the wiggler strength is about 10 −5 when the power is 100 GW. Note, in Eq. (7), although the integrand SF is not small, the fast wiggling motion is integrated out when the generator G F is evaluated over one wiggler period leaving the small coupling effect between slow and fast motion due to the propagation by S.

C. The leading and the next leading order
If we calculate the Lie map generators G S ¼ −zS and Eqs. (7), (8) to the leading order, we can recover the conventional WPA. In order to generalize it, we need to include the next leading order terms. Table I roughly describes our definition of the leading the next leading order of the generators. Note that, if the F is used instead of SF as an integrand for G F , the fast map will become identity when integrated over one wiggler period. Similarly, when F V is used instead of SF V as an integrand for G V , the slow drift on FEL phase will be ignored. In addition, the first order of field envelope gradient will be considered in the next leading order contributions.

D. Hamiltonian
We start from the following Hamiltonian where ct is the light speed flight serving as the longitudinal canonical variable whose canonical momentum pair is negative of the normalized energy −γ. We assume the ideal planar wiggler model for the normalized vector potentials a x ¼ K cosh ðk x xÞ cosh ðk y yÞ cos ðk u zÞ þ a r a y ¼ K k x k y sinh ðk x xÞ sinh ðk y yÞ cos ðk u zÞ ð 11Þ where k u is the wiggler wave number, k x and k y are the natural focusing strength of the wiggler, K is the normalized wiggler strength, and a r is the radiation vector potential which we write in harmonic decomposition, Here, K h is the radiation envelope of the hth harmonic mode, and k r is the wave number of the fundamental mode. Using the following generating function, we define the new conjugate variables and the new Hamiltonian.
Then, we expand and split it into slow, fast and field potential parts.
where q ⊥ ∈ fk x x; p x ; k y y; p y g and is the effective wiggler strength.

E. Fast map
After integration over one wiggler period, the nonzero leading order term of Eq. (7) becomes This corresponds to the coupling between slow betatron oscillation and fast wiggling oscillation. It is negligible in most cases as it scales as γ −2 compared to the wiggling Leading order Next leading order (16). Physically the smallness is due to the large frequency ratio between the betatron and wiggling oscillation. Therefore, such coupling can be more relevant when a strong focusing quadrupole present on top of the wiggler field.

F. Field potential map
From Eq. (17), the interaction field potential can be written as, where the real value operator and the summation over h is assumed for simplicity, K int h ðzÞ is the interaction field envelope and θ int ðzÞ is the interaction FEL phase propagated by the slow and fast map SðzÞF ðzÞ. We neglected the propagation on terms in the square bracket of Eq. (20) as they are less significant compared with the propagation on field envelope and phase.
The interaction FEL phase is The 2nd term of Eq. (21) is a drift of FEL phase that comes from the slow map SðzÞ. The 3rd and the 4th term of Eq. (21) are the longitudinal wiggling motion. The 4th term is coupled to slow transverse motion, (see definition of ζ), and thus smaller than the 3rd term. These terms are from the action of the fast map F ðzÞ. Note that the drift vanishes _ θ ¼ 0 on resonance, which means that the inclusion of it encompasses small off-resonant effects.
The interaction envelope is where we used which can be understood by looking Eq. (16). In order to evaluate the integration in Eq. (8), we need the field envelope model as a function of z. Assuming small field envelope variation over a wiggler period, we model z dependence of Eq. (23) by is an averaged field envelope and ∂ z K h represent the first order field variation.
Integrating Eq. (25), the generator of the field potential map can be written by where C ≡ cosðk u zÞ, S ≡ sinðk u zÞ, and the integration parameter R h f for an arbitrary function fðzÞ is defined by Explicitly, to the 1st order of _ θ, ζ, and with γ R being the resonant energy, the integration parameters read [17], where J hξ R i is the Bessel function of the first kind of order i and argument hξ R . Only the integer order of Bessel functions are allowed, otherwise, it is understood to be zero. The first term of R h C corresponds to the JJ factor of FEL terminology. Note that the integration parameter R h SC coupled with the transverse field gradient vanishes for odd harmonics. For comparison, we write down generator for the conventional WPA, where the summation and real operator is dropped for simplicity. For even harmonics it becomes, However, plugging Eq. (31) into the generalized WPA Eq. (26), for the even harmonics, it becomes Note that the transverse gradient term in Eq. (34) can be as large as the leading order term that is the first term of Eqs. (34) or (33) when The term proportional to the transverse gradient of the field envelope was already reported in Ref. [18]. Here, we are readdressing its importance on even harmonics.

G. Effective Hamiltonian
The factorized map Eq. (6) with Eqs. (19), (26) are not yet practically useful for numerical implementation because each factorized map is not solvable and the step size is fixed by one wiggler period. A trick is to concatenate the map using the Baker-Campbel-Hausdorff (BCH) formula and define an effective Hamiltonian [16,19], Since, ∂ η G S ¼ −λ u _ θ, the concatenation term ∶G S ∶G V cancel out the _ θ term in the parenthesis of the 1st term in Eqs. (28) and (29). Therefore, it is convenient to definē and so that, the effective Hamiltonian reads, Note that the effective Hamiltonian does not have explicit dependency on z. Therefore, the integration step size need not necessary resolve the wiggler period as long as the field envelope variation is slow. Now, we can apply general numerical methods like Runge-Kutta (RK) with arbitrary step size to solve the effective Hamiltonian. For example, a popular FEL simulation code GENESIS uses the 4th order RK method [11]. Figure 1 shows an order of magnitude improvement in accuracy of particle pusher compared to the conventional WPA. We used the GENESIS pusher to represent conventional WPA. Since, we used converging small enough step sizes, the error from the exact solution originate from the differences of the effective equations of motion in use. However, such an accuracy improvement on particle pusher can easily become obscured by other sources of error such as external field error, standard approximation on the radiation field solver, and numerical discretization.

H. Source term
Maxwell's equation of the vector potential is where e and m are the electron charge and mass respectively, Z 0 is the vacuum impedance, and is the x-directional current density contributed by jth particle at a given time t. Here, q j is the charge weight, v j x is the x-directional velocity. Under SVEA, Eq. (39) becomes In order to separate the harmonic modes, we multiply e −ihðθ−k u zÞ on both sides and integrate over a wiggler period. By doing so, we are also implying WPA on field equation. where is the source term contributed by the jth particle. Before, we evaluate the integration of Eq. (42), we need to write the current as a function of z and transform the temporal coordinate from t to θ. Ignoring the particle-radiation interaction, the particle dynamics is determined by external magnet field, e.g., x j ðtÞ → x j ðzÞ ¼ SðzÞF ðzÞx j and similarly for y j and θ j . Also, by definition of θ, we can replace δðz − z j ðtÞÞ in terms of θ. Therefore, However, the dependence on z inside of the delta functions make the integration of Eq. (42) still formidable. One possible trial is to deposit the source on numerical mesh points, i.e., where φ i is a weight function localized at the ith mesh point and Ω is the ðx; y; θÞ simulation domain. Note that, after integration over Ω with Eq. (43), the weight function will depend on z through x j ðzÞ, y j ðzÞ, and θ j ðzÞ. Thus, integration over z can still be cumbersome especially for localized nonsmooth weight functions. Reference [20] provides an elegant trick to this problem. The idea is to replace the source by a multipole expansion model S j h which we parametrize by where W h is the monopole strength, W h x and W h θ are the dipole strengths, ðx j ;ȳ j ;θ j Þ is the monopole location, and x W , θ W are the dipole separation length. It is natural to chose the monopole location by the average position over the integration step. For example,x This way, the wiggling motion average out which means that the monopole location can be naturally obtained by particle pusher under WPA. For example, in principle, the Exponentially growing Gaussian field envelope from 0.4 MW to 1 GW in power, σ r ¼ 56 μm and λ r ¼ 27 nm is assumed. Electron beam parameters used are σ x;y ¼ 56 μm, ϵ x;y ¼ 0.6 μm, γ ¼ 1000 and Δγ=γ ¼ 2 × 10 −4 . wiggler parameters used are K ¼ 1.5, λ u ¼ 2.5 cm, and . The thick line represents the ensemble average and the shadowed area corresponds to the range of error of simulated particles' population. particle pusher always gives particle location at the middle of the integration step when leap-frog scheme is used. It is also natural to choose the dipole separation length by the amplitude of the wiggling motion.
Once the multipole locations are fixed, the following weak formulation can be used to calculate the multipole strength.
where ρ j ≡ ρðx j ;ȳ j ;θ j Þ. Plugging Eqs. (45) and (50) into Eq. (49), the right-hand side becomes, where W j ≡ Wðx j ;ȳ j ;θ j Þ and used Comparing Eq. (51) with the left hand side of Eq. (49), the multipole strength reads, Explicitly, where the superscript Ã on the integration parameters is the complex conjugate operator and used v j x ¼ c γ j ðp x;j þ K eff;j cos ðk u zÞÞ. For the dipole strengths, we take the leading order term K eff;j only from v j x .
Recall that R h SC vanishes at odd harmonics. Therefore, the horizontal dipole strength W h x;j is important only for even harmonics. As for the temporal dipole, since the dipole separation length θ W is smaller than the wave length, the effect can be negligible when the temporal mesh size is multiple of the wave length. Note that the field Eq. (41) and the multipole source model Eq. (45) with Eqs. (53), (54) does not depend on the wiggler period explicitly. Also, recall that the wiggling motion is averaged out for the choice of the monopole location in the multipole source model. This means that the field solver based on Eqs. (41), (45) together with the particle pusher based on the effective Hamiltonian Eq. (38) can be numerically integrated with arbitrary step sizes.
In practice, it can be efficient to choose the temporal coordinate of the monopole location at the beginning instead of the middle of the integration step when the particle pusher is split into transverse and longitudinal map. As the averaged transverse and longitudinal particle dynamics are weekly coupled in wiggler, we build the particle pusher by transverse and longitudinal map alternatively in leap-frog method. This gives us the transverse and temporal coordinate at the middle and the beginning of the integration step respectively. Therefore, we choose the monopole location at the middle of the integration step for the transverse coordinates and at the beginning of the integration step for the temporal coordinate.

III. NUMERICAL SHOT NOISE MODELING
In addition to the efficient particle pusher and field solver, an efficient shot-noise modeling algorithm is required for a fast FEL simulation. This task can be especially difficult when compatibility with the numerical discretization scheme is considered under WPA framework [12]. In this section, we present a possible solution to this problem.

A. Review of 1D model
We start by reviewing two widely used 1D shot noise modeling methods.
One method is based on the temporal coordinate perturbation of particles by Fawley [13]. The temporal coordinate perturbation for fundamental harmonic mode was first introduced by Penman and McNeil [21]. This idea was extended to higher harmonics by Fawley [13] and by doing so, the perturbation takes a quiet different form from the original model by Penman and McNeil [21]. That is to say, the perturbation of Penman and McNeil method is a uniform random number independent for each particles while the perturbation of Fawley method is harmonic sum of Gaussian random numbers shared among a set of particles, see Eq. (57).
The other method is based on the charge weight perturbation of particles by McNeil, Poole, and Robb [22]. In fact, the 1D model in their paper [22] is based on the combination of the charge weight and temporal coordinate perturbation. However, the temporal coordinate perturbation of Ref. [22] is based on the statistical property of the particle arrival time in a given temporal segment, independent for each particles, and relatively insignificant when it comes to RMS bunching factor, see Eq. (61). Therefore, from now on, when we say "temporal coordinate perturbation", we are referring to Fawley's method, and when we say "charge weight perturbation", we are referring to the 1D method of McNeil et al. without their temporal coordinate perturbation.
These two methods are illustrated in Fig. 2. schematically. The first step is to populate particles uniformly along the temporal coordinate with equal charge weight to remove artificial temporal shot-noise. The bunching factor at this step is Here, index 0 denotes vanishing bunching factor, N e is the number of electrons, M is the number of the simulated particles and m j ¼ N e =M is the electron number weight of the jth particle. It vanishes as all the weights are equal, and the temporal coordinates θ j ¼ θ 0 þ jΔθ are uniformly distributed with equal distance Δθ ¼ 2π=M. If the longitudinal beam profile varies within a wavelength, one can weight the charge based on the profile. This enables coherent spontaneous emission, see Sec. VI. The second step is to model physical shot-noise by adding proper perturbation on the temporal coordinates or charge weights. The root-mean-square (RMS) bunching factor of the physical shot-noise is hb h b Ã h i ¼ 1=N e . The numerical model has to satisfy this condition at least. Following Ref. [13], the perturbation on temporal coordinate of the jth particle is given by where the number of particles is assumed to be twice of the maximum harmonic number to be modeled and ξ h is a random variable. Note that these random variables are shared among 1D particles. Therefore, particles of 1D model with temporal perturbation are correlated. Then, bunching factor changes to Therefore, where the condition hξ h ξ Ã h i ¼ 1=ðh 2 N e Þ is imposed on the random variable ξ h . On the other hand, the RMS bunching factor of the charge weight perturbation [22] is wherem j ≡ m j þ δm j is the perturbed number weight taken from a random variable whose mean and variance is From now on, we will call a set of particles composing 1D model by "beamlet" following conventional terminology [13].

B. Review of 6D extension
Here, we review two widely used methods of 6D extension of the 1D shot-noise model. One is the 5D mirroring [13] and the other is the 6D volume division [22] method. Figure 3 illustrates these two methods schematically.
For 6D phase-space volume division method, the arrival of electrons into the volume segment can naturally be assumed to follow the Poisson process. Particle density of each volume is represented by a single particle sitting at the center of the corresponding volume segment whose charge is sampled from Poisson random number of mean weighted by the density profile. Note that this method is physically intuitive. All the particles are statistically independent. Thus there is no compatibility issue with the numerical discretization. However, it requires a lot of particles. For example, considering 10 divisions in each dimension, it requires 10 6 particles.
The 5D mirroring method is to copy 5D phase-space of a particle to the particles of corresponding beamlet. In other words, each beamlet consists of a number of particles sharing same 5D coordinates x; y; p x ; p y; ; γ initially. The temporal coordinates of the particles in a beamlet follow Eq. (56), localized within one wavelength. Then, the physical shot-noise can be modeled either by adding temporal coordinate perturbation, Eq. (57) or charge weight perturbation, Eq. (62). Note that the member particles of a beamlet are not statistically independent of each other because they are sharing the same 5D phase-space coordinates. In the 5D mirroring method, particle migration across numerical mesh can break the bond of the member particles in a beamlet and thus can produce artificial shotnoise. For example, when one particle in a beamlet migrate, the cancellation of bunching factor in Eq. (56) does not hold.

C. Particle loading and migration
Although the 6D volume division together with the 1D charge perturbation method is free of compatibility issue with numerical discretization schemes, it requires a lot of particles. Since we are concerned with fast simulation, we adopt the 5D mirroring strategy. Our idea is to interpret one beamlet as one statistically independent entity whose phase-space coordinate is given by the average over the member particles in it. This is based on the observation that the member particles are not statistically independent of each other and the movement of the beamlets describe the macroscopic (≳λ r ) dynamics while the movements of the individual member particles of the beamlet describe microscopic (≲λ r ) dynamics [14]. This way, as a usual particle loading algorithm, a beamlet can be regarded as an instance of a random variable whose property follows that of the density probability.
This interpretation allows us to load particles naturally. First, the beamlet is loaded from a random generator or from external upstream tracking code. Then, each beamlet is divided into M ¼ 2h max particles whose temporal coordinates are uniformly distributed in one wavelength following Eq. (56) while the average coordinate is the value of the corresponding beamlet. Here h max is the maximum harmonic number to be included in the simulation. This procedure is described in Fig. 4. Second, we assign mean charge weights to particles based on the temporal density profile then add the temporal coordinate or charge weight perturbation to model physical shot-noise. This procedure was described in Fig. 2. Since the particles in a beamlet are not independent of each other, we migrate all the particles composing a beamlet when the beamlet migrates across the numerical mesh. This migration scheme allows us to use arbitrary weight and shape functions for source deposition and field interpolation respectively. Furthermore, it allows the use of temporal mesh size smaller than the radiation wavelength. This is because the weight and shape functions are evaluated at the beamlet position regardless of individual member particle's relative coordinates. In other words, the same value of evaluated weight and shape is shared among the particles of the corresponding beamlet. Figure 5 presents a self-amplified spontaneous emission (SASE) simulation benchmark between the beamlet migration and GENESIS which does not allow particle migration across temporal mesh. For comparison of migration scheme, we used same macroparticle and mesh data, and same particle pusher and field solver to the leading order. Therefore, the good agreement between the two codes indicates that the numerical beamlet migration is negligible in the parameter settings we used. It also indicates that the beamlet migration suppresses numerical shot noise due to migration oppose to Fig. 6. which shows significant artificial shotnoise due to individual particle migration dominating initial emission. It also shows that as the number of macroparticles increases the artificial shot noise decreases.
The migration scheme enables natural slippage along nonresonant transport lines like drift, quadrupole, dipole, etc. The slippage resolution can also be chosen arbitrary regardless of the integration step size or the mesh size as one can use moving window. This will be further illustrated in detail in the next section.

IV. SLIPPAGE RESOLUTION
The particle migration enables us to use arbitrary resolution of slippage. The typical implementation of slippage is to copy the field data from the previous temporal mesh point to the next temporal mesh point. This procedure can be best understood by a pseudocode in Fig. 7. Our implementation is the moving window which is also illustrated in Fig. 7. For convenience, we will call "CD" for copying data from the previous mesh point and "MW" for the moving window. Note that the slippage resolution of MW is arbitrary while the slippage resolution of CD is one FIG. 5. Bench mark between GENESIS v.1.3 and beamlet migration of SASE simulation using Next Generation Light Source (NGLS) parameters [23]. Blue is the beamlet migration. Dashed orange is GENESIS v.1.3.
FIG. 6. Illustration of artificial shot-noise due to particle migration. Same parameters used as in Fig. 5. Blue is from the beamlet migration. Dashed orange is from the individual particle migration. Number of macroparticles are denoted in text boxes.
FIG. 7. Pseudocode illustrating slippage implementation of the copying data and the moving window. The first two indices of the field data represented by Fld.data are for the transverse mesh points while the last index is for the temporal mesh point. Here, nt is the number of temporal mesh points. The domain range represented by Fld.domain is used by deposition and interpolation algorithm. Therefore, change of the domain range by dtheta effectively slip the field by dtheta. temporal mesh size. Figure 8 shows kinks on power gain curve due to rough slippage resolution for CD and smooth curve for MW.

V. SPLIT AND COMPOSITION METHOD IN FIELD SOLVER
The field equation (41) can be split into two equations, Let F ⊥ and F k represent the operator solving Eq. (63) and (64), respectively. Then, the following composition method F of step size Δz, is a second order method for field equation integration provided that F ⊥ and F k are one-step method of order higher than two [15]. Following GENESIS [11], we adopt the alternating-direction implicit (ADI) method to build F ⊥ . And the moving window implementation of F k is an exact method and the computational load is not significant as can be inferred from Fig. 7. For comparison, we define three integration orderings as shown in Table II. Here, we write "leap-frog-gen" to denote the integration order of GENESIS v.1.3 code [11]. The definition of "leap-frog-SE" method is based on the last paragraph of subsection II H. The "split" method replaces the field solver of leap-frog-SE by Eq. (65). Figure 9 shows comparison of the three integration orderings. We took a simulated power gain curve with small enough step size as a reference and compared how each method varies from the reference when a large step size is used. Note that the split method is the best converging.

VI. COHERENT SPONTANEOUS EMISSION
Consider an artificially truncated uniform electron beam in temporal domain. A strong CSE is expected from such a sharp edge [24][25][26]. It is pointed out in the end of Sec. 2 of Ref. [25] that CSE cannot be modeled under WPA as the charge weight of every particle is regarded uniform over the radiation wave length. However, since we are weighting the macroparticle's charge individually by the longitudinal beam profile, we can include the CSE effect. Figure 10 shows an example of the initial strong CSE within resonant band at the ends of a uniform beam profile. As the beam propagates through the wiggler, the resonant power gain became stronger than the CSE. In this illustration of CSE, we used an sharp ends beam profile in order to exaggerate this effect. Although the sharp edge violates the standard approximation, for a more physical beam profile, such violation by CSE can be moderate. For example, Ref. [10] showed validity of standard approximation in cases when   9. Comparison of the convergence between three different integration orderings. P 0 is the reference power curve, ΔP is the difference between the power curve simulated with a large step size (Δz ¼ 20λ u ) and the reference curve. Same parameters are used as Fig. 6. The large deviation at the initial stage is due to shot-noise. the fast temporal profile variation is present. This capability can be useful for a short bunch or temporal density modulated beam.

VII. CONCLUSION
Several advances in numerical methods for FEL simulation under the WPA are presented.
First, we generalized the WPA using the perturbation Lie map method. The perturbative correctional terms to WPA are identified to be negligible for odd harmonics in typical parameter region. However, when the ratio between wiggling frequency and slow motion frequencies becomes more comparable or when there is large energy spread, the correctional terms may become important. As for the even harmonics, the terms proportional to the transverse gradient of the field envelope can be a leading order correction to the conventional WPA.
Second, we improved the shot-noise modeling method to allow seamless start-to-end simulation and include the CSE effect. The idea is based on the interpretation of the beamlets as statistically independent entities describing macroscopic (≳λ r ) dynamics. This allows us to combine advantages of two different shot-noise modeling methods in Refs. [13,22]. These are the 5D mirroring from Ref. [13] to reduce number of simulated particles and charge weighing from Ref. [22] to include CSE. Such an interpretation also leads us to develop a particle migration scheme compatible with the particle loading method. This enables us to model the temporal beam profile evolution and to correctly simulate nonresonant beam transport through elements other than wigglers. Furthermore, the use of arbitrary weight and shape functions, mesh size, slippage length and integration step size becomes possible.
Finally, we presented and illustrated the advantage of moving window implementation, split-composition methods for field equation and charge weight for CSE. These implementations are possible due to the development of the particle loading and migration method.
All these methods presented in this paper are implemented in, the parallel beam dynamics simulation framework IMPACT code suite [27].