Microbunching Instability in a Chicane: Two-Dimensional Mean Field Treatment

We study the microbunching instability in a bunch compressor by a parallel code with some improved numerical algorithms. The two-dimensional charge/current distribution is represented by a Fourier series, with coefficients determined through Monte Carlo sampling over an ensemble of tracked points. This gives a globally smooth distribution with low noise. The field equations are solved accurately in the lab frame using retarded potentials and a novel choice of integration variables that eliminates singularities. We apply the scheme with parameters for the first bunch compressor system of FERMI@Elettra, with emphasis on the amplification of a perturbation at a particular wavelength. Gain curves agree with those of the linearized Vlasov model at long wavelengths, but show some deviation at the smallest wavelengths treated.


I. INTRODUCTION
Bunch compressors, designed to increase the peak current, can lead to a microbunching instability with detrimental effects on the beam quality. This is a major concern for free electron lasers (FELs) where very bright electron beams are required, i.e. beams with low emittance and energy spread [1]- [11]. In this paper, we apply our self-consistent, parallel solver to study the microbunching instability in the first bunch compressor system of FERMI@Elettra. This system was proposed as a benchmark for testing codes at the September 2007 microbunching instability workshop in Trieste [12].
A basic theoretical framework for understanding this instability is the 3D Vlasov-Maxwell system (on 6D phase space). However, the numerical integration of this system is computationally too intensive at the moment. Our basic model is a 2D Vlasov-Maxwell system.
More precisely, we treat the beam evolution through a bunch compressor using our Monte Carlo mean field Lorentz-Maxwell approximation. We generate N points from an initial phase space density using a pseudo-random number generator. Here N denotes the number of simulated points, to be distinguished from the number of particles in the beam.
We then calculate the charge density using a smooth density estimation based on Fourier series. The electric and magnetic fields are calculated from the smooth charge/current density using a novel field formula that avoids singularities by using the retarded time as a variable of integration. The points are then moved forward in small time steps using the Lorentz equations of motion in the beam frame with the fields frozen in time. We try to choose N large enough so that the charge density is a good approximation to the density that would be obtained from solving the 2D Vlasov-Maxwell system exactly. We call this method the Monte Carlo Particle (MCP) method. We believe that for N sufficiently large one could obtain an accurate approximation to the Vlasov phase space density. That is beyond our current computer capability, however, and it is likely that a better approach would be to use the method of local characteristics to integrate the Vlasov equation directly.
Our MCP solver has been successfully tested against other codes on the Zeuthen benchmark bunch compressors. Our results for the mean energy loss are in good agreement with 2D and 3D codes confirming that 1D codes underestimate the effect of coherent synchrotron radiation (CSR) on the mean energy loss by a factor of 2. For more details see [13], [14] and references therein.
The system we study consists of a 4-dipole chicane between rf cavities and quadrupoles; see Fig. 4. In this paper we limit our study to the chicane; a complete study is on our agenda. The phase space density on entrance to the chicane is a smooth function a_0(z, p_z, x, p_x) modulated by a factor 1 + A cos(2πz/λ), where A is a small amplitude and λ is the perturbation wavelength. The function a_0 contains the energy chirp, the z–p_z correlation that is necessary for bunch compression. Our initial density is discussed in detail at the beginning of Section IV.
A standard approach to study the microbunching instability consists in calculating a gain factor for a given initial modulation wavenumber k_0 [15]-[17]. The gain factor is defined as |ρ̂(k_f, s_f)/ρ̂(k_0, 0)|, where ρ̂(k, s) = ∫ dz exp(−ikz) ρ(z, s) and k_f = C(s_f)k_0 for a given initial wavelength λ = 2π/k_0. Here ρ(z, s) is the longitudinal spatial density and C(s_f) = 1/(1 + hR_56(s_f, 0)) is the compression factor of the chicane, with s_f evaluated at the exit of the chicane and h being the chirp factor.
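To make the gain-factor definition concrete, here is a small numerical sketch. It is entirely our own toy construction (the flattop half-width, sample size, and reweighting trick are illustrative, and no collective force is applied, so the computed gain is trivially 1); it only illustrates how |ρ̂(k_f, s_f)/ρ̂(k_0, 0)| could be estimated from an ensemble of longitudinal positions:

```python
import numpy as np

def rho_hat(z, k, weights=None):
    """Monte Carlo estimate of rho_hat(k) = <exp(-i k z)> for a normalized
    longitudinal density represented by (weighted) sample positions z."""
    return np.average(np.exp(-1j * k * np.asarray(z)), weights=weights)

rng = np.random.default_rng(0)
a, A, lam = 1.0e-3, 0.05, 100e-6           # half-width, amplitude, wavelength (toy values)
k0 = 2 * np.pi / lam
z = rng.uniform(-a, a, 400000)
w = 1 + A * np.cos(k0 * z)                 # modulated flattop realized by reweighting

initial = abs(rho_hat(z, k0, weights=w))   # close to A/2 for this density
C = 3.545                                  # compression factor quoted in the text
final = abs(rho_hat(z / C, C * k0, weights=w))
gain = final / initial                     # = 1 for pure compression, no self-force
```

With a self-consistent collective force the final ensemble would come from tracking, and the ratio would differ from 1; here the "evolution" is pure compression z → z/C, so the modulation at k_f = Ck_0 is exactly the initial one at k_0.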
The function ρ̂(k, s) can be computed in the full nonlinear self-consistent scheme, but can also be approximated in some cases through a solution of the linearized Vlasov equation.
The linearized Vlasov solution can in turn be obtained as the solution of a 2D linear integral equation, provided that the collective force can be described by an impedance or equivalent wake field. The 2D integral equation reduces to 1D if the initial distribution describes a coasting beam with linear energy chirp. This 1D equation was derived by Heifets, Stupakov, and Krinsky [17] and also by Huang and Kim [15]. Determining an approximate solution of the equation by iteration, Huang and Kim found an explicit formula for the gain.
Since we and others have been puzzled by certain points in the derivation of the 1D integral equation, we rederive the equation in a systematic manner, starting with minimal assumptions and finding first the 2D equation in (k, s)-space for the Fourier transform of the longitudinal density, ρ̂(k, s). It is remarkable that these integral equations involve just the longitudinal density as unknown, all transverse effects being accounted for in the form of the kernel.
We compare the gain from our full nonlinear MCP computation with the linear gain formula of [15]. Agreement is good at long wavelengths, in spite of the fact that our collective force is computed in a more careful way than that of [15], the latter being derived from the impedance for steady state CSR without account of finite magnet length. At short wavelengths, as little as 80µm in calculations to date, there are deviations from the linear gain. The source of discrepancy has not yet been analyzed. It might be due to nonlinearity, or to the different models of the collective force, or both.
To define clearly our Vlasov-Maxwell starting point we begin with exact equations, but for practical work we later make approximations based on the following assumptions: (A) The maximum bunch size ∆ is small compared to the minimum bending radius.
(B) In beam frame coordinates the bunch form (and also the form of the phase space distribution) changes very little during a time ∆/c. Correspondingly, the field of the bunch at a co-moving point changes little on such a time interval.
Here ∆ is the biggest extent of the bunch in any direction. Under typical conditions (A) and (B) should be very well satisfied. We also assume that the beam is relativistic (γ ≫ 1), as is true in the example studied, but that assumption could be removed without great cost.
The paper is organized as follows. In Sections IIA and IIB we give an outline of the Vlasov-Maxwell system, derive our formula for the fields in terms of sources, and discuss the transformation of coordinates from laboratory frame to beam frame, including the transformation of densities. In Section IIC we give the details of our MCP algorithm. In Section III we give the derivation of the linear integral equation and the gain formula. In Section IV we show numerical results for the first bunch compressor system of FERMI@Elettra. We compare the gain factor with the formula from [15] and perform a detailed analysis of 2D charge densities and electromagnetic fields.

II. VLASOV-MAXWELL SYSTEM
Our basic starting point is the Vlasov-Maxwell system in 3D, i.e., we assume collisions can be ignored and that the N-particle bunch can be approximated by a continuum. Our final scheme for computation is less ambitious, but we think that it might be a reasonable approximation to the full system. We reduce the problem from 3D to 2D, since we expect that most of the acceleration by self-fields will be in the plane of the unperturbed orbit. We use a particle method that follows the charge density rather than the phase space density, but hope that with sufficient attention to smoothing the result approximates that defined by the Vlasov-Maxwell system. Our coordinate system, (Z, X, Y), is shown in Fig. 1. We assume an external force due to a magnetic field, B_ext(Z), in the Y-direction. We define a reference orbit, R_r(s) = (Z_r(s), X_r(s))^T, lying in the Y = 0 plane, where R_r(β_r u), as a function of u, is a solution of the Lorentz equations for E = 0, B = B_ext(Z)e_Y, and u = ct. Here R = (Z, X)^T and s is arc length along the reference orbit. The vector (Z, X) denotes a row vector while its transpose (Z, X)^T is a column vector; this may seem pedantic, but the distinction is especially important in Section III. In Fig. 1 we sketch R_r(s) for a 4-dipole magnetic chicane bunch compressor. We focus on the evolution of F = (E_Z, E_X, B_Y)^T and take (E_Y, B_Z, B_X) = 0. The latter entails planar motion in the Y = const planes. We model shielding by the vacuum chamber by taking F = 0 at Y = ±g, where h = 2g is the height of the vacuum chamber as shown in Fig. 1.
We let H(Y) be the fixed Y density defined for |Y| < g; the coupled Vlasov-Maxwell system for the field vector F(R, Y, u) and the phase space density H(Y)δ(P_Y)f_L(R, P, u), with the shielding boundary condition, then takes the form of (1), (2). Here Z_0 is the free space impedance, Q is the total charge, QH(Y)ρ_L(R, u) is the lab frame charge density, QH(Y)J_L(R, u) is the current density (which, of course, has no Y component), m is the electron rest mass, q is the electron charge (so that Q = Nq, where N is the number of particles in the bunch), and γ is the Lorentz factor. Eqs. (1), (2) are completed by specifying the source S in terms of the densities as in (4), where J_L = (J_{L,Z}, J_{L,X})^T. We use c, Z_0 as our basic parameters instead of ε_0, µ_0.

A. Field Formula
We calculate F produced by ρ_L, J_L, but averaged over the Y-distribution as in (8). The averaging is appropriate, since we regard motion in the Y-direction as less important and do not allow it in computations. To evaluate (8) we begin with the general formula (9) for F, which follows from the retarded Green function for the wave equation (1). We note that if S(R, u) = 0 for u ≤ u_0, then F(R, Y, u) = 0 for u ≤ u_0, where u_0 is chosen, in our MCP solver, such that the bunch enters the chicane after the time u = u_0 (for more details, see Section IIC). Here ξ(Y) is the effective vertical charge distribution needed to impose the boundary conditions (F(Y = ±g) = 0) at the parallel plates by the method of images. We of course assume that the support of H(Y) is well within the interval (−g, g), and we also assumed in (9) that H is even. The field for free space comes from the term with k = 0. Averaging the field as in (8) leads to (11). We assume that σ is sufficiently small to justify replacing in (11) the Gaussians in η by δ(η − kh). We then have just a 2D integral, (12), which will be the basis for our numerical work. Note that if H(Y) = δ(Y) then, for Y = 0, (9) becomes (12); in other words, if H(Y) = δ(Y), then the averaging procedure gives the exact field at Y = 0. With certain reasonable approximations it seems possible to retain a non-zero vertical spread while maintaining a reduction to a 2D integration. The accuracy of such a reduction is still under investigation.
The integration in (12) is restricted to a very small part of R², because of the small size of the bunch, but it is awkward to locate this region owing to the fact that the spatial and temporal arguments of the source both depend on R. The task of integration is greatly simplified if we take the temporal argument to be a new variable of integration. We first transform to polar coordinates (ζ, θ), then take the temporal argument v in place of the radial coordinate ζ. This conveniently gets rid of the potentially small divisor in (12), giving the field simply as an integral over the source, Eq. (14), where R̂(θ, v) = R + [(u − v)² − (kh)²]^{1/2} e(θ).
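The cancellation of the small divisor can be sketched as follows, under the assumption (a reconstruction on our part, not spelled out above) that, after the vertical integration, the k-th image term carries the retarded Green function factor δ(u − v − |r|)/|r| with |r| = [ζ² + (kh)²]^{1/2}:

```latex
\int_0^{\infty} \zeta \,\mathrm{d}\zeta \;
\frac{\delta\!\left(u - v - \sqrt{\zeta^{2} + (kh)^{2}}\,\right)}
     {\sqrt{\zeta^{2} + (kh)^{2}}}
\;=\; \int_{|kh|}^{\infty} \mathrm{d}w \;\delta(u - v - w),
\qquad
w = \sqrt{\zeta^{2} + (kh)^{2}}, \quad \zeta\,\mathrm{d}\zeta = w\,\mathrm{d}w .
```

The factor ζ from the polar Jacobian exactly cancels the divisor, so the v-integration leaves only the source evaluated on the retarded locus ζ = [(u − v)² − (kh)²]^{1/2}, which is the origin of the argument R̂(θ, v).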
To estimate the effective region of the θ integration in (14), note that the source in (14) has significant values only for R̂(θ, v) restricted to a bunch-sized neighborhood of R_r(β_r v); i.e., the bunch is close to the reference particle. For the field F at time u we are interested only in R in a bunch-sized neighborhood of R_r(β_r u). Thus for R in a small neighborhood of R_r(β_r u) the integrand is appreciable only when |R̂(θ, v) − R_r(β_r v)| is of order ∆ or less, where ∆ was introduced in Section I. For k = 0 and u − v large compared to ∆, this cannot be satisfied unless e(θ) has nearly the same direction as R_r(β_r u) − R_r(β_r v), which is to say that the domain of θ integration is tiny (and close to θ = 0 for a chicane with small bending angle). When u − v gets close to ∆ the domain expands precipitously to the full [−π, π].

B. Lab Frame to Beam Frame Transformation

The beam frame is defined in terms of the reference orbit R_r(s) = (Z_r(s), X_r(s))^T, which in turn is defined by the Lorentz equations without self fields. The unit tangent vector t to the reference orbit is t(s) = R_r'(s), and we define the unit normal vector n by n(s) = (−X_r'(s), Z_r'(s))^T, so that n is a π/2 counterclockwise rotation from t as shown in Fig. 3. It follows from the Lorentz equations that t'(s) = −qB_ext(Z_r(s))n(s)/P_r, where P_r = mγ_rβ_rc is the momentum of the reference particle. We define the curvature κ by κ(s) = qB_ext(Z_r(s))/P_r, and thus t'(s) = −κ(s)n(s) and n'(s) = κ(s)t(s). In terms of Fig. 1 this makes κ negative in the first magnet, positive in the second magnet, and so on.
The beam frame Frenet-Serret coordinates are (s, x), where s is the arc length along the reference orbit and x is the perpendicular distance along n. Thus the transformation from beam frame to lab frame is

R = R_r(s) + x n(s).     (16)

In addition, we define p_s and p_x by P = P_r(p_s t(s) + p_x n(s)).
Our lab to beam transformation has three steps. The first step is the transformation just discussed. In the second step the variables s and u are interchanged, making s the new independent variable. In the final step z = s − β_r u replaces u as a dependent variable and p_z = (γ − γ_r)/γ_r replaces p_s. Thus the variables z, p_z, x, p_x are small near the reference orbit, which corresponds to z = x = 0. Eq. (16) defines s = s(R) and x = x(R) in a neighborhood of the reference orbit, so that z = z(R, u) = s(R) − β_r u, and we have the identity R ≡ R_r(z(R, u) + β_r u) + x(R)n(z(R, u) + β_r u). Since z is small for R in the bunch, expanding for small z gives R = R_r(β_r u) + M(β_r u)r + O(κz², κxz), where r = (z, x)^T and M has columns t and n, and we obtain the approximate inverse

r ≈ M^T(β_r u)(R − R_r(β_r u)).     (18)

We make extensive use of formula (18) within its domain of validity, namely when R is in a neighborhood of R_r(β_r u) comparable in extent to the bunch size, and the bunch size is small compared to the bending radius.
The equations of motion in (z, p_z, x, p_x; s) have the fields F(R, u) evaluated at R = R_r(s) + xn(s) and u = (s − z)/β_r. We make two approximations. In the first we use the fact that the fields are slowly varying in s for fixed r (see Assumption B of Section I) and that β_r ≈ 1. The second uses the fact that we are only interested in the fields in the bunch, for r small. From (5) we obtain the equations of motion (20), where R̂ = R_r(s) + M(s)r and ' = d/ds. The self-forces are given approximately by (21), where E_Z, E_X, B_Y are evaluated at (R̂, s). We have expanded F_x in order to point out that each of the last two terms is large whereas their difference is small. Details are presented in [18].
The equations of motion (20), without the self fields, are the Lorentz equations in linearized form, for the relativistic case γ_r ≫ 1. Including the self fields, we write (20) compactly in terms of ζ = (z, p_z, x, p_x)^T. The linear part, ζ' = A(s)ζ, can be solved, and the solution can be written in terms of the dispersion function D(s, τ) and R_56(s, τ) from Section III.
The equations of motion in the interaction picture then take the form (23). We have found that it is numerically more efficient to integrate (23) than to integrate (20).
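The interaction-picture integration can be illustrated on a toy system; everything below is our own stand-in (a constant drift-like A and a weak artificial kick g, not the chicane optics or the actual collective force). With η = Φ(s)⁻¹ζ, where Φ is the principal solution of the linear part ζ' = A ζ, the linear motion is absorbed and only the collective kick is integrated by the Euler scheme:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])            # drift-like linear part (A @ A = 0)

def expm_nilpotent(A, s):
    # A is nilpotent, so the exponential series terminates: Phi(s) = I + A s
    return np.eye(2) + A * s

def g(zeta, s):
    eps = 1e-3
    return np.array([0.0, -eps * zeta[0]])   # weak artificial "collective" kick

def euler_interaction(zeta0, s_end, n):
    """Euler integration of eta' = Phi(s)^{-1} g(Phi(s) eta, s)."""
    ds = s_end / n
    eta = zeta0.copy()                # Phi(0) = I, so eta(0) = zeta(0)
    for i in range(n):
        s = i * ds
        Phi = expm_nilpotent(A, s)
        eta = eta + ds * np.linalg.solve(Phi, g(Phi @ eta, s))
    return expm_nilpotent(A, s_end) @ eta    # back to zeta(s_end)

zeta_f = euler_interaction(np.array([1.0, 0.0]), 8.0, 2000)
```

The exact solution of this toy is slow harmonic motion, so ζ remains close to its initial value over the interval; the point is only that the fast linear transport is handled exactly by Φ while the Euler step sees only the weak interaction.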
Our field formula is in the lab frame, so the lab charge and current densities must be determined from the beam frame phase space density. The relation between the lab phase space density f_L and the beam phase space density f is given in (24). This leads to expressions for the lab frame densities in which τ(r, s) = ∫_{R²} p_x f(z, p_z, x, p_x, s) dp_z dp_x. Using the fact that f(z, p_z, x, p_x, ·) is slowly varying and ρ(r, s) has its support for r small, we have ρ(z(R, u), x(R), z(R, u) + β_r u) ≈ ρ(r̂, β_r u), where r̂ = M^T(β_r u)(R − R_r(β_r u)). The approximation for J_L is derived similarly to that for ρ_L.
There are subtleties in the second transformation caused by interchanging the roles of u and s as independent and dependent variables. The phase space density transformation (24) and the approximations are discussed in detail in [18].

C. A method of solution: Monte Carlo particle method
We have discussed our method for calculating the fields in the lab frame and the determination of the lab frame charge and current densities from the beam frame phase space density. Here we discuss a method of solution of the coupled Vlasov-Maxwell system similar to traditional particle methods, variously called particle-in-cell (PIC) or macro-particle methods. We call it the Monte Carlo particle (MCP) method, because it uses a Monte Carlo method to determine a smooth charge distribution from an ensemble of particles.
Before we developed the MCP method we considered solving the Vlasov equation using the method of local characteristics (or "semi-Lagrangian" method), which has been extremely effective in problems with a 2D phase space. This deals with the Vlasov equation in a very direct way, defining the phase space density by its values on a grid with interpolation to off-grid points. The density is updated by integrating backward from grid points, with the collective force regarded as constant during a time step. Since the backward orbits land at off-grid points, this update requires interpolation. In comparison with usual particle methods, this method offers much lower noise and the possibility of a relatively direct control of accuracy by monitoring interpolation error. On the other hand, it is relatively expensive in computation time and memory, and in the case of bunch compressors it is technically complicated because the density is concentrated in a narrow region of phase space that evolves in time in a manner that is not known a priori [19]. We are studying ways to deal with this evolving support, since it would be inefficient to use many grid points where the density is negligibly small. Possible techniques include changes of variable [2], an evolving selection of fiducial grid points, and the use of forward characteristics rather than backward [20]. Although we have high hopes for success in this direction, the present report has the more modest goal of improving the particle method, in which it is much easier to deal with the support question since one has to work only with the charge density in 2D rather than the phase space density in 4D. In particle methods the connection to the Vlasov equation is unfortunately indirect, and the control of accuracy relies entirely on the experiment of increasing the number of particles.
Even if one believes that a solution of the Vlasov equation is obtained in the limit, it is usually too expensive to make a convincing empirical demonstration of convergence.
In the Monte Carlo particle method we represent the charge/current density in the beam frame as a truncated Fourier series, thus giving ourselves a density that is smooth, of class C ∞. The integrals representing the Fourier coefficients are evaluated by Monte Carlo sampling of the integrand, over the ensemble of particles. Ideally one would use the resulting Fourier series and its gradient to evaluate the source in the field formula. That is too expensive, however, since it involves multiple summations of the Fourier series, at points not amenable to the fast Fourier transform. Instead, we use the Fourier series to put the density and its gradient on a grid, and then use low order polynomial interpolation for evaluations at off-grid points. Thus we accomplish something similar to charge deposition in particle-in-cell codes, but by a different route, and get the gradient as well as the density itself at grid points. Our method gives low noise, but is costly at high levels of resolution. We have not yet carried out a careful comparison with more usual methods at similar levels of cost and resolution.
We now describe the algorithm more concretely, for a small step s → s + ∆s of the evolution variable in the beam frame. To set up a grid we first do a rotation in the beam frame (z, x)-plane to put the axes along the principal axes that an unperturbed Gaussian distribution would have at the current value of s. The distribution in our self-consistent calculation is of course not exactly Gaussian, but the principal axis transformation is nevertheless found to be useful for describing the self-consistent density. The coordinates (z̃, x̃) in the rotated frame are then mapped to coordinates (x_1, x_2) that lie in the unit square. On and beyond the boundary of the square the density and its gradient are regarded as being zero; in practice this might be at about 6σ for a Gaussian. Densities in the new coordinates are written as ρ̂(x_1, x_2, s), τ̂(x_1, x_2, s), f̂(x_1, p_z, x_2, p_x, s).
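A sketch of this grid setup (our own minimal construction; the paper's exact mapping and cutoff are not reproduced): rotate to the principal axes of the sampled (z, x) distribution and rescale each axis by n_sigma standard deviations onto [0, 1]:

```python
import numpy as np

def to_unit_square(z, x, n_sigma=6.0):
    """Rotate (z, x) to principal axes and map to the unit square,
    treating the density as zero beyond n_sigma standard deviations."""
    pts = np.column_stack([z, x])
    mean = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    # eigenvectors of the 2x2 covariance matrix give the principal axes
    w, V = np.linalg.eigh(cov)
    rotated = (pts - mean) @ V                 # principal-axis coordinates
    half_width = n_sigma * np.sqrt(w)          # per-axis cutoff
    u = 0.5 * (rotated / half_width + 1.0)     # [-cut, cut] -> [0, 1]
    return np.clip(u, 0.0, 1.0)

rng = np.random.default_rng(1)
z = rng.normal(0.0, 1e-3, 10000)
x = 0.3 * z + rng.normal(0.0, 1e-4, 10000)     # correlated, tilted ellipse
u = to_unit_square(z, x)
```

The eigendecomposition step is one standard way to realize the principal-axis rotation described above; any other estimate of the tilt angle would serve equally well.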
To generate the initial positions of N particles we use the rejection method [21], assuming particles are independent identically distributed (IID) according to the initial phase space density.
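A minimal sketch of the rejection method in one dimension (the flattop form and all numbers below are our illustrative assumptions, not the paper's exact µ): sample from a constant envelope and keep the points that fall under the target density.

```python
import numpy as np

def flattop(z, a=0.00118, b=0.00015):
    # hypothetical smooth flattop: even in z, maximum at z = 0,
    # shoulder of half-width a with edge width b
    return 1.0 / (1.0 + np.exp((np.abs(z) - a) / b))

def target(z, A=0.05, lam=100e-6):
    # flattop times the modulation factor 1 + A cos(2 pi z / lambda)
    return flattop(z) * (1.0 + A * np.cos(2 * np.pi * z / lam))

def rejection_sample(n, zmax=0.002, fmax=1.05, seed=0):
    """Accept z ~ U(-zmax, zmax) with probability target(z)/fmax."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        z = rng.uniform(-zmax, zmax, 4 * n)
        u = rng.uniform(0.0, fmax, 4 * n)
        out.extend(z[u < target(z)])
    return np.array(out[:n])

samples = rejection_sample(20000)
```

The envelope constant fmax must majorize the target everywhere on the support; here the target never exceeds 1 + A = 1.05.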
The algorithm goes as follows:
1. We expand ρ̂(x_1, x_2, s) and τ̂(x_1, x_2, s) in finite Fourier series with coefficients θ_ij and Θ_ij, respectively. Since ρ̂ is a probability density, the Fourier coefficients θ_ij may be written as the expected value E of φ_i(X_1)φ_j(X_2) with respect to ρ̂, where X = (X_1, X_2) is the random variable with density ρ̂. To estimate τ̂, which is not a probability density, we notice that the Fourier coefficients Θ_ij may likewise be written as an expectation, where X = (X_1, P_Z, X_2, P_X) is the random variable with density f̂.
It follows that the natural estimate of E is the sample mean, where a realization of the random variable X = (X_1, P_Z, X_2, P_X) is obtained from the beam frame scattered phase space points z_i, p_zi, x_i, p_xi at s, i = 1, ..., N (via the transformation to the unit square described above). This is a density estimation technique used in statistical estimation; see, e.g., [22]. In Section IV we discuss how we determine N and (I, J) for a particular simulation.
2. We recall that to calculate the fields from our field formula in (14) we perform the θ integration with the trapezoidal rule (which is superconvergent) and the v integration with an adaptive integrator. To this end we need the source term S at arbitrary values of its arguments. We proceed as follows. We first notice that the Fourier method of item 1 gives an analytical representation at s not only of ρ̂ and τ̂ but of ∇ρ̂ and ∇τ̂ as well. A representation of ∂ρ̂/∂s and ∂τ̂/∂s is obtained by differentiating the Fourier coefficients with a finite difference scheme. Even though it is possible to construct the source term S by storing the "history" of the Fourier coefficients, i.e. θ_ij and Θ_ij, dθ_ij/ds and dΘ_ij/ds, on a grid in s, we found it more efficient to store ρ̂, ∇ρ̂ and ∂ρ̂/∂s (and the same for τ̂) on a 3D grid in (x_1, x_2, s). We use a uniform grid in (x_1, x_2, s).
To evaluate quantities at off-grid points we use a 3D quadratic interpolation.
3. We use item 2 to advance the particles in the interaction picture of (23). This allows us to use an Euler scheme where the integration step ∆s is determined by the strength and smoothness of the self fields. The fields are calculated on a grid in (z̃, x̃). To calculate the fields at particle positions at off-grid points we use a 2D quadratic interpolation.
4. The procedure is iterated going back to item 1.
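Item 1 of the algorithm can be sketched in one dimension, with an orthonormal cosine basis on [0, 1] of our own choosing (the paper's 2D product basis and (I, J) truncation are analogous): the coefficients are expectations of the basis functions, estimated by sample means.

```python
import numpy as np

def phi(i, x):
    """Orthonormal cosine basis on [0, 1]: phi_0 = 1, phi_i = sqrt(2) cos(pi i x)."""
    return np.ones_like(x) if i == 0 else np.sqrt(2.0) * np.cos(np.pi * i * x)

def fit_coeffs(samples, I):
    # theta_i = E[phi_i(X)], estimated by the sample mean over the ensemble
    return np.array([phi(i, samples).mean() for i in range(I + 1)])

def density(theta, x):
    # smooth (C-infinity) reconstruction from the truncated series
    return sum(theta[i] * phi(i, x) for i in range(len(theta)))

# draw samples from rho(x) = 1 + 0.1 sqrt(2) cos(pi x) by rejection
rng = np.random.default_rng(2)
N = 200000
xs, us = rng.uniform(0, 1, 3 * N), rng.uniform(0, 1.2, 3 * N)
x = xs[us < 1 + 0.1 * np.sqrt(2) * np.cos(np.pi * xs)][:N]

theta = fit_coeffs(x, 8)
# theta[0] = 1 (normalization); theta[1] should be close to 0.1
```

Because the estimate is a short sum of smooth basis functions, the reconstructed density and its gradient are globally smooth, which is the property exploited in the source evaluation of item 2.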
We mentioned that the MCP method can be time consuming. We are attempting to improve the Monte Carlo integrations by trying variance reduction techniques, which build on the central limit theorem [21,23], and also by trying quasi-random sequences (also called low-discrepancy sequences) in place of pseudo-random sequences [23,24]. Quasi-random sequences allow one to break the "curse of dimensionality" in grid-based multi-dimensional integration, giving a true error bound (i.e., not probabilistic) of order (log N ) k−1 /N , with only logarithmic dependence on the dimension k of the space.
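As an illustration of the quasi-random alternative (our own toy integrand; the van der Corput/Halton construction below is standard but not taken from the paper), compare the two kinds of point sets on a smooth 2D integral with exact value 1/4:

```python
import numpy as np

def van_der_corput(n, base):
    """First n points of the van der Corput sequence in the given base."""
    seq = np.zeros(n)
    for i in range(n):
        f, q, r = 1.0, 0.0, i + 1
        while r > 0:
            f /= base
            q += f * (r % base)   # reverse the base-b digits across the point
            r //= base
        seq[i] = q
    return seq

def halton_2d(n):
    # Halton points: van der Corput in coprime bases per dimension
    return np.column_stack([van_der_corput(n, 2), van_der_corput(n, 3)])

f = lambda p: np.cos(2 * np.pi * p[:, 0]) ** 2 * p[:, 1]   # integral = 1/4

n = 4096
qmc = abs(f(halton_2d(n)).mean() - 0.25)                   # quasi-random error
rng = np.random.default_rng(3)
mc = abs(f(rng.uniform(0, 1, (n, 2))).mean() - 0.25)       # pseudo-random error
```

For smooth integrands the quasi-random error decays roughly like (log N)^{k−1}/N rather than the Monte Carlo N^{−1/2}, which is the motivation stated above.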
As an alternative to MCP we are investigating a scheme based on the standard PIC procedure of charge deposition to a grid, followed by additional filtering using wavelets.

III. LINEAR INTEGRAL EQUATION TO DETERMINE THE GAIN FACTOR
Recalling the definitions of the force components in (21), we now consider the case when F_z2 and F_x are zero and F_z1 can be approximated by an impedance model. The equations of motion then take the form (37), where the collective force term G(ζ, s; f) in (37) is defined by (38), and where f is the beam frame phase space density in (24). The radiation wake function W and the radiation impedance Z form a Fourier transform pair. The vector field defined by the rhs of (37) is divergence free; thus the Vlasov equation takes the form (41), where D_2 f is the partial derivative of f w.r.t. s and D_1 f is the row vector consisting of the partial derivatives w.r.t. z, p_z, x, p_x, respectively.
The longitudinal spatial density ρ(z, s) is defined by projecting f onto z, and our goal is to characterize its z-Fourier transform at wave numbers k corresponding to the feared microbunching instability. In terms of ρ̂ the collective force can be written in impedance form. We wish to study linear stability of a "smooth" solution f_0 of the initial value problem (45), (46). That is, we write f = f_0 + f_1, linearize in f_1, and then look for growth (in some appropriate sense) of an initial value of f_1. Here the spatial density from f_1 will contain wavelengths less than those of any appreciable component of f_0; that is the meaning of "smooth" as an attribute of f_0, depending on the choice of f_1. We emphasize that f_0 is smooth relative to f_1, not necessarily smooth by some absolute standard. Note that the ρ in the present section is different from the ρ in Section IIB, where it was the full spatial density.
Of course it is difficult to find an f_0 satisfying (45) for an entirely arbitrary initial value a_0; to do so would be the same as solving the full problem, thus obviating any reason to linearize. There may be some a_0, however, for which the collective force G_2(z, s; f_0) is initially zero and remains so; in that case we can solve (45), (46) by characteristics. The force will be initially zero if the spatial density has no Fourier spectrum within the support of the impedance: Z(k, 0)ρ̂_0(k, 0) = 0. This is the case for a uniform spatial density ("coasting beam", ρ̂_0(k, 0) ∝ δ(k), Z(0, s) = 0) but could also arise with a bunched beam through shielding of CSR by the vacuum chamber, so that Z is essentially zero at wavelengths longer than the shielding threshold λ_0. Under unperturbed propagation in the magnetic lattice alone, the coasting beam condition is maintained (if the energy chirp is linear), and the bunched beam under shielding could also keep its spectrum above λ_0 if bunch compression were not too extreme. Thus we have at least two cases in which a solution of (45), (46) is available. Others might be obtained by time domain numerical integration of (45) from a smooth initial condition, provided that very small wavelengths do not appear as the integration proceeds. See [15] and [17] for the specific a_0 used in those references.
Putting f = f_0 + f_1 into (41) and applying (45) we obtain (47). Linearizing (47) in f_1 gives the initial value problem (48), (49) for f_1. Eq. (48) is a linear first-order partial differential-integral equation, in the independent variables ζ, s, for the s evolution of f_1. The initial condition (49) will contain the perturbation involving the wavelengths of interest.
We now examine the evolution of f_1 along the characteristics of the unperturbed problem. Thus we define g by g(ζ, s) = f_1(ϕ(s, 0, ζ), s), and this gives, by (48) and the definition of ϕ, the evolution equation (58) for g, where the column vector D_1ϕ(s, 0, ζ) is the derivative of ϕ(s, 0, ζ) with respect to s. The replacement of f_1 by g in (58) is often called "passing to the interaction picture", since the evolution of g is governed just by the collective force, the "interaction".
By differentiating (57) we have, by (54) and (55), the relation (62), where Da_0(ζ) is the row vector consisting of the partial derivatives of a_0 w.r.t. z, p_z, x, p_x. From (62) we see that the first factor on the right hand side of (59) can be written in terms of Φ_*2, the second column of Φ.
Thus our 2D integral equation is (67) with the kernel given in (68).
The assumption of linear chirp is usually not realistic, even though one tries to minimize nonlinearity in bunch compressor designs. Venturini [31] has raised the question of whether nonlinear terms in the chirp might affect the microbunching instability, having noticed a discrepancy between simulations with and without nonlinear terms (albeit simulations that differed in other respects as well). Approaching this question through the integral equation, we can choose in place of (76) an initial density with a nonlinear chirp function α(z); that is, we still assume an initial coasting beam but with nonlinear chirp. For the special case of a cubic chirp, α(z) = z + bz³, which may be realistic in some cases, one can evaluate ã_0 in terms of the Airy function [32], with the result (95). The + sign is chosen when κ_1 + hκ_2 and hbκ_2 have the same sign, the − sign when their signs are opposite. Since (95) lacks the delta function of (77), the integral equation is now in 2D.
This may be the most interesting case for a first study of the 2D equation. Generalizing the calculation of (92)ff, the solution might be approximated by iteration. Note that the concept of gain should be generalized in this case, since the compression is no longer determined by C(s).
Note that the initial coasting beam condition of uniform charge density is not maintained when the chirp is nonlinear, even if the collective force is turned off, since α(z) acquires a nonlinear dependence on p z under unperturbed propagation in the lattice. The charge density becomes non-uniform in z through chirp alone, and G(ζ, s; f 0 ) = 0 for s > 0.
Perhaps G(ζ, s; f 0 ) is nevertheless sufficiently small to be neglected when b is small; this must be checked.
Recall that Ai(x) has exponential decrease for x > 0 but oscillates for x < 0; the expression (95) has discontinuous slope at u = 0. Consequently the behavior of (95) near u = 0 at small v will be complicated and will require close attention in any numerical or analytic study.

IV. FERMI@ELETTRA BUNCH COMPRESSOR STUDIES
We apply the MCP method to study the FERMI@Elettra first bunch compressor system.
This example was proposed as a benchmark for testing codes. The complete layout of the system is shown in Fig. 4. The system consists of a 4-dipole chicane between rf cavities and quadrupoles. Here we limit our study to the chicane, with parameters as listed in Table 1. The results are obtained in the free space case, i.e., neglecting shielding effects from the vacuum chamber. The lengths L_1, L_2 and L_b are in terms of the lab frame Z-variable; thus the total length of the chicane is 8 m. The total arc length traversed by the reference particle is s_f = 8.029 m.
In our simulations we noticed that τ has a negligible effect, therefore we ignored its contribution. To study the microbunching instability, we choose the initial beam frame phase space density consistent with Section III. The smooth a_0 is perturbed by a modulation ε with wavelength λ and small amplitude A; the function ρ_c contains the linear chirp and µ is a flattop distribution, even in z, with maximum at z = 0. In the calculations we take A = 0.05, a = 0.00118 m and b = 0.00015 m, and vary λ. Here a is the half-height point of the flattop, i.e., µ(a) = µ(0)/2; the slope at a is dµ(a)/dz = 1/4ab, and in the limit b → 0 we obtain a uniform distribution in z ∈ [−a, a]. The linear energy chirp h is created by off-crest RF acceleration such that particles in front of the reference particle gain less energy than particles behind it; this creates the correlation needed for bunch compression. The uncorrelated energy spread σ_E = 2 keV gives σ_u = σ_E/E_r = 8.6 × 10^−6 (recall p_z = (E − E_r)/E_r). We calculate the gain factor |ρ̂(k_f, s_f)/ρ̂(k_0, 0)| for an initial modulation of wavelength λ = 2π/k_0 ≥ 80µm. Here s_f = 8.029 m is the path length s along the reference orbit at the exit of the chicane, and the compression factor of the chicane is C(s_f) = 1/(1 + hR_56(s_f, 0)) = 3.545.
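The initial longitudinal profile can be sketched as follows. The logistic form of µ is our assumption for illustration (the paper's exact functional form is not reproduced here); the values of a, b, A, C and the 100 µm case are from the text.

```python
import numpy as np

a, b, A = 0.00118, 0.00015, 0.05   # values quoted in the text

def mu(z):
    # hypothetical smooth flattop: even in z, maximum at z = 0,
    # shoulder of half-width a with edge width b
    return 1.0 / (1.0 + np.exp((np.abs(z) - a) / b))

def initial_density(z, lam):
    # flattop times the modulation factor 1 + A cos(2 pi z / lambda)
    return mu(z) * (1.0 + A * np.cos(2 * np.pi * z / lam))

ratio = mu(a) / mu(0.0)            # close to 1/2: a is the half-height point
C = 3.545                          # compression factor from the text
lam_f = 100e-6 / C                 # compressed modulation wavelength, about 28.2 um
```

For this choice of µ the half-height property µ(a) ≈ µ(0)/2 holds because b ≪ a, and the compressed wavelength λ/C is where the amplified modulation is expected in the final spectrum.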
In Fig. 5 (left) we compare the analytical formula for the gain factor given in [15], Eq. (38), with the gain factor calculated numerically with our solver. The formula from [15] takes into account only CSR effects whereas our Vlasov-Maxwell approach automatically includes the effects of CSR and space charge. The numerical gain factor agrees with the analytical formula at large wavelengths but shows some deviation at short wavelengths.
In Fig. 5 (right) we show the initial charge density for λ = 100µm. The effect of the modulation on the average longitudinal force (mean power) and on the transverse emittance is very small for all values of λ; this is shown for λ = 100µm in Fig. 6. Notice that the transverse emittance at s_f is 1.5 times the initial one. The large increase of the emittance in the middle of the chicane is a spurious effect due to dispersion, while the final increase is entirely due to the self fields. In Figs. 7, 8 and 9 we show the charge density in normalized coordinates (z_n, x_n) ∈ [−1, 1] × [−1, 1] at s = 8.029m (end of chicane) for λ = 80µm, 100µm and 200µm, respectively. In normalized coordinates the charge density at s = 8.029m is very close to the charge density at s = 0m; therefore the figures show the collective effect due to the self fields alone. An amplification of the initial modulation is clearly visible. Notice that the amplification occurs mostly at wavelengths close to the initial modulation wavelength rescaled by the compression factor. This is also shown in Fig. 10, where we compare |ρ(k, s_f)| with and without self fields for λ = 200µm and λ = 80µm. This makes the gain factor a useful quantity, since it gives information about the spectrum of the longitudinal density at the wavelength λ rescaled by the compression factor. In Figs. 11, 12 and 13 we show the longitudinal force F_z1, proportional to E · t, at s = 8.029m for λ = 80µm, 100µm and 200µm, respectively. Notice that the maximum intensity of F_z1 increases as λ decreases.
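The statement that the modulation peak moves to the wavelength rescaled by the compression factor can be illustrated with a small Monte Carlo sketch. This is not the paper's solver: the rejection sampling, the exaggerated amplitude A = 0.2 (for a peak well above the sampling noise), and the ideal compression map z → z/C are our assumptions; C = 3.545 and the spectrum definition |ρ(k)| = |(1/N) Σ_j exp(−ikz_j)| follow the text.

```python
import numpy as np

# Illustration only: sample a modulated uniform bunch, then verify that an ideal
# compression z -> z/C moves the spectral peak from k0 to C*k0.
rng = np.random.default_rng(1)
N = 20_000                       # sampled points (far fewer than the paper's 10^7-10^8)
a, lam, C = 0.00118, 200e-6, 3.545
A = 0.2                          # exaggerated modulation amplitude for a clear peak
k0 = 2.0 * np.pi / lam

# rejection-sample z from the density proportional to 1 + A cos(k0 z) on [-a, a]
z = rng.uniform(-a, a, size=4 * N)
keep = rng.uniform(0.0, 1.0 + A, size=z.size) < 1.0 + A * np.cos(k0 * z)
z = z[keep][:N]

def spectrum(z, k):
    """|rho(k)| = |(1/N) sum_j exp(-i k z_j)|, the Monte Carlo bunching factor."""
    return np.abs(np.exp(-1j * np.outer(k, z)).mean(axis=1))

k_before = np.linspace(0.8 * k0, 1.2 * k0, 201)
k_after = np.linspace(0.8 * C * k0, 1.2 * C * k0, 201)
peak_before = k_before[np.argmax(spectrum(z, k_before))]
peak_after = k_after[np.argmax(spectrum(z / C, k_after))]
print(peak_before / k0, peak_after / (C * k0))  # both close to 1
```

With self fields switched on, the amplitude at the compressed wavenumber C·k0 would grow by the gain factor rather than merely survive the compression, which is the effect visible in Fig. 10.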
The simulations have been done on the high-performance computer cluster (HPC) at the University of New Mexico and on NERSC at the Lawrence Berkeley National Laboratory.
Our Vlasov-Maxwell solver, based on a Monte Carlo particle method to solve the Vlasov equation and a Green function method to solve the Maxwell equations, demonstrated high efficiency and scalability. Its high performance has been tested on several parallel clusters.
The number of particles N used in the simulations varies from 10^7 to 10^8, and the number of Fourier coefficients (I, J) used in the estimation of the 2D charge/current density runs from (30, 30) to (120, 50). For a particular simulation we fix N and (I, J) as follows. We recall that at s = 0 the charge density has the form [1 + A cos(2πz/λ)]µ(z)η(x), where µ(z) is a flattop distribution and η(x) is Gaussian. The value of J is fixed by the spectrum of η(x), while the value of I is determined by λ and by the extent of the grid in z. In our simulations the grid extent is 6σ_z = 4.6mm, thus for a given modulation wavelength, for example λ = 100µm, the value of I must be larger than 92. To determine N we define the error as the square of the L_2 norm of ρ_est − ρ_an, where ρ_est is the estimated charge density and ρ_an is the charge density given analytically. We choose N so that this error is ≤ 10^−5.
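The procedure for choosing I and N can be sketched in one dimension. The mode-count rule 2·(grid extent)/λ = 92 and the squared-L_2-error criterion are from the text; the Gaussian test density, the complex-exponential basis, the use of Parseval's identity to evaluate the error, and all numerical values of M and N below are our assumptions for illustration (the error threshold 10^−5 applies to the paper's normalization, not necessarily to this toy problem).

```python
import numpy as np

# Illustration only: Monte Carlo estimation of Fourier coefficients of a known
# 1D density, with the squared L2 error of rho_est - rho_an used to judge N.
rng = np.random.default_rng(0)
L = 0.0046            # grid extent 6*sigma_z = 4.6mm, as in the text
sigma = L / 6.0
M = 30                # modes kept on each side of m = 0 (illustrative value)

# resolving a 100um modulation across the grid requires 2*L/lambda = 92 modes
print(2.0 * L / 100e-6)

def l2_error_sq(N):
    """Squared L2 norm of rho_est - rho_an for a Gaussian test density,
    evaluated through the Fourier coefficients via Parseval's identity."""
    z = rng.normal(0.0, sigma, size=N)
    m = np.arange(-M, M + 1)
    # Monte Carlo estimate of c_m = E[exp(-2*pi*i*m*z/L)], one mode at a time
    c_est = np.array([np.exp(-2j * np.pi * mi * z / L).mean() for mi in m])
    c_an = np.exp(-0.5 * (2.0 * np.pi * m * sigma / L) ** 2)   # exact coefficients
    return np.sum(np.abs(c_est - c_an) ** 2) / L

# the statistical error falls like 1/N; N is raised until the target is met
for N in (10_000, 100_000, 1_000_000):
    print(N, l2_error_sq(N))
```

Since each estimated coefficient carries a variance of order 1/N, the squared error scales roughly as (2M + 1)/(N·L), which is why resolving shorter wavelengths (larger I) forces a proportionally larger N.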
The study of the gain factor at short wavelengths is computationally expensive. Moreover, the increased length of the 3D vectors needed to store the history of the charge/current densities leads to intensive memory usage. Studies are in progress to investigate wavelengths shorter than λ = 80µm and different amplitudes A. An important prediction of the gain factor formula of [15] is that increasing the uncorrelated energy spread reduces the gain. This led to a proposal, the laser heater, to increase the uncorrelated energy spread within the FEL tolerance in order to damp the microbunching instability without degrading the FEL performance. An analysis of this effect, together with the complete study of the FERMI@Elettra benchmark bunch compressor system, will be discussed in a forthcoming paper.

V. CONCLUSIONS
We have demonstrated a procedure with some new features for self-consistent simulation of coherent motion, with application to a bunch compressor. Although it is based on tracking an ensemble of particles, as in usual macro-particle or PIC codes, the method of smoothing the charge distribution is quite different, using a Fourier expansion with Monte Carlo determination of the expansion coefficients. The resulting smooth distribution is used in an accurate solution of the field equations by a Green function method. We hope that the resulting time evolution of the charge density approximates that which would be obtained from a solution of the Vlasov-Maxwell system on the 4D phase space, but there is no direct check on the accuracy of such an approximation. However, the evident lack of noise in the simulation is encouraging. Using 10^7-10^8 particles and an adequate number of Fourier modes, we are able to study the amplification of an initial density modulation down to a wavelength of 80µm in the example of the first chicane bunch compressor at FERMI@Elettra. We see clean amplification at the compressed value of the initial modulation wavelength, in accord with the prediction of the linear theory except at the smallest wavelengths. Even at 80µm the modulation has a negligible effect on the final emittance and energy loss, although the gain is large.
We anticipate improvements in the code regarding the treatment of the charge density, but at present the most costly part is the field calculation. We intend to review the choice of integration variables and the integration algorithms to see if the field evaluation can be sped up. Parts of the integration, for large retarded times, may have been done more accurately than necessary.