Planar undulator motion excited by a fixed traveling wave: Quasiperiodic Averaging, normal forms and the FEL Pendulum

We present a mathematical analysis of planar motion of energetic electrons moving through a planar dipole undulator, excited by a fixed planar polarized plane wave Maxwell field in the X-Ray Free Electron Laser (FEL) regime. Our starting point is the 6D Lorentz system, which allows planar motions, and we examine this dynamical system as the wavelength of the traveling wave varies. By scalings and transformations the 6D system is reduced, without approximation, to a 2D system in a form for a rigorous asymptotic analysis using the Method of Averaging (MoA), a long time perturbation theory. The two dependent variables are a scaled energy deviation and a generalization of the so-called ponderomotive phase. As the wavelength varies the system passes through resonant and nonresonant (NR) zones and we develop NR and near-to-resonant (NtoR) normal form approximations. For a special initial condition and on resonance, the NtoR normal form reduces to the well-known FEL pendulum system. We then state and prove NR and NtoR first-order averaging theorems which give near optimal error bounds for the MoA approximations. The proofs are novel in that they do not use a near identity transformation and they use a system of differential inequalities. The NR case is an example of quasiperiodic averaging where the small divisor problem enters in the simplest possible way. To our knowledge the planar problem has not been analyzed with the generality we aspire to here nor has the standard FEL pendulum system been derived with associated error bounds as we do here.


Abstract
We present a mathematical analysis of planar motion of energetic electrons moving through a planar dipole undulator, excited by a fixed planar polarized plane wave Maxwell field in the X-Ray FEL regime. Our starting point is the 6D Lorentz system, which allows planar motions, and we examine this dynamical system as the wavelength λ of the traveling wave varies. By scalings and transformations the 6D system is reduced, without approximation, to a 2D system in a form for a rigorous asymptotic analysis using the Method of Averaging (MoA), a long time perturbation theory. The two dependent variables are a scaled energy deviation and a generalization of the so-called ponderomotive phase. As λ varies the system passes through resonant and nonresonant (NR) zones and we develop NR and near-to-resonant (NtoR) MoA normal form approximations. The NtoR normal forms contain a parameter which measures the distance from a resonance. For a special initial condition, for the planar motion and on resonance, the NtoR normal form reduces to the well-known FEL pendulum system. We then state and prove NR and NtoR first-order averaging theorems which give explicit error bounds for the normal form approximations. We prove the theorems in great detail, giving the interested reader a tutorial on mathematically rigorous perturbation theory in a context where the proofs are easily understood. The proofs are novel in that they do not use a near identity transformation and they use a system of differential inequalities. The NR case is an example of quasiperiodic averaging where the small divisor problem enters in the simplest possible way. To our knowledge the planar problem has not been analyzed with the generality we aspire to here, nor has the standard FEL pendulum system been derived with associated error bounds as we do here. We briefly discuss the low gain theory in light of our NtoR normal form.
Our mathematical treatment of the noncollective FEL beam dynamics problem in the framework of dynamical systems theory sets the stage for our mathematical investigation of the collective high gain regime.
1 Introduction
We present a normal form analysis of the three-degree-of-freedom Lorentz force system of six ODE's (ordinary differential equations) governing the planar (x, y = 0, z) motion of relativistic electrons moving through a planar dipole undulator along the z-axis perturbed by a traveling wave radiation field along the z direction. We are interested in the parameter range for an X-Ray FEL.
Our normal form analysis is based on the Method of Averaging (MoA) at first order. The method has four steps. The first step is to put the ODE's into a standard form. The second step is to identify the normal form approximations. The third step is the derivation of error bounds relating the exact and normal form solutions. The final step is the transformation back to the original variables of the Lorentz force system. In the first step new variables are typically introduced using scalings and transformations. In this process we discover that the exact problem can be formulated, without approximation, in terms of two ODE's for the normalized energy deviation and a generalized ponderomotive phase. Important in this process is the identification of an appropriate small dimensionless parameter, often denoted by ε, so that the system can be written as $\dot{u} = \varepsilon f(u, t) + O(\varepsilon^2)$. In the present context this is the most complicated step. The normal form approximation is obtained by dropping the $O(\varepsilon^2)$ term and replacing f by its t-average. The third step is often the most difficult; however, here the system in standard form is fairly simple and we use this opportunity to give very detailed proofs of two averaging theorems, partly as a tutorial on the methods of proof, rather than applying general theorems from the literature. The latter allows us to obtain quite explicit error bounds which are likely near optimal.
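The four steps can be seen in miniature on a toy scalar problem. The sketch below is not the undulator system: it assumes a hypothetical standard-form ODE $u' = \varepsilon\, 2\sin^2(t)\,(1 - u^2)$, whose t-average gives the normal form $v' = \varepsilon(1 - v^2)$ with solution $v(t) = \tanh(\varepsilon t + \operatorname{atanh} u_0)$. It integrates the exact ODE numerically on the long interval $[0, 1/\varepsilon]$ and reports the largest deviation from the normal form solution, which for this toy problem stays below ε/2. The function names are illustrative only.

```python
import math


def rk4(f, u0, t0, t1, n):
    """Integrate u' = f(t, u) from t0 to t1 with n classical RK4 steps.

    Returns a list of (t, u) pairs including both endpoints.
    """
    h = (t1 - t0) / n
    t, u = t0, u0
    out = [(t, u)]
    for i in range(n):
        k1 = f(t, u)
        k2 = f(t + h / 2, u + h * k1 / 2)
        k3 = f(t + h / 2, u + h * k2 / 2)
        k4 = f(t + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t = t0 + (i + 1) * h
        out.append((t, u))
    return out


def max_averaging_error(eps, u0=0.0, steps_per_unit=100):
    """Largest |u(t) - v(t)| on [0, 1/eps] for the toy problem.

    u solves the exact toy ODE (hypothetical, chosen for illustration);
    v is the closed-form solution of its first-order averaged normal form.
    """
    t_end = 1.0 / eps
    n = int(round(t_end * steps_per_unit))
    exact = rk4(
        lambda t, u: eps * 2.0 * math.sin(t) ** 2 * (1.0 - u * u),
        u0, 0.0, t_end, n,
    )
    c = math.atanh(u0)
    return max(abs(u - math.tanh(eps * t + c)) for t, u in exact)


if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.025):
        print(eps, max_averaging_error(eps))
```

Halving ε should roughly halve the reported error, which is the O(ε) behavior on O(1/ε) intervals that first-order averaging theorems assert.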
An electron, as a member of an electron bunch, will enter the undulator with a given angle in the y = 0 plane and a given Lorentz factor. Here the normalized angle will be given by $\Delta P_{x0}$ and the Lorentz factor will be written $\gamma = \gamma_c(1 + \eta)$ where $\gamma_c$ is a characteristic value of γ for the electron bunch, e.g. the mean, and η is the so-called normalized energy deviation. We will replace η by χ via the relation η = εχ, where a posteriori ε will be a measure of the spread of η values which lead to an FEL pendulum type behavior. We let $B_u$, $k_u$ denote the undulator field strength and wave number and let $E_r$, $\nu k_r$ denote the Maxwell field strength and wave number of the fixed traveling wave radiation field. Thus our basic parameters are eight, namely $\Delta P_{x0}$, $\gamma_c$, ε, $B_u$, $k_u$, $E_r$, $k_r$, ν. We will study the electron response to the radiation field as ν = O(1) varies. The choice of the parameter $k_r$ will be discussed below.
For an X-Ray FEL, ε is small, $\gamma_c$ is large and the undulator parameter, $K = \frac{eB_u}{mck_u}$, (1.1) is O(1). Also $k_r = O(k_u\gamma_c^2)$ and we define the O(1) constant $K_r$ by $K_r = \frac{k_r}{k_u\gamma_c^2}$. (1.2) In §2.3 we will fix $K_r$ (and thus $k_r$) by setting $K_r = \frac{2}{1 + K^2/2 + K^2\Delta P_{x0}^2}$. (1.3) For those familiar with FEL theory, $k_r$ is, for $\Delta P_{x0} = 0$, the usual so-called resonant wave number (see e.g., [1]). The dependence of $K_r$ on $\Delta P_{x0}$ will be a consequence of our analysis. For the LCLS (Linac Coherent Light Source) $\lambda_u = 3$ cm, $mc^2\gamma_c = 15$ GeV and $B_u = 1.32$ T so that K = 3.70 (see http://www-ssrl.slac.stanford.edu/lcls/lcls_parms.html).
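The LCLS numbers quoted above can be checked with a few lines of arithmetic. The sketch below assumes the standard definition of the undulator parameter, $K = eB_u/(mck_u)$ with $k_u = 2\pi/\lambda_u$, and assumes that the resonant wave number takes the form $k_r = K_r k_u \gamma_c^2$ with $K_r = 2/(1 + K^2/2 + K^2\Delta P_{x0}^2)$. Both forms are assumptions of this sketch, inferred from the surrounding discussion rather than read from displayed equations. The physical constants are CODATA values and the function names are illustrative.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge in C
M_ELECTRON = 9.1093837015e-31  # electron mass in kg
C_LIGHT = 299792458.0          # speed of light in m/s
MC2_EV = 0.51099895e6          # electron rest energy in eV


def undulator_parameter(b_u, lambda_u):
    """K = e B_u / (m c k_u), assuming k_u = 2 pi / lambda_u (SI units)."""
    k_u = 2.0 * math.pi / lambda_u
    return E_CHARGE * b_u / (M_ELECTRON * C_LIGHT * k_u)


def lorentz_factor(energy_ev):
    """gamma_c from the total electron energy m c^2 gamma_c, given in eV."""
    return energy_ev / MC2_EV


def resonant_wavelength(lambda_u, gamma_c, big_k, dpx0=0.0):
    """2 pi / k_r for the assumed form k_r = K_r k_u gamma_c^2."""
    q_bar = 1.0 + big_k ** 2 / 2.0 + (big_k * dpx0) ** 2
    k_r_scaled = 2.0 / q_bar                      # assumed form of K_r
    k_r = k_r_scaled * (2.0 * math.pi / lambda_u) * gamma_c ** 2
    return 2.0 * math.pi / k_r


if __name__ == "__main__":
    K = undulator_parameter(b_u=1.32, lambda_u=0.03)
    gamma_c = lorentz_factor(15e9)
    print("K =", K)
    print("gamma_c =", gamma_c)
    print("resonant wavelength [m] =", resonant_wavelength(0.03, gamma_c, K))
```

With the quoted inputs the computed K rounds to 3.70, in agreement with the value stated in the text, and the resonant wavelength comes out on the Angstrom scale, as expected for an X-Ray FEL.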
Mathematically then, we are interested in an asymptotic analysis of the electron motion for ε small and $\gamma_c$ large as ν varies. In particular we are interested in the (ε, $\gamma_c$) regime that gives rise to the pendulum type behavior important for the functioning of an X-Ray FEL. We find that in order to obtain this behavior, in the MoA at first-order, there must be a relation between ε and $\gamma_c$. Introducing the normalized field strength $E = E_r/(cB_u)$, (1.4) we show a pendulum type behavior emerges when $\varepsilon = O(\sqrt{E}/\gamma_c)$ for $\gamma_c \gg 1$. Without loss of generality we will take the order constant to be 1, and choose $\varepsilon = \sqrt{E}/\gamma_c$. (1.5) We also show that, for ε small, the system associated with (1.5) has a resonance structure, such that as ν varies the system goes through a sequence of nonresonant (NR) and near-to-resonant (NtoR) zones. The associated NtoR approximating normal forms are pendulum like and reduce to the standard FEL pendulum system for $\Delta P_{x0} = 0$ and ν an odd integer. This behavior is not present for $\varepsilon \ll 1/\gamma_c$ or $\varepsilon \gg 1/\gamma_c$ and so we refer to (1.5) as a distinguished case. This turns out to be a very simple example of the concept of a "distinguished limit" in the singular perturbation literature. This can be seen in action in the context of our equations (2.56) and (2.57). In summary, for the distinguished case of (1.5), our basic nondimensional parameters are K, $\Delta P_{x0}$, E, ε, ν. For ε small we will obtain a sequence of nonresonant (NR) and near-to-resonant (NtoR) normal form approximations as ν varies. The NtoR normal forms can be understood in terms of the simple pendulum system and reduce to the usual FEL pendulum equations for $\Delta P_{x0} = 0$ and ν an odd integer (see Sections 3.4.2 and 3.4.3). The NtoR normal form allows us to study the effect of ν being slightly off resonance. This completes the first two steps in the MoA. In the third step we prove two theorems which give error bounds, relating the exact and normal form solutions, which go to zero as ε → 0+.
Our goal is to present a mathematically rigorous analysis that is self contained.
Standard derivations of the FEL pendulum equations can be found in [2], [3], [4], [5]. They differ from our approach in that they start from the ODE for the normalized energy deviation, η, and use physical reasoning to introduce approximations leading to the FEL pendulum normal form for ∆P x0 = 0. In contrast, our starting point is the three-degree-of-freedom Lorentz force ODE's which are clearly more general and we make no approximation in going to the standard form for the MoA. Thus our only approximation is in going from the averaging standard form to the normal form approximations. Furthermore we obtain error bounds which do not appear to be possible in the standard derivations and these bounds are covered by our averaging theorems. Our definition of resonance is intimately linked to the derivation of our averaging normal forms, whereas in the standard derivations resonance is introduced in the context of maximizing energy exchange. We emphasize that we obtain more than the pendulum normal form; we also obtain the more general NtoR normal form as well as the NR normal forms.
We do not intend to minimize the importance of the standard derivations; the physical derivations are certainly important and, as is often the case, show great physical insight. Here we want to show what can be done in a mathematically rigorous way in the context of dynamical systems theory, and in doing so we have been guided by and are indebted to the work of e.g., [2], [3], [4], [5].
For ODE's, the MoA is the most robust of the longtime perturbation theories which include e.g., Lindstedt series [6], multiple scales [6], renormalization group methods [7] and Hamiltonian perturbation theory [8]. For example, Hamiltonian perturbation theory has the advantage that one is transforming a scalar function, however the MoA is more robust in that transformations and scalings are not restricted to canonical transformations. Central to the MoA, and in contrast to those just mentioned, is the derivation of error bounds. We emphasize these are true bounds and not just estimates. The MoA is a mature subject and there are several good books, see [6,9,10] for example as well as the Scholarpedia articles [11,12]. We refer to the MoA approximation as a normal form. Generally, a normal form of a mathematical object is a simplified form of the object obtained with the aid of, for example, scalings and transformations such that the essential features of the object are preserved. Here we not only preserve the essential features of the exact ODE's but bound the errors in the approximation with a bound proportional to the small parameter ε. See [11] for the use of normal form in a similar context. This paper has a pedagogical aspect, giving the reader, who may not be familiar with modern long time perturbation theory, an introduction in a context where the proofs are easily understood. In addition, we hope that both newcomers to the field and mathematical scientists will find this a good introduction to the noncollective case of an FEL. We also hope that experts will find something of interest. The reader does not need to be familiar with averaging theory as we give complete proofs including detailed error bounds. Furthermore we obtain better results as our theorems are tuned to the problem at hand. In addition, to our knowledge, the treatment of the undulator problem in the mathematically rigorous and self-contained way that we do here has not been done before. 
Our mathematical analysis is not deep, using only undergraduate mathematics as commonly taught in advanced calculus courses, however it is complicated and somewhat intricate in spots. Finally, for us, it sets the stage for our more serious goal of a deep mathematical understanding of the collective high gain FEL theory.
We proceed as follows. In §2 we start with the three-degree-of-freedom Lorentz equations with a general traveling wave field in (2.7)-(2.10) and then introduce z as the independent variable. The system has planar solutions where 0 = y = p y and using a conservation law we arrive at a system of two ODE's (2.33),(2.34) for the energy deviation and a precursor to a generalization of the so-called ponderomotive phase. By scalings and transformations we discover the distinguished case of (1.5) which then leads to a standard form for the method of averaging in (2.62),(2.63). The two dependent variables are now a scaled energy deviation and a generalization of the so-called ponderomotive phase.
In §3 we present our main results. We begin by introducing the monochromatic traveling wave field, the case of main physical interest. The system is carefully defined in §3.1. In §3.2 we define nonresonant, ∆-nonresonant, resonant, and near-to-resonant ν in the MoA context. We emphasize that as ν varies the system passes through resonant and nonresonant zones. The NR case, its first-order averaging normal form and associated solutions are presented in §3.3 along with a proposition giving an appropriate domain for the associated vector field. §3.3 sets the stage for the more interesting NtoR case of §3.4. The NtoR system is carefully defined along with a proposition giving an appropriate domain for the associated vector field. The first-order averaging normal form is derived and solutions written in terms of solutions of the simple pendulum system. It is unlikely that all ν values are covered accurately by our normal forms, however we are able to argue in §3.4.4 that there is a sense in which the NR case emerges from the NtoR case. The third and fourth steps of the MoA are performed in §3.5 and §3.6. In fact, the statements of our first-order averaging theorems, which give an order ε bound on the error for long times, i.e., intervals of O(1/ε), are presented in §3.5 and applied to the phase space variables in §3.6. By taking special initial conditions (∆P x0 = 0) we recover the result of standard approaches which focus on the energy transfer equations alone and do not consider the phase space variables. Finally in §3.7 we use our results in a low gain calculation and compare the result with [2].
The proofs of the two averaging theorems are presented in §4 and they are based on an idea of Besjes (see [13,14,15]) which leads to proofs without using a near-identity transformation, as in usual treatments of, e.g., [6,9,10]. The NR case is an example of quasiperiodic averaging with a rigorous treatment of a small divisor problem in what is surely the simplest setting. The NtoR case is an example of periodic averaging. A novelty of our approach is that we use a system of differential inequalities, rather than the usual Gronwall inequality, to obtain better error bounds.
The appendices contain calculations needed in the main text. Appendix A provides properties of the Bessel expansion of the function jj which is introduced in Section 3.2. In Appendices B and C we study the next-to-leading order terms $g_1$, $g_2$ used in Theorem 1 and in Appendices D and E we study the next-to-leading order terms $g_1^R$, $g_2^R$ used in Theorem 2. Appendix F gives an outline of a rigorous approach to regular perturbation theory which could be made into a theorem at the level of our averaging theorems. It is applied in §3.4.4. Appendix G provides some formulas used in Section 3.7. In Appendix H we discuss $E = E_r/(cB_u)$ in the high gain regime and obtain a crude upper bound estimate of it. Finally, in Appendix I we show that the solution of the system of differential inequalities that is used in the proof of both averaging theorems (as well as in Appendix F) is indeed a solution.
2 General Planar Undulator model

Lorentz force equations
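Using SI units, the Lorentz equations for motion of a relativistic electron in an electromagnetic field, (E, B), are $\dot{\mathbf{r}} = \mathbf{v}$, (2.1) $\dot{\mathbf{p}} = -e(\mathbf{E} + \mathbf{v} \times \mathbf{B})$, (2.2) where $\mathbf{v} = \mathbf{p}/(m\gamma)$ (2.3) is the velocity, γ is the Lorentz factor defined by $\gamma^2 = 1 + \mathbf{p}\cdot\mathbf{p}/(m^2c^2)$ (2.4) and m and −e are the electron mass and charge respectively. We introduce Cartesian coordinates as follows: $\mathbf{r} = x\mathbf{e}_x + y\mathbf{e}_y + z\mathbf{e}_z$, (2.5) $\mathbf{p} = p_x\mathbf{e}_x + p_y\mathbf{e}_y + p_z\mathbf{e}_z$, (2.6) where $\mathbf{e}_x$, $\mathbf{e}_y$, $\mathbf{e}_z$ are the standard unit vectors. Using (2.1)-(2.6) the system in Cartesian coordinates is $\dot{x} = p_x/(m\gamma)$, $\dot{y} = p_y/(m\gamma)$, $\dot{z} = p_z/(m\gamma)$, (2.7) $\dot{p}_x = -e\left[E_x + (p_yB_z - p_zB_y)/(m\gamma)\right]$, (2.8) $\dot{p}_y = -e\left[E_y + (p_zB_x - p_xB_z)/(m\gamma)\right]$, (2.9) $\dot{p}_z = -e\left[E_z + (p_xB_y - p_yB_x)/(m\gamma)\right]$. (2.10) We denote the undulator magnetic field by $\mathbf{B}_u$ and the radiation field by $(\mathbf{E}_r, \mathbf{B}_r)$ whence $\mathbf{E} = \mathbf{E}_r$, $\mathbf{B} = \mathbf{B}_u + \mathbf{B}_r$. (2.11) A simple planar undulator model magnetic field which satisfies the Maxwell equations, $\nabla\cdot\mathbf{B}_u = 0$ and $\nabla\times\mathbf{B}_u = 0$, as in [3], is $\mathbf{B}_u = \nabla\phi = -B_u\left[\cosh(k_uy)\sin(k_uz)\,\mathbf{e}_y + \sinh(k_uy)\cos(k_uz)\,\mathbf{e}_z\right]$. (2.12) To satisfy $\nabla\cdot\mathbf{B}_u = 0$, φ must satisfy Laplace's equation. The field (2.12) is easily constructed by separation of variables and requiring periodicity in z with period $\lambda_u$ and then taking the first eigen-mode (see, e.g., [16, p. 145]). The scalar field is $\phi = -(B_u/k_u)\sinh(k_uy)\sin(k_uz)$.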
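As a quick numerical sanity check of the statement that φ must satisfy Laplace's equation, the following sketch evaluates a finite-difference Laplacian of the scalar field $\phi = -(B_u/k_u)\sinh(k_uy)\sin(k_uz)$ and the on-axis field components. It works in dimensionless units ($B_u = k_u = 1$) and assumes the sign convention $\mathbf{B}_u = \nabla\phi$; both choices are assumptions of this sketch, as are the function names.

```python
import math


def phi(y, z):
    """Scalar field -(B_u/k_u) sinh(k_u y) sin(k_u z) with B_u = k_u = 1."""
    return -math.sinh(y) * math.sin(z)


def laplacian(f, y, z, h=1e-3):
    """Five-point finite-difference approximation of f_yy + f_zz."""
    return (f(y + h, z) + f(y - h, z) + f(y, z + h) + f(y, z - h)
            - 4.0 * f(y, z)) / (h * h)


def field(f, y, z, h=1e-5):
    """(B_y, B_z) = gradient of f by central differences (assumed convention)."""
    b_y = (f(y + h, z) - f(y - h, z)) / (2.0 * h)
    b_z = (f(y, z + h) - f(y, z - h)) / (2.0 * h)
    return b_y, b_z


if __name__ == "__main__":
    print("Laplacian at (0.3, 0.7):", laplacian(phi, 0.3, 0.7))
    print("field on the midplane y = 0:", field(phi, 0.0, 0.7))
```

On the midplane y = 0 the field reduces to a purely vertical component proportional to −sin(k_u z), which is what makes planar motion with y = p_y = 0 possible.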
The traveling wave radiation field we choose is also a Maxwell field and is given by $\mathbf{E}_r = E_r\,h(\check\alpha(z,t))\,\mathbf{e}_x$, $\mathbf{B}_r = \frac{E_r}{c}\,h(\check\alpha(z,t))\,\mathbf{e}_y$, (2.13) where $E_r$ is a constant, h is a real valued function on R and $\check\alpha(z,t) = k_r(z - ct)$, (2.14) and $k_r$ is the parameter mentioned in the Introduction. Our primary emphasis is on the standard monochromatic example where $h(\alpha) = \cos(\nu\alpha)$ (2.15) and ν ≥ 1/2, thus $h(\check\alpha(z,t)) = \cos(\nu k_r(z - ct))$. Note that the prime ′ always indicates a derivative. Thus from §3 onwards we will use (2.15). However it is easy to carry through the first part of the analysis with general H and we do want to make a comment on the more general case. In this monochromatic case $k_r$ will be defined by (1.2),(1.3) and the ν will allow for a variable wave number for the traveling wave; it will be shown that ν = 1 gives the primary resonance with the concomitant pendulum normal form. The extension to a sum of monochromatic waves is trivial and won't be discussed.
where the canonical momentum vector P c is related to p by p = P c + eA and the vector potential A is given by Since A is independent of x the x-component, P c,x , of the canonical momentum vector P c is conserved, i.e., is constant along solutions of (2.7),(2.16)-(2.18) as is easily confirmed directly. We will not make explicit use of the Hamiltonian structure in the following. The MoA does not rely on a Hamiltonian structure and this frees us from having to deal only with canonical transformations as we proceed to put (2.7),(2.16)-(2.18) in an averaging standard form.

Motion in y = 0 plane with z as the independent variable
It is common to take the distance z along the undulator as the independent variable, rather than the time t. In fact after unsuccessfully trying to stay with t we decided to follow the common procedure. With the usual abuse of notation, we write, from now on x(z), y(z), p x (z), p y (z), p z (z) instead of x(t(z)), y(t(z)), p x (t(z)), p y (t(z)), p z (t(z)) whence the ODE's (2.7),(2.16)- The initial conditions at z = 0 will be denoted by a subscript 0, e.g., t(0) = t 0 . Clearly t 0 is the arrival time of an electron at the entrance, z = 0, of the undulator.
Here and in the rest of the paper we consider the initial value problem (IVP) with y 0 = p y0 = 0. It follows, with no approximation, that y(z) = p y (z) = 0 for all z and the six ODE's (2.22)-(2.25) reduce to four. The righthand sides (rhs's) of (2.22)-(2.25) are independent of x and so we do not need to consider the x equation until §3.6. It is standard, and also quite convenient, to replace p z by the energy variable γ. With γ(z) defined in terms of p x (z) and p z (z) by (2.4) and using (2.23) and (2.25), we obtain ). Finally, we takě α as a dependent variable in place of t and we define α(z) :=α(z, t(z)) = k r (z − ct(z)) . (2.26) Later it will be seen that α is a precursor to a generalization of the so-called ponderomotive phase which emerges naturally as we put the ODE's in a standard form for averaging. With the above four changes the ODE's for t, p x , p z in (2.22),(2.23),(2.25) become where the initial conditions are α(0) = α 0 := −k r ct 0 , p x (0) =: p x0 , γ(0) =: γ 0 .
Here p z must be replaced by and it is easy to see that (2.27)-(2.29) are then self contained. From now on we restrict p z to be positive: Note that, by (2.27), α is a strictly decreasing function whence, as one expects, z < c(t(z) − t 0 ). It is also easy to check that is conserved along solutions of (2.27)- (2.29). This conservation law is identical to (2.21) with y = 0. Recall that K was defined by (1.1). In summary, the solution of the IVP for (2.22)-(2.25) with y 0 = p y0 = 0, which entails y = p y = 0, is given in terms of the solution of (2.27),(2.29), i.e., of with

Standard form for Method of Averaging
We begin by introducing the normalized energy deviation η and its O(1) counterpart χ via as mentioned in the Introduction. Here γ c is a characteristic value of γ, e.g., its mean and ε is a characteristic spread of η so that χ becomes the new O(1) dependent variable replacing γ in (2.33),(2.34). We are interested in an asymptotic analysis for γ c large and η small as in an X-Ray FEL. Here we determine a relation between ε and γ c which leads to a standard form for the MoA and which will contain the FEL pendulum system at first order in the case of (2.15).
As a first step we introduce new variables, in addition to χ, as follows. From the conservation law in (2.32) we anticipate that the order of magnitude of p x will be mcK. In addition β z := p z /mcγ will be near 1 and so p z ≈ mcγ. Thus we define dimensionless momenta by (2.37) Of course, by (2.31), A natural scaling for z is so that the undulator period is 2π in ζ. Abbreviating where ′ = d/dζ and E is defined in (1.4). The initial conditions are θ aux (0, ε) = θ 0 := α 0 , χ(0, ε) = χ 0 . Moreover P z must be replaced, due to (2.30), by and P x must be replaced, due to (2.35), by Since p z > 0 we have 0 < P z < 1. We note that most derivations of the FEL pendulum take ∆P x0 = 0, see [2,3,4,5].
To expand $P_z$ we need and it is convenient to define $q(\zeta) := 1 + K^2(\cos\zeta + \Delta P_{x0})^2 = \bar{q} + 2K^2\Delta P_{x0}\cos\zeta + \frac{K^2}{2}\cos 2\zeta$, (2.47) where $\bar{q} := 1 + K^2/2 + K^2\Delta P_{x0}^2$. Clearly $\bar{q}$ is the average of q(ζ) over ζ. Now $P_x$ is O(1) so, by (2.43), To transform (2.50),(2.51) into a standard form for the MoA we need to introduce dependent variables that are slowly varying. We anticipate that χ will be slowly varying, i.e., $E/(\varepsilon\gamma_c^2)$ will be small. To remove the O(1) in (2.50) we define Note that $\Upsilon_0$ and $\Upsilon_1$ depend only on K and $\Delta P_{x0}$ and that (2.55) Thus the system (2.50),(2.51) becomes which are now in standard form. Up to this point $K_r$ has not been fixed but now it is convenient to take $K_r = 2/\bar{q}$. (2.61) We now relate θ to the so-called ponderomotive phase. We have, from (2.26),(2.40), (2.52) and (2.53), Using (2.39) and (2.64) we obtain $\theta(k_uz, \varepsilon) = k_r(z - ct(z)) + k_uz + \Upsilon_0\sin(k_uz) + \Upsilon_1\sin(2k_uz)$. (2.65) For $\Delta P_{x0} = 0$ the variable θ is the so-called ponderomotive phase, i.e., $\theta(k_uz, \varepsilon) = (k_u + k_r)z - k_rct(z) + \Upsilon_1\sin(2k_uz)$, (2.66) where, for $\Delta P_{x0} = 0$, (2.67) Thus in our context the ponderomotive phase arises naturally in the process of finding the distinguished relation between ε and $\gamma_c$ and transforming to slowly varying coordinates. In standard treatments it is introduced heuristically to maximize energy transfer.
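The trigonometric identity behind (2.47) is easy to verify numerically. The sketch below compares $q(\zeta) = 1 + K^2(\cos\zeta + \Delta P_{x0})^2$ with its expanded form and checks that its average over one undulator period is $1 + K^2/2 + K^2\Delta P_{x0}^2$, the value obtained by expanding the square. The sample values of K and $\Delta P_{x0}$ and the function names are illustrative assumptions.

```python
import math


def q(zeta, big_k, dpx0):
    """q(zeta) = 1 + K^2 (cos zeta + dPx0)^2, as in (2.47)."""
    return 1.0 + big_k ** 2 * (math.cos(zeta) + dpx0) ** 2


def q_bar(big_k, dpx0):
    """Average of q over zeta, obtained by expanding the square."""
    return 1.0 + big_k ** 2 / 2.0 + (big_k * dpx0) ** 2


def q_expanded(zeta, big_k, dpx0):
    """Right-hand side of (2.47): mean plus first and second harmonics."""
    return (q_bar(big_k, dpx0)
            + 2.0 * big_k ** 2 * dpx0 * math.cos(zeta)
            + 0.5 * big_k ** 2 * math.cos(2.0 * zeta))


def period_mean(f, samples=1000):
    """Mean of a 2 pi periodic function over equally spaced samples."""
    return sum(f(2.0 * math.pi * j / samples) for j in range(samples)) / samples


if __name__ == "__main__":
    K, dpx0 = 3.70, 0.05
    worst = max(abs(q(z, K, dpx0) - q_expanded(z, K, dpx0))
                for z in (0.1 * j for j in range(63)))
    print("max identity defect:", worst)
    print("numerical mean:", period_mean(lambda z: q(z, K, dpx0)))
    print("q_bar:", q_bar(K, dpx0))
```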
Of course always γ ≥ 1 and, in fact, in applications γ c , γ ≫ 1. However for our purposes it is convenient to base our work on the maximal domain (2.77).
(2) The transformation to the slowly varying θ in (2.52) works nicely because ζ (equivalently z) is the independent variable. If we had stayed with t as the independent variable this step wouldn't work.
(3) Equations (2.62),(2.63) are in the standard form for the MoA. However we did not prove that the $O(\varepsilon^2)$ terms are actually bounded by an ε-independent constant times $\varepsilon^2$. In the monochromatic case in §3 we will show that the two $O(\varepsilon^2)$ terms are truly bounded by $C\varepsilon^2$ on an appropriate domain for appropriate constants C.
(4) For the results of this paper the normalized field strength E cannot be too big (or ε won't be small) and it cannot be too small or another distinguished case will come into play. Of course for a seeded FEL, E will be set by the seeding field. In Appendix H we present two very crude bounds that have some relevance to the beginning stages of a High Gain FEL. Here we simply note that for E = 1000, ε is approximately 0.001.
In an early approach to this problem we built a normal form analysis assuming E small, so that the radiation field was a small perturbation of the undulator motion. We thus considered E as a small parameter in addition to 1/γ c . This led to another distinguished case, which also had a resonant structure but with a different pendulum type behavior. Later we realized that E is not necessarily small for cases of interest and we were led to the current case of (1.5).
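The order of magnitude quoted in remark (4) can be reproduced directly. The sketch below assumes the distinguished relation takes the form $\varepsilon = \sqrt{E}/\gamma_c$ (order constant 1, as described in the Introduction) and uses the LCLS beam energy of 15 GeV quoted there to fix $\gamma_c$. The function name is illustrative.

```python
import math

MC2_EV = 0.51099895e6  # electron rest energy in eV


def epsilon_from_field(e_norm, beam_energy_ev):
    """Small parameter eps = sqrt(E) / gamma_c (assumed form of the
    distinguished relation), with gamma_c = beam energy / (m c^2)."""
    gamma_c = beam_energy_ev / MC2_EV
    return math.sqrt(e_norm) / gamma_c


if __name__ == "__main__":
    print("eps for E = 1000 at 15 GeV:", epsilon_from_field(1000.0, 15e9))
```

Because ε scales like the square root of E, changing the normalized field strength by two orders of magnitude changes ε by only one.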
(5) As will become clear in §3 the normal form for (2.62) is $\theta' = \varepsilon\,2\chi$. The normal form of (2.63) depends on h. In the monochromatic case h(θ − Q(ζ)) = cos(ν[θ − Q(ζ)]) and the nonresonant, resonant and near-to-resonant structure will appear as ν varies. In particular the primary resonance will appear at ν = 1. However it is curious that if with h(ξ) smooth and localized near ξ = ±1 the resonance effect is washed out in first-order averaging. We will explore this briefly in §5.
We are studying the consequence of this in the collective case.
3 Special Planar Undulator Model and averaging theorems

The basic ODE's for the monochromatic radiation field
In this section we introduce the notation which will allow us to state and prove our three propositions and two theorems. With (2.15),(2.70), (2.74) we show the dependencies of P x and P z on (θ, χ, ζ, ε, ν) by the replacement Note that, by (2.77),(3.1), From now on, we restrict ε to a finite interval (0, ε 0 ]. We are of course interested in ε small, i.e., 0 < ε ≪ 1, and so, without loss of generality, we take Using (3.4), (3.5) we define the open set D(ε, ν), for 0 < ε ≤ ε 0 , ν ≥ 1/2, by which is our maximal domain in extended phase space. Accordingly we define the domain of Π x to be {(θ, ζ, ε, ν) ∈ R 4 : 0 < ε ≤ ε 0 , ν ≥ 1/2} and the domain of Π z to be {(θ, χ, ζ, ε, ν) ∈ (D(ε, ν) × R 2 ) : 0 < ε ≤ ε 0 , , ν ≥ 1/2}. It is easy to check that on the domain of Π z the argument of the square root in (3.3) is positive and, for (θ, χ, ζ) ∈ D(ε, ν), we have (2.76) and Moreover with (2.15) the ODE's (2.72),(2.73) become where q and Q are defined in (2.47),(2.53). Of course the initial conditions are θ(0, ε) = θ 0 , χ(0, ε) = χ 0 . As suggested by (2.62), (2.63) we now write (3.8),(3.9) as where f 1 , f 2 are given by so that g 1 , g 2 are given by The ODE's (3.8), (3.9) and their equivalent form, (3.10),(3.11), will be the subject of Theorem 1, i.e., the averaging theorem for the NR case (see also Definition 1 in §3.2). They will also be the basis for the NtoR case. We need an appropriate domain for the vector field in (3.10),(3.11) when it comes to averaging theorems. There are two types of singularities in (3.10), (3.11). The first involves the ε dependence of g 1 , g 2 as ε → 0+. On the surface it appears that the first term on the rhs of (3.14) is O(1/ε 2 ), however it is O(1). In fact, when combined with the second term the rhs is O(ε 2 ) so that g 1 is O(1). Similarly, g 2 appears to be O(1/ε), however again there is a cancellation so that g 2 = O(1). 
This should not come as a surprise given the construction of the distinguished case (see the remarks before (2.59)). Proposition 1 makes this precise by finding the limits of $g_1$, $g_2$ as ε → 0+. Thus the ε = 0 singularity is removable. There are also singularities for $\Pi_z = 0$, εχ = −1 which are not removable. This is reflected in the fact that even though $f_1$, $f_2$ are nice, $g_1$, $g_2$ have these singularities. However these singularities are excluded from our maximal domain D(ε, ν) (see (2.76),(3.7)) and so the vector field in (3.10),(3.11) is of class $C^\infty$ on D(ε, ν). To motivate W we note that, by (3.2) and since ν ≥ 1/2, Clearly, by (3.16),(3.17), Now that the structure of the $g_i$ has been characterized at the level needed for the averaging theorems, we discuss the structure of the $f_i$ defined in (3.12), (3.13). Clearly $f_1$ is 2π-periodic in ζ. We write, by (2.53),(3.13), and 2π-periodic in $\zeta_1$ and $\zeta_2$ we conclude from (3.21) that $f_2$ is a quasiperiodic function of ζ with two base frequencies 1 and ν (for the definition of quasiperiodic functions, see, e.g., [9]). To make the resonant structure explicit we write $f_2$ as and Z being the set of integers. Since $jj(\cdot\,; \nu, \Delta P_{x0})$ is a 2π-periodic $C^\infty$ function its Fourier series (3.24) is absolutely convergent, i.e., $\sum_{n\in\mathbb{Z}} |\widehat{jj}(n; \nu, \Delta P_{x0})| < \infty$ whence ∼ in (3.24) can be replaced by =. The $f_2$ in Eq. (3.11) can now be written which clearly shows the resonant structure in that the ζ average of $f_2$ is zero for ν ≠ integer. In Appendix A we find and $J_k$ is the k-th-order Bessel function of the first kind. Note that $jj(-\zeta; \nu, \Delta P_{x0}) = jj(\zeta; \nu, \Delta P_{x0})^*$ which implies $\widehat{jj}(n; \nu, \Delta P_{x0})$ is real. This is confirmed in the explicit form of (3.27),(3.28) since the $J_k$ are real valued.
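The two facts used at the end of this paragraph, namely that a 2π-periodic function g with $g(-\zeta) = g(\zeta)^*$ has real Fourier coefficients and that exponentials of sines expand in Bessel functions, can be illustrated numerically. The sketch below does not use the function jj itself, whose explicit form is given in Appendix A. It uses a hypothetical stand-in $g(\zeta) = \exp(-ix\sin 2\zeta)$, for which the Jacobi-Anger expansion gives $\hat g(n) = J_{-n/2}(x)$ for even n and zero for odd n. Bessel functions are evaluated from Bessel's integral so that only the standard library is needed.

```python
import cmath
import math


def fourier_coeff(g, n, samples=4096):
    """n-th Fourier coefficient (1/2pi) * integral of g(z) exp(-i n z) dz,
    computed with the equally spaced rule on [0, 2 pi)."""
    total = 0j
    for j in range(samples):
        z = 2.0 * math.pi * j / samples
        total += g(z) * cmath.exp(-1j * n * z)
    return total / samples


def bessel_j(m, x, samples=4096):
    """Integer-order Bessel function from Bessel's integral
    J_m(x) = (1/pi) * integral_0^pi cos(m t - x sin t) dt (midpoint rule)."""
    h = math.pi / samples
    s = sum(math.cos(m * (j + 0.5) * h - x * math.sin((j + 0.5) * h))
            for j in range(samples))
    return s * h / math.pi


def stand_in(x):
    """Hypothetical stand-in for jj: g(z) = exp(-i x sin 2z)."""
    return lambda z: cmath.exp(-1j * x * math.sin(2.0 * z))


if __name__ == "__main__":
    x = 0.8
    g = stand_in(x)
    for n in (-2, 0, 1, 2):
        print(n, fourier_coeff(g, n))
    print("J_1(x) =", bessel_j(1, x))
```

The printed coefficients have vanishing imaginary parts, and the coefficient at n = −2 agrees with $J_1(x)$, which mirrors how real Bessel coefficients arise in (3.27),(3.28).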
The time average of $f_1$ in (3.12) is clearly $\bar{f}_1(\chi) = 2\chi$. (3.29) Since the series in (3.26) converges uniformly in ζ and since $\overline{\exp(i(n - \nu)\zeta)} = \delta_{n,\nu}$, the time average of the quasiperiodic $f_2$ is given by (3.30), where N denotes the set of positive integers and where we have used the fact that $\widehat{jj}$ is real. This forms the basis of our definitions of resonant, nonresonant and near-to-resonant frequencies ν.
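The averaging identity used here, and the way a small divisor enters, can be made concrete. For n ≠ ν the running average of $\exp(i(n-\nu)\zeta)$ over [0, T] equals $(e^{i(n-\nu)T} - 1)/(i(n-\nu)T)$, which is bounded by $2/(|n-\nu|T)$ and therefore tends to zero, but only slowly when ν is close to n. The sketch below evaluates this closed form; the function names are illustrative.

```python
import cmath


def running_average(n, nu, t_end):
    """(1/T) * integral_0^T exp(i (n - nu) zeta) d zeta in closed form."""
    d = n - nu
    if d == 0:
        return 1.0 + 0.0j
    return (cmath.exp(1j * d * t_end) - 1.0) / (1j * d * t_end)


def small_divisor_bound(n, nu, t_end):
    """Upper bound 2 / (|n - nu| T) on the modulus of the running average."""
    return 2.0 / (abs(n - nu) * t_end)


if __name__ == "__main__":
    for nu in (1.0, 1.3, 1.001):
        for t_end in (1e2, 1e4):
            print(nu, t_end, abs(running_average(1, nu, t_end)))
```

On an averaging interval of length T = O(1/ε), the bound is of order ε/|n − ν|. This is a heuristic way to see why ν must be kept a distance ∆ from the resonances in the NR theorem, and why an O(ε) neighborhood of a resonance calls for the separate NtoR treatment.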
Recall 0 < ε ≤ ε 0 ≤ 1 and that we take N to denote the set of positive integers. ✷

Remark:
In our various estimates we need to keep ν away from zero but want to include ν = 1 since it is the primary resonance. Thus we require ν ≥ 1/2 and since ε ≤ 1 we require |a| ≤ 1/2.
It follows from the Fourier form of (3.26) that it is only possible to have a nontrivial normal form, i.e., $\bar{f}_2 \neq 0$, if ν is an integer. Thus ν = 1 is the primary resonance as discussed in the Introduction, justifying the choice of $K_r$ in (1.3) and (2.61). The resonant normal form at ν = k is of the pendulum form with where $x_n := (2n + 1)\Upsilon_1$ and n = 0, 1, ... with $\Upsilon_1$ defined in (2.54). Thus, for $\Delta P_{x0} = 0$, (3.31) gives the standard FEL pendulum system (see also [2], [4], [5], [17]): For a general quasiperiodic function with base frequencies 1 and ν it is possible to have a nontrivial normal form for every rational ν and thus ν would be defined to be resonant if it were rational. Since $\bar{f}_1(\chi)$ is independent of ν it plays no role in Definition 1. Clearly $\bar{f}_2(\theta; \nu) = 0$ if ν is NR. We state our NR theorem in Theorem 1 for the ∆-NR case. In fact because of a small divisor problem the theorem will require ν to stay away from neighborhoods of resonances in order to get an o(1) error bound as ε → 0+. We will obtain an $O(\varepsilon^{1-\beta})$ bound for β ∈ (0, 1] depending on the distance from the resonance by letting $\Delta = O(\varepsilon^{\beta})$. In the resonant case we will explore an O(ε) neighborhood of the resonance. This will allow us to at least partially fill the gap between the ∆-NR values of ν in the NR theorem and the values of ν in the NtoR theorem. The way this occurs will be seen in the error analysis in the proofs of Theorems 1 and 2.

The nonresonant case and its normal form
The exact ODE's in the NR case are (3.10), (3.11). Clearly they are the same in the ∆-NR subcase. By definition, the NR normal form, i.e., the normal form with ν NR, is obtained from (3.10),(3.11) by dropping the O(ε 2 ) terms and averaging the rhs over ζ holding θ, χ fixed whence, by (3.29), (3.30), with the same initial conditions as in the exact ODE's, i.e., v The solutions of (3.34),(3.35) with ε = 1 play an important role in the statement and proof of Theorem 1 and we refer to v(·, 1) = (v 1 (·, 1), v 2 (·, 1)) , Our basic result in the NR case will be that |θ . Putting ∆ into the order symbol allows one to discuss ∆ small, e.g., as a function of ε. The precise statement is given in §3.5.1 and its proof is given in §4.1. (3.38) Remark: Proposition 1 entails that the vector field on the rhs of (3.10),(3.11) is a C ∞ function on W (ε 0 ) × R (whence the vector field on the rhs of (3.8),(3.9) is a C ∞ function on W (ε 0 ) × R, too). Proposition 1 will allow us to use, in Theorem 1, the domain W (ε 0 ) × R. Furthermore the domain is large enough to contain the χ of physical interest (see Proposition 3 in §3.5.3).
The O(ε) neighborhood of k is natural in first-order averaging. If |ν − k| is too small then the normal form will be close to the resonant normal form and if |ν − k| is too big, then ν will be in the NR regime. Eq. (3.41) clearly includes the resonant case for a = 0. We start from (3.10),(3.11),(3.13) use (3.41) and obtain By the remarks after (3.15), the vector field in (3.42),(3.43) is of class C ∞ on the maximal domain D(ε, k + εa). Since f 1 in (3.42) is independent of ε the normal form associated with it will be the same as in the NR case. We now need to study the ε dependence of f 2 in (3.43). From (3.22), where we have used from (3.23) that Therefore we rewrite f 2 as We can now write the basic system for the MoA, in this NtoR case. From (3.42)-(3.47) we obtain and where g R 2 can be rewritten as follows. By (3.21) we have and, by (3.23),(3.47), (3.54) Using (3.53),(3.54) we can write (3.52) as which will be useful in obtaining bounds for g R 2 in Appendix E. The following proposition is the analogue of Proposition 1 for the NtoR case.
For ∆P x0 = a = 0, eq.'s (3.58),(3.59) are the standard FEL pendulum equations, given by (3.32), (3.33). In the special case when K 0 (k) = 0 the ODE's (3.58),(3.59) are the same as the NR equations (3.34),(3.35) and so this case needs no further comment. Note that the special case K 0 (k) = 0 occurs, e.g., when ∆P x0 = 0 and k is even (see the remark after (A.11)). The ultimate justification for the normal form (3.58),(3.59) comes from the averaging theorem itself. However, if we replace εζ in (3.49) by τ and add the equation τ ′ = ε then this, together with (3.48),(3.49), is in a standard form for "periodic averaging" (= averaging over a periodic function) and the normal form (3.58),(3.59) is obtained by averaging over ζ holding θ, χ, τ fixed. In this θ, χ, τ formulation standard periodic averaging theorems apply to the 3D system for θ, χ, τ; see, e.g., [6,13] and Section 3.3 in [10]. We will, however, prove an averaging theorem directly tuned to (3.48),(3.49), both to show the reader a proof in a simple context and to obtain nearly optimal error bounds which are stronger than those in the standard theorems.

Structure of the NtoR normal form solutions
Here we write the solution of the IVP for the normal form system (3.58), (3.59) in terms of solutions of the simple pendulum system and discuss their behavior. Therefore in this Section we exclude the simple subcase where K 0 = 0.
Thus we have scaled away the ε and made the transformed system autonomous. Solution properties of (3.63),(3.64) are easily understood in terms of its phase plane portrait (PPP). However it is more convenient to transform it to the simple pendulum system where Ω = Ω(k) := 2k|K 0 (k)| .
The systems obtained by linearizing about these equilibria are centers for l even and saddle points for l odd. From the theory of Almost Linear Systems (see, e.g., [19]), it follows that the equilibria are centers and saddle points for the nonlinear system. A conservation law for the simple pendulum system is easily derived by first noting that the direction field is given by dY/dX = − sin X / Y. This equation is separable and has solutions given implicitly by (1/2) Y^2 + 1 − cos X = const. Thus E_Pen (X, Y) := (1/2) Y^2 + 1 − cos X is a constant of the motion, which is easily checked directly. Incidentally E_Pen is also a Hamiltonian for the ODE's (3.58),(3.59) but this plays no role here. The PPP is easily constructed from the so-called potential plane, which is simply a plot of the potential U(X) vs. X, see [20]. The PPP shows that the solutions of the simple pendulum system have four types of behavior: the equilibria mentioned above, libration, rotation and separatrix motion. These can be characterized in terms of E_Pen. Clearly, E_Pen is nonnegative, the centers correspond to E_Pen (X, Y) = 0 and the saddle points and separatrices to E_Pen (X, Y) = 2. The motion is libration for 0 < E_Pen (X, Y) < 2, rotation for E_Pen (X, Y) > 2 and separatrix motion for E_Pen (X, Y) = 2 with Y ≠ 0.
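The classification of pendulum motions by the value of the conserved energy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the simple pendulum system in the form X' = Y, Y' = −sin X with energy Y^2/2 + 1 − cos X, and the function names `pendulum_energy` and `classify`, as well as the tolerance `tol`, are hypothetical and not part of the paper.

```python
import math


def pendulum_energy(x, y):
    """Energy Y^2/2 + 1 - cos X of the simple pendulum X' = Y, Y' = -sin X."""
    return 0.5 * y * y + 1.0 - math.cos(x)


def classify(x, y, tol=1e-12):
    """Return the type of motion through the phase-plane point (x, y).

    tol is an illustrative floating-point tolerance (an assumption),
    used to decide when the energy equals 0 or 2.
    """
    e = pendulum_energy(x, y)
    on_critical_level = e <= tol or abs(e - 2.0) <= tol
    if abs(y) <= tol and on_critical_level:
        return "equilibrium"   # centers (energy 0) and saddles (energy 2)
    if abs(e - 2.0) <= tol:
        return "separatrix"    # energy 2 with Y != 0
    return "libration" if e < 2.0 else "rotation"


if __name__ == "__main__":
    for point in [(0.0, 0.0), (math.pi, 0.0), (0.0, 2.0), (0.5, 0.3), (0.0, 3.0)]:
        print(point, classify(*point))
```

A point with energy strictly between 0 and 2 lies inside a bucket, so a routine of this kind could be used to count which initial conditions are trapped.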
In the libration case the solutions are periodic, which is easy to show, and the period as a function of amplitude, [21], is given by We denote by B n the n-th pendulum bucket which is defined by with n ∈ Z. Note that, by (3.72),(3.74), Note also that, by (3.70),(3.71),(3.72), We can now discuss the four cases of equilibria, libration, rotation and separatrix motion. In each case, using (3.78),(3.79), (3.80), we will find d min and we will at the same time observe that d min (I) Equilibria regime: Y 0 = 0 and either E P en (X 0 , Y 0 ) = 0 or 2.
Clearly X 0 = πl where l ∈ Z and, by (3.72), Clearly, by direct substitution, these are solutions of (3.58),(3.59). Incidentally these solutions are stable for l even and unstable for l odd.
Clearly, due to (3.81),(3.82),(3.84), (3.85), we can choose In this case Z 0 (θ 0 , χ 0 , k, a) ∈ B n(θ 0 ,k) where the integer n = n(θ 0 , k) is determined by the condition and it is easy to show that the periodic part has amplitude determined by the max and min values of X and Y and the linear growth term is The maximum values X max and Y max of X and Y satisfy, by (3.74), and the minimum values X min and Y min of X and Y are given by Here arccos is the principal branch of the inverse cosine mapping. We now determine d min 1 , d min In this case (X, Y ) ∈ B n(θ 0 ,k) where the integer n = n(θ 0 , k) is determined such that |X 0 (θ 0 , k) − 2πn(θ 0 , k)| < π. Clearly For Y 0 > 0, X is increasing and Y is periodic such that and for Y 0 < 0, X is decreasing and Y is periodic such that Clearly v 2 (·, ε) is periodic. We now determine d min 1 , d min 2 and χ ∞ . It follows from (3.101),(3.102) that for any choice of Y 0 Ωεζ 0 E_Pen (X(s), Y (s))ds whence, by (3.78),

Clearly the simple pendulum system is central to our NtoR normal form approximation. Every student who has taken a course in ODE's or Classical Mechanics has studied the pendulum equation at some level. However, not every reader of this paper may know the general settings of the equation.
So, as an aside, we thought some might be interested in knowing how it fits in a broader context. First, the pendulum equation is a special case of the nonlinear oscillator ẍ + g(x) = 0 and second, the nonlinear oscillator is an important subclass of the class of second-order autonomous systems ẋ = f (x, y), ẏ = g(x, y). The nonlinear oscillator is discussed in many texts, and here we mention [19] and [22]. Its PPP is easily constructed from the potential plane as mentioned above and in [20]. After the class of linear systems, the class of second-order autonomous systems has the most well developed theory [23]. Here the qualitative behavior is completely captured in the PPP's. What's missing from a PPP is the time it takes to go from one point on an orbit to another, but this is easily determined using a good ODE solver. The limiting behavior of all solutions bounded in forward time is given by the celebrated Poincaré-Bendixson theorem and as a consequence existence of periodic solutions can be inferred and the possibility of chaotic behavior is eliminated. It also follows that a closed orbit in the phase plane corresponds to a periodic solution.
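As a small illustration of recovering timing information with an ODE solver, the following Python sketch integrates the simple pendulum with a classical fourth-order Runge-Kutta step and compares the numerically measured libration period with the classical closed form T = 4K(sin(A/2)), where A is the amplitude and K is the complete elliptic integral of the first kind, evaluated here through the arithmetic-geometric mean. The assumptions are ours: the system is taken as X' = Y, Y' = −sin X, and the names `rk4_step`, `libration_period_exact`, `libration_period_numeric` and the step size `h` are hypothetical choices, not notation from the paper.

```python
import math


def rk4_step(x, y, h):
    """One classical Runge-Kutta step for X' = Y, Y' = -sin X."""
    k1x, k1y = y, -math.sin(x)
    k2x, k2y = y + 0.5 * h * k1y, -math.sin(x + 0.5 * h * k1x)
    k3x, k3y = y + 0.5 * h * k2y, -math.sin(x + 0.5 * h * k2x)
    k4x, k4y = y + h * k3y, -math.sin(x + h * k3x)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0)


def libration_period_exact(amplitude):
    """T = 4 K(sin(A/2)) = 2 pi / AGM(1, cos(A/2)), for 0 < A < pi."""
    a, b = 1.0, math.cos(0.5 * amplitude)
    for _ in range(40):  # the AGM iteration converges quadratically
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return 2.0 * math.pi / a


def libration_period_numeric(amplitude, h=1e-3):
    """Measure a quarter period: time for X to fall from A to 0."""
    x, y, t = amplitude, 0.0, 0.0
    while True:
        xn, yn = rk4_step(x, y, h)
        if xn <= 0.0:
            # linear interpolation for the zero crossing of X
            return 4.0 * (t + h * x / (x - xn))
        x, y, t = xn, yn, t + h


if __name__ == "__main__":
    for amp in (0.1, 1.0, 2.0, 3.0):
        print(amp, libration_period_numeric(amp), libration_period_exact(amp))
```

The period tends to 2π for small amplitude and grows without bound as the amplitude approaches π, i.e., as the orbit approaches the separatrix.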

NR limit far away from the pendulum buckets
Even though for small ε there will be gaps in ν between the ∆-NR and NtoR cases, as we will discuss in the context of Theorems 1,2, we show here that far away from the pendulum buckets the NR normal form emerges. While not a rigorous argument since we do not quantify "large" it is a consistency check. As in Section 3.4.3 we exclude the simple subcase where K 0 = 0.
We are now ready to state the NR theorem which roughly concludes that To make the statement of the theorem concise, we now set up the theorem in nine steps.
(8) (Bounds for g 1 , g 2 on rectangle) Appendix C gives a very detailed derivation of quite explicit minimal bounds for g 1 and g 2 . There we show, for (θ, where i = 1, 2 and d 1 , d 2 satisfy (3.114) and where the finite C 1 and C 2 are defined by (C.27),(C.30).
We refer to B 1 , B 2 as "Besjes terms" and their importance will be seen both in the bounds presented in Theorem 1 and in the proof of the theorem where they eliminate the need for a near identity transformation (for the latter, see [6,9,10,11,12]).
With this setup we can now state the NR approximation theorem.
With the setup given by items 1-9 of the above preamble we obtain, for ζ ∈ I(ε, T ), that More precisely
The proof of Theorem 1 is presented in §4.1. Note that the symbol O(ε/∆) conveys that the error contains the factor 1 ∆ .
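The role of the factor 1/∆ can be illustrated with a toy model, which is not the system (3.10),(3.11) but a deliberately simplified stand-in chosen by us: χ' = εK cos(∆ζ), whose first-order average is χ' = 0 for ∆ ≠ 0. The exact solution is χ = χ 0 + εK sin(∆ζ)/∆, so the averaging error on [0, T/ε] is at most εK/∆, and choosing ∆ = ε^β turns this into Kε^(1−β). The function name `toy_chi_error` and the parameters `K`, `T`, `samples` are hypothetical.

```python
import math


def toy_chi_error(eps, delta, K=1.0, T=2.0, samples=20001):
    """Sup over [0, T/eps] of |chi_exact - chi_averaged| for the toy model
    chi' = eps*K*cos(delta*zeta), whose averaged equation is chi' = 0.

    The error is eps*K*|sin(delta*zeta)|/delta, sampled on a uniform grid.
    """
    zeta_max = T / eps
    worst = 0.0
    for i in range(samples):
        zeta = zeta_max * i / (samples - 1)
        worst = max(worst, abs(eps * K * math.sin(delta * zeta) / delta))
    return worst


if __name__ == "__main__":
    eps = 1e-3
    for beta in (0.0, 0.5, 0.9, 1.0):
        delta = eps ** beta
        err = toy_chi_error(eps, delta)
        print(f"beta={beta:3.1f}  delta={delta:.3e}  error={err:.3e}  "
              f"eps^(1-beta)={eps ** (1.0 - beta):.3e}")
```

In this toy setting the error is O(ε) when ∆ is of order one and degrades to O(1) when ∆ = ε, which mirrors the loss of the o(1) bound as ν approaches a resonance.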

NtoR case: ν = k + εa (Periodic Averaging)
The NtoR case was defined in §3.2. The exact ODE's to be analyzed in this case were derived in §3. The setup for the theorem is as follows.
(8) (Bounds for g R 1 , g R 2 on rectangle) Appendix E gives a very detailed derivation of quite explicit minimal bounds for g R 1 and g R 2 . There we show that, for (θ, We will also need B R 1,∞ , B R 2,∞ defined by where i = 1, 2.
We refer to B R 1 , B R 2 as "Besjes terms" and their importance will be seen both in the bounds presented in Theorem 2 and in the proof of the theorem where they eliminate the need for a near identity transformation.
With this setup we can now state the NtoR approximation theorem.
The proof of Theorem 2 is presented in §4.2.

Remarks on the averaging theorems
(1) We have now explored the θ, χ dynamics as a function of ν in the ∆-NR case and ν = k + εa in the NtoR case. However, asymptotically there are gaps for ν ∈ (k + εa, k + ∆) when ε is small. For ∆ = O(ε) the NR normal form breaks down because the error is O(1). We can, however, come close to the NtoR neighborhood by letting ∆ = O(ε β ) with β near 1, at the cost that the error in the NR normal form deteriorates to O(ε 1−β ). It could be interesting to explore the dynamics in these gaps.
(2) Important for the functioning of the FEL is knowledge of the fraction of the bunch that occupies a bucket. From the analysis in §3.4.3 this occurs for ICs in the libration case, i.e., 0 < E P en (Z 0 ) < 2 where Z 0 is given in (3.66). One can thus determine the set of (θ 0 , χ 0 ) for which Z 0 occupies the pendulum buckets. For more details on the pendulum motion and its impact on the low gain theory see §3.7.
(3) Mathematically we want to make sure the buckets are covered by our domain W (ε 0 ) × R for physically reasonable χ 0 . From (3.71) the range of the v 2 -values in the buckets for the NtoR normal form is the interval (− Ω k + a 2k , Ω k + a 2k ). Now a ≥ −1/2 so, for every k, the smallest v 2 in a bucket is − Ω k − 1 4k whence, since k ≥ 1, the very smallest v 2 in a bucket is −Ω − 1/4. Thus requiring entails that χ b is smaller than any χ-value inside the buckets and smaller than any χ-value on the separatrix. It is plausible to restrict the physically interesting χ-values to be greater than, say 3χ b . The condition that (θ, 3χ b ) ∈ W (ε 0 ) entails that the buckets are covered by W (ε 0 ) and that ε 0 satisfies the constraint 3χ b > χ lb (ε 0 ). The following proposition is a corollary to Propositions 1,2.
Let also ∆γ be a positive constant and let If χ ∈ R satisfies the condition: which entails (3.153). ✷ Note that the condition: 1 ≤ γ c − ∆γ in (3.152) is not used in the proof of Proposition 3 but serves to guarantee that χ satisfies the physical condition: γ ≥ 1, i.e., 1 ≤ γ c (1 + εχ).
(4) In applications of Theorems 1,2, T should be chosen so that z ∈ [0, T /εk u ] is the domain of interest, e.g., so that T /(εk u ) is the length of the undulator.
(5) In many discussions of this nature, researchers often just assert the existence of bounds, for example by using the well known fact that a continuous function on a compact set is bounded, or bounds are obtained which are crude. Here we wanted to do more. By using, in the proofs of Theorems 1 and 2, a system of differential inequalities instead of the Gronwall inequality we have been able to use two Lipschitz constants in each proof instead of their maximum and in a similar manner can treat the two Besjes' terms independently as well as the components of g and g R . Furthermore, we believe the Besjes bounds and the bounds on g 1 , g 2 , g R 1 , g R 2 are nearly optimal.
We also note that there are only three restrictions on the size of ε 0 and thus ε. The first is that we require ε 0 ≤ 1. But this is only a matter of convenience and is really no restriction at all since the averaging theorems are only useful for ε small. The second restriction is in item 5 of the preambles to the two theorems; however, as indicated there, this is not a significant restriction. Thus the only real restriction is keeping the solution away from the boundary of Ŵ, Ŵ R in order to obtain I(ε, T ) = [0, T /ε]. This is an optimization problem; by making Ŵ, Ŵ R larger, ε can be larger; however, this is compensated to some extent by the Lipschitz constants as well as the bounds on g 1 , g 2 , g R 1 , g R 2 , which would become larger. Nonetheless, the situation is quite good in comparison to, say, KAM or Nekhoroshev theorems (see, e.g., [8]), where the restrictions on ε are quite severe and it is with great effort that the restrictions on ε have been improved in some applications, e.g., solar system problems.
where the finiteness of the rhs follows from the fact that the function jj(·; ν, ∆P x0 ) is of class C ∞ . Since jj(·; ν, ∆P x0 ) is also 2π-periodic we can apply Parseval's theorem to get (3.156) It also follows from (3.23) that so that, by (3.155),(3.156), , which gives an upper bound forB R 2 (T ) in (3.149).

Low Gain Calculation in the NtoR regime
Low gain theories in [2,3,4] are done in the context of the pendulum equations, i.e., (3.58),(3.59) with a = 0, ∆P x0 = 0, and k = 1. Here we will not make those assumptions and we define the gain by where v 2 is given in (3.71) and ⟨ ⟩ θ 0 denotes the average over θ 0 . This is consistent with [2,3,4]. The gain G could be calculated numerically using a quadrature formula and an ODE solver; however, standard treatments calculate it perturbatively using a regular (and thus short time) perturbation expansion. We could do a regular perturbation expansion in (3.58),(3.59) by letting v i = Σ_{k=0}^{4} ε^k A ik + O(ε^5) and using Gronwall techniques to make the O(ε^5) error rigorous (see [25, p.594] for an example of a regular perturbation theorem at first order and its proof). However, at the fourth order needed here this would be quite cumbersome. Because of the special scaling structure in (3.58),(3.59) as given in (3.61) we can use a Taylor expansion. For ε = 1 we get from (3.58), (3 and we expand v 2 (·, 1) about τ = 0 so that It follows from (3.173),(3.174) that the average over θ 0 leads to which gives, by (3.171), This shows the effect of a and k on the gain. We now compare our gain formula in (3.176) with the corresponding calculation in [2], where a = 0, ∆P x0 = 0, and k = 1. From our NtoR normal form system (3.58),(3.59) and letting θ = v 1 and η = εv 2 we obtain the IVP where ϵ = ε 2 K 0 (1). The procedure in [2] is a regular perturbation expansion in ϵ that does not assume that η 0 is small. Proceeding as they do, we write (3.180) We find It follows that ⟨η 1 (ζ)⟩ θ 0 = 0 and We can rewrite (3.186) as sin τ τ 2 , τ := η 0 ζ , (3.187) and the gain becomes consistent with [2]. For η 0 small, which is required by our averaging approximation (since η 0 = εχ 0 and χ 0 = O(1)), we obtain from (3.186) that It follows from (3.188),(3.189) that as in (3.176) with a = 0 and k = 1.
Thus we see that (3.176) is consistent with the standard gain formula for τ = η 0 ζ small. The O(ε 6 ) error in (3.176) can be made precise by estimating the remainder term in (3.173). However, we cannot justify the gain formula either in (3.176) or in (3.188) in the context of our Lorentz system in (2.22) -(2.25), because our NtoR normal form approximation only gives an approximation to O(ε). Thus a justification of the gain formulas, based on our Lorentz system, would need to come from elsewhere, e.g., a numerical calculation based on (3.8) and (3.9).
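The remark that the gain could be computed with a quadrature formula and an ODE solver can be sketched as follows. The sketch is an illustration only and works with the simple pendulum X' = Y, Y' = −sin X in scaled variables rather than with (3.58),(3.59); the normalization of the gain and the map between (X, Y) and (v 1 , v 2 ) are not reproduced here. It computes the phase average of Y(s) − Y 0 over initial phases X 0 uniformly distributed on [0, 2π), using a Runge-Kutta integrator and the midpoint rule, and compares it with the leading short-time Taylor term −Y 0 s^4/24, which follows from differentiating the pendulum equations four times and averaging over X 0 . The names `flow` and `mean_energy_change` and the default step counts are hypothetical.

```python
import math


def flow(x, y, s, n_steps=200):
    """Advance X' = Y, Y' = -sin X over time s with classical RK4."""
    h = s / n_steps
    for _ in range(n_steps):
        k1x, k1y = y, -math.sin(x)
        k2x, k2y = y + 0.5 * h * k1y, -math.sin(x + 0.5 * h * k1x)
        k3x, k3y = y + 0.5 * h * k2y, -math.sin(x + 0.5 * h * k2x)
        k4x, k4y = y + h * k3y, -math.sin(x + h * k3x)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
    return x, y


def mean_energy_change(y0, s, n_phase=64):
    """Average of Y(s) - Y0 over initial phases X0 (midpoint rule on [0, 2 pi))."""
    total = 0.0
    for j in range(n_phase):
        x0 = 2.0 * math.pi * (j + 0.5) / n_phase
        _, y = flow(x0, y0, s)
        total += y - y0
    return total / n_phase


if __name__ == "__main__":
    y0 = 0.5
    for s in (0.1, 0.3, 0.5):
        print(s, mean_energy_change(y0, s), -y0 * s ** 4 / 24.0)
```

The first three Taylor coefficients of the phase average vanish, which is why a fourth-order expansion is needed before any net energy exchange appears.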

Proof of averaging theorems
In §4.1 we prove the NR theorem, Theorem 1 of §3.5.1, and in §4.2 we prove the NtoR theorem, Theorem 2 of §3.5.2.
Taking absolute values, applying the Lipschitz condition on Ŵ (θ 0 , χ 0 , d 1 , d 2 ) and defining where we also used that I(ε, T ) ⊂ [0, T /ε] and where we have introduced the R i as in the proof of the Gronwall inequality for a single integral inequality (the Gronwall inequality is discussed in many ODE books, see, e.g., [24, p.36] and [26, p.310 and 317]). ζ ∈ I(ε, T ). Recall that L 1 , L 2 , C 1 , C 2 , B 1 , B 2 are defined in items 7,8 and 9 of the preamble to the theorem. For convenience we have suppressed the ε dependence of e 1 and e 2 .
Before we proceed with the proof, several comments are in order.
1. We refer to the terms B 1 (ζ), B 2 (ζ) in (3.121) as Besjes terms since they were introduced by him in order to prove an averaging theorem without a near identity transformation; a simplification. Standard proofs use the near identity transformation (see e.g., [6,9,10]).
One may fear that the Besjes terms could grow as large as O(1/ε) for ζ ∈ [0, T /ε], i.e., that B i,∞ (T /ε) = O(1/ε). However this doesn't happen here since, by (3.127),B 1 ,B 2 (T, ∆) are upper bounds for B i,∞ (T /ε) and are ε independent. Two facts are mainly responsible for this: (a) the fact that for fixed v 1 and v 2 the integrands have zero mean, i.e., the quantities in (3.122) have zero mean in s, and (b) the fact that v 1 (s, ε) and v 2 (s, ε) are slowly varying.
2. We maintain the system form in (4.12),(4.13). We could add these two inequalities and obtain an error estimate using a Gronwall inequality.
3. We have a draft of a general paper on quasiperiodic averaging which uses the Besjes idea and deals with the small divisor problem (See [14]). However the proof we are presenting here is simple, the small divisor problem is trivial and the error bounds are quite explicit. Thus we feel it is good to give complete proofs here rather than appealing to a more general theory. Also it serves the pedagogical purpose of showing how an averaging theorem is proved in a simple context; here the context of (3.10), (3.11) and (3.48), (3.49). We have incorporated the Besjes idea in much of our previous averaging work, see [13,25,27,28,29].

Summary and future work
We started with the 6D Lorentz equations for a planar undulator in (2.7),(2.16)-(2.18) with time as the independent variable. In §2.2 we introduced z as the independent variable and considered the IVP at z = 0 with y 0 = p y0 = 0. Solutions of this system are completely determined by the solutions of our basic 2D system (2.33),(2.34) for α and γ. This basic 2D system is the starting point for the rest of the paper and the first step is to transform it into a form for first-order averaging; the subject of §2.3. We introduce ζ = k u z as the new independent variable, and χ as a new dependent variable by γ = γ c (1 + εχ). Here we are thinking of electrons as part of an electron bunch with γ c as a characteristic value of γ and ε as a measure of the energy spread so that χ is an O(1) variable. We thus arrive at the system for (θ aux , χ) given in (2.41),(2.42) and we are interested, in this FEL application, in an asymptotic analysis for ε and 1/γ c small. Expanding the vector field for (2.41),(2.42) gives (2.50),(2.51). Here θ aux is not slowly varying and we thus introduce the generalized ponderomotive phase, θ, in (2.52) which leads to the slowly varying form of (2.56),(2.57). Most importantly, we discover that in order for θ and χ to interact at first order we must have ε = O(1/γ c ) and without loss of generality we take (1.5) as a result of (2.58). Finally we obtain (2.62),(2.63) which is in a standard form for the MoA. Consequently this will lead to a pendulum type behavior which is central to the operation of an FEL. The MoA can be applied to (2.62),(2.63) after an appropriate h is defined and the rest of the paper, in Sections 3,4, focuses on the monochromatic case of (2.15).
Before continuing with the summary we note that in the collective case there is a continuous range of frequencies and so it is natural to ask, "what happens in the noncollective case considered in this paper if there is a continuous range of frequencies?". Here h can be modeled as in (2.78), i.e., In the nonsmooth monochromatic caseh(ξ) = [δ(ξ − ν) + δ(ξ + ν)]/2 and (5.1) gives h(α) = cos(να) as in the monochromatic case of (2.15), and, as we have discussed in §3, there are resonances for integer ν. However we have found that in the smooth case the average of (cos ζ + ∆P x0 )h(θ − Q(ζ)) is zero and so the averaging normal form for (2.62),(2.63) is just the NR normal form of §3.3. Thus a smoothh(ξ), localized near the ν = 1 monochromatic resonance, washes out the effect of that resonance in the first-order averaging normal form. This does not mean that there is no resonant behavior near ν = 1 because it may not be possible to prove an averaging theorem. We are pursuing this. Furthermore even if an averaging theorem can be proven there might still be an effect in second-order averaging. In §3 we begin by determining the O(ε 2 ) terms of (2.62),(2.63) using (2.72),(2.73). Thus we obtain (3.10)-(3.15) as our basic system for θ, χ. Proposition 1 gives a domain, W (ε 0 ) × R, on which g 1 , g 2 are well defined as well as their limits as ε → 0+. In particular the vector field in (3.10), (3.11) is well defined on W (ε 0 ) × R.
Eq.'s (3.10), (3.11) are in a standard form for the MoA and for each ν the normal form is obtained by dropping the O(ε 2 ) terms and averaging f 1 , f 2 over ζ. However the average of f 2 is not clear from (3.13) and it is convenient to expand it in a Fourier series which is given in (3.26)-(3.28). The average is then easily obtained in (3.30) and leads to the definition of NR, ∆-NR, resonant and NtoR ν. The NR normal form equations are θ ′ = ε2χ and χ ′ = 0 and the resonant normal form equations are given by (3.31). The NR case is stated precisely in §3.3. Instead of focusing on the resonant case of (3.31) we consider in §3.4 the more general NtoR case where we study the dynamics in neighborhoods of the ν = k resonances. If the neighborhood is too small then the resonant normal form of (3.31) will be dominant thus the natural neighborhood to study with first-order averaging is O(ε) and this is the content of §3.4. Replacing ν by k + εa, our basic equations (3.10), (3.11) are rewritten in (3.42), (3.43). The function f 2 in (3.43) has two ε dependencies one of which contributes to the O(ε 2 ) term and we are led to the basic NtoR system (3.48)-(3.52). Proposition 2 is analogous to Proposition 1 by giving us the domain W (ε 0 ) × R on which g R 1 , g R 2 are well behaved as well as their limits as ε → 0+. In particular the vector field in (3.48),(3.49) is well defined on W (ε 0 ) × R. In §3.4.2 the NtoR normal form is presented in (3.58), (3.59). The solution structure is conveniently illuminated, in terms of the simple pendulum system, in §3.4.3.
The simple pendulum exhibits four types of behavior and these are exploited to discuss the structure of solutions of (3.58), (3.59) in these four cases.
At this stage we have normal forms for ν ∈ [k + ∆, k + 1 − ∆] and ν = k + εa. However there may be gaps between the dynamics covered by the ∆-NR normal form and that of the NtoR normal form. So it is comforting to note that there is a link between the two dynamical behaviors in that the NtoR normal form is approximated by the NR normal form far away from the pendulum buckets as discussed in §3.4.4.
In §3.5 we state the two averaging theorems which relate the ∆-NR and NtoR normal form approximations to the corresponding exact systems. Each theorem has a detailed preamble which sets up a compact statement of the theorem. The theorems establish the main results of the paper, namely that the normal form solutions give an O(ε) approximation to the exact solutions on long time, O(1/ε), intervals. In the ∆-NR case, the ν interval can be made larger by making ∆ smaller but this is at the expense of increasing the error, as discussed in Remark (1) of §3.5.3.
The results of the theorems are applied in §3.6, where the normal form approximations are used to derive the approximate solutions of the Lorentz equations with z as the independent variable. In §3.7 we discuss the small gain theory for ν = k + εa based on our NtoR normal form and compare it with the standard theory for k = 1, a = 0. We do point out however, that we have not justified the low gain theory in the context of our NtoR averaging theorem as we mention at the end of §3.7.
Finally the proofs are given in §4. It can be seen that the proofs themselves are quite simple. The proofs are somewhat novel in that they do not use a near identity transformation, due to the Besjes approach, and they use a system of differential inequalities in the calculation of the error bounds, rather than a Gronwall type inequality, which leads to better error bounds. Therefore a solution of the system of differential inequalities is presented and verified in Appendix I. The first theorem, which is stated for the ∆-NR case, is an example of a quasiperiodic averaging theorem with its concomitant small divisor problem. It's inherently interesting in that the small divisor problem arises in what must be the simplest possible way. We develop the general theory of quasiperiodic averaging in [14]. The second theorem, which is stated for the NtoR case, is an example of periodic averaging which has a vast literature, however as mentioned above our approach here is novel. While the proofs of Theorems 1 and 2 are simple the whole application of the MoA is not. There was considerable work to put the problem into the standard form and considerable effort to calculate the bounds on g 1 , g 2 in Appendix C and g R 1 , g R 2 in Appendix E as well as their ε = 0 limits in Appendixes B and D.
We now comment on future work. First of all it would be interesting to include the y dynamics using (2.12) as we do, but not assuming the zero initial conditions in y, thus treating the full 3D dynamics.
Secondly, it would be interesting to study the helical undulator as we have done here for the planar undulator, i.e., via first-order averaging.
Thirdly, the work here sets the stage for a second-order averaging study of the NR case in (3.10),(3.11) using (3.39), (3.40) and the NtoR case in (3.48),(3.49) using (3.56), (3.57). In both cases we have systems of the form with approximating normal form given by whereF is the t-average of F andĜ is a linear combination of the t-average of G and terms depending on F (See [25, Section 5, p.610] for a construction of the normal form, i.e.,Ĝ, and an associated theorem and proof). Such a study would include a computation of the averages from (3.39),(3.40) and (3.56),(3.57) and then a phase plane analysis of this second order normal form system including a comparison with our first-order normal form system. In addition averaging theorems could be proven which we anticipate will give an O(ε 2 ) error on [0, T /ε] as in [25]. Furthermore, it would be interesting to see what happens in the NR case, e.g., is the energy deviation χ still conserved. We note that generically second-order averaging gives a better error estimate but the interval of validity remains the same (See [25] for situations where the time interval can be extended). Finally it would be interesting to know if, in the NtoR case, there is a breakdown in the integrability of the NtoR normal form due to separatrix splitting, [30], with the concomitant chaotic behavior. This is a delicate issue, which cannot be studied with second-order averaging, since (5.3) is a second order autonomous system and as such it cannot exhibit chaos as pointed out at the end of §3.4.3. This work could be a possible future project, however it does not appear to be interesting from the application point of view since collective effects are surely more important than noncollective effects at second order.
Fourthly, we are therefore eager to move on to the collective case based in part on our understanding here. As a first step we are studying the consequence of (H.1)-(H.6). We have not seen this form of the solution of the 1D wave equation in the FEL literature although the first equality in (H.3) is derived in many elementary PDE books. In addition, we are pursuing the issue raised in the paragraph containing Eq. (5.1), concerning a smoothh.

F Error bounds in a regular perturbation problem
Here we outline a derivation of error bounds in a regular perturbation problem of relevance for §3.4.4. This could be made into a theorem and proof at the level of §3.5 and §4 but we leave this to the interested reader (see [25, §2] for a detailed discussion of regular perturbation theory relevant here, complete with a theorem and proof). We write the IVP in (3.109) as Then the zeroth-order approximation is

I IVP for a system of differential inequalities
Here we present and verify a solution of the IVP for a system of differential inequalities which is used in §4.1, §4.2 and Appendix F. Consider the IVP for where a 1 , a 2 > 0 and R 1 , R 2 are of class C 1 . We want to show, for ζ ≥ 0, that where r ′ 1 = a 1 r 2 , r 1 (0) = R 1 (0) , (I.4) r ′ 2 = a 2 r 1 , r 2 (0) = R 2 (0) . whencer j (ζ) is decreasing w.r.t. ζ so that (I.7) follows from (I. 6). The result in (I.3) is a special case of a much more general theorem on pages 112-113 of [26]. That proof simplifies in the special case here and we present it for the interested reader. The proof proceeds by cleverly introducing a comparison function h. Here and we have, by (I.6),r We now show that, for j = 1, 2, ζ ≥ 0, r j (ζ) ≤ h j (ζ) . r ′ 1 (ζ 0 ) − h ′ 1 (ζ 0 ) < a 1 (r 2 (ζ 0 ) − h 2 (ζ 0 )) ≤ 0 , (I. 28) which is a contradiction.
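For orientation, the comparison system r 1 ′ = a 1 r 2 , r 2 ′ = a 2 r 1 displayed in (I.4) has an elementary closed-form solution in terms of hyperbolic functions, which the following Python sketch evaluates and checks by finite differences. We assume here exactly the homogeneous form that appears above, with constant initial data; the function name `comparison_solution` and the sample values of a 1 , a 2 are hypothetical. The sketch also compares the result with the cruder single-inequality bound obtained by replacing a 1 , a 2 with their maximum.

```python
import math


def comparison_solution(zeta, a1, a2, r1_0, r2_0):
    """Solution of r1' = a1*r2, r2' = a2*r1 with r1(0) = r1_0, r2(0) = r2_0.

    With w = sqrt(a1*a2):
      r1 = r1_0*cosh(w*zeta) + sqrt(a1/a2)*r2_0*sinh(w*zeta)
      r2 = r2_0*cosh(w*zeta) + sqrt(a2/a1)*r1_0*sinh(w*zeta)
    """
    w = math.sqrt(a1 * a2)
    c, s = math.cosh(w * zeta), math.sinh(w * zeta)
    r1 = r1_0 * c + math.sqrt(a1 / a2) * r2_0 * s
    r2 = r2_0 * c + math.sqrt(a2 / a1) * r1_0 * s
    return r1, r2


if __name__ == "__main__":
    a1, a2, r1_0, r2_0 = 0.5, 2.0, 1.0, 0.1   # illustrative values only
    for zeta in (0.5, 1.0, 1.5):
        r1, r2 = comparison_solution(zeta, a1, a2, r1_0, r2_0)
        crude = (r1_0 + r2_0) * math.exp(max(a1, a2) * zeta)
        print(f"zeta={zeta}: r1={r1:.4f} r2={r2:.4f} "
              f"r1+r2={r1 + r2:.4f}  single-constant bound={crude:.4f}")
```

For unequal a 1 , a 2 the growth rate sqrt(a 1 a 2 ) of the system is smaller than max(a 1 , a 2 ), which is one way to see why keeping the two inequalities separate can give sharper bounds than a bound based on the larger constant.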