Equilibrium of an Arbitrary Bunch Train with Cavity Resonators and Short Range Wake: Enhanced Iterative Solution with Anderson Acceleration

This paper continues the work of two previous treatments of bunch lengthening by a passive harmonic cavity in an electron storage ring. Such cavities, intended to reduce the effect of Touschek scattering, are a feature of fourth generation synchrotron light sources. The charge densities in the equilibrium state are given by solutions of coupled Haïssinski equations, which are nonlinear integral equations. If the only wake fields are from cavity resonators, the unknowns can be the Fourier transforms of bunch densities at the resonator frequencies. The solution scheme based on this choice of unknowns proved to be deficient at the design current when multiple resonators were included. Here we return to the conventional formulation of Haïssinski equations in coordinate space, the unknowns being charge densities at mesh points on a fine grid. This system would be awkward to solve by the Newton method used previously, because the Jacobian matrix is very large. Here a new solution is described, which is both Jacobian-free and much simpler. It is based on an elementary fixed point iteration, accelerated by Anderson's method. The scheme is notably fast and robust, accommodating even the case of extreme over-stretching at currents far beyond the design value. The Anderson method is promising for many problems in accelerator theory and beyond, since it is quite simple and can be used to attack all kinds of nonlinear and linear integral and differential equations. Results are presented for ALS-U, with updated design parameters. The model includes harmonic and main r.f. cavities, compensation of beam loading of the main cavity by adjustment of the generator voltage, and a realistic short range wake field (rather than the broad-band resonator wake invoked previously).

The equilibrium state is determined by coupled Haïssinski equations [1]. The coupling arises from the long range wake fields of high-Q cavity resonators. Every bunch in the train contributes to the excitation of these wakes, and thereby influences all other bunches and even itself (by a very small amount). There is also the short range wake field from geometric aberrations in the vacuum chamber, affecting only the bunch that excites it. An achievement of the present work is to include this effect accurately, which has not been done before in a multi-bunch framework.
In [1] the problem was solved for the simplest model, in which the only wake field comes from a single passive higher harmonic cavity (HHC). In [2] the model was extended to include the wake field (beam loading) of the main accelerating cavity (MC), and also the short range wake field (SR), roughly approximated by a broad band resonator model. The r.f. generator voltage was adjusted by a least-squares algorithm so that the sum of the generator voltage and the induced voltage of the main cavity closely approximated the desired accelerating voltage, in amplitude and phase. Contrary to the supposition of [1], the main cavity compensated in this way played a substantial role, spoiling to some extent the desired effect of HHC. Also, the consequence of SR was not negligible.
Both [1] and [2] were based on a formulation of the Haïssinski equations in which the unknowns are the Fourier transforms of the bunch densities at the frequencies of the cavity resonators. (Of course this is only possible if the entire wake field is due to resonators, narrow- or broad-band.) Then the number of real unknowns is n_u = 2 n_b n_r, where n_b is the number of bunches and n_r the number of resonators. For ALS-U we have n_b = 284 and n_r = 3 for the model with HHC+MC+SR, thus n_u = 1704. Newton's method is readily feasible for solution of a system of this size or even much larger. In fact the method worked beautifully in [1], where n_u = 656, but failed to converge for the full range of parameters desired in [2], with n_u = 1704 or larger.
A possible way to avoid the divergence might be to return to the conventional formulation of the Haïssinski equations in coordinate space (z-space). The single-bunch equation in z-space, discretized on a mesh, is solved very robustly by Newton's method, even at currents far beyond realistic values [3]. Hoping for a similar success in the multi-bunch case, one encounters the problem of a very large Jacobian matrix. With a mesh of 100 cells and 284 bunches the dimension of the matrix is 28684 × 28684, which is uncomfortable if not impossible on a standard PC. Moreover, other light source designs have more than 1000 bunches. Instead of a full Newton method, one could consider more economical quasi-Newton procedures such as Broyden's method [8], [9].
Fortunately, a very simple and effective z-space solution turned up in the guise of a relaxed fixed point iteration, suggested by He, Li, Bai, and Wang [4]. This is Jacobian-free, and involves little calculation beyond repeated evaluations of the potential function that appears in the exponent of the Haïssinski operator. This was successfully applied with parameters for ALS-U and other rings in [4], and I have verified the success for ALS-U.
Many of the problems posed in [2] were solved in a simpler way by this method, but there were still some failures of convergence in cases of interest. Convergence of the method is slow, the number of iterations required being of order 100, but the total computation time is nevertheless modest.
This development turned my attention away from Newton-type methods and toward Jacobian-free fixed point iterations. There is a long history of efforts to accelerate iterative sequences [10]. One of particular interest is Anderson's proposal of 1965 [11], [12], [15]. It has the potential both to cure divergence and to promote fast convergence. Remarkably, it does both in our problem, providing a very fast and robust solution throughout the parameter domain of interest.
Section II describes the relaxed fixed point iteration. The discussion leads naturally to the continuation method, which is a more standard approach to nonlinear equations and a technique that can be related to Anderson acceleration. Section III introduces Anderson acceleration. Section IV presents results for ALS-U, with parameters from the latest design report, somewhat different from those of our previous papers. Section V treats the relation of Anderson acceleration to Broyden's quasi-Newton method. Section VI presents conclusions and the outlook for future work.

II. RELAXED FIXED POINT ITERATION AND THE CONTINUATION METHOD
We wish to solve n equations in n unknowns, written compactly as

x = g(x) ,     (1)

where g may be linear but is nonlinear in general. To relate to later discussions we suppose that g has continuous first derivatives, as is true in our examples, although no derivatives appear in the numerical work. The function g will be called the basic map. The elementary fixed point iteration, or method of successive substitutions, tries to construct a solution by starting with some guess x_0 and forming a sequence {x_k} as

x_{k+1} = g(x_k) , k = 0, 1, · · · ,     (2)

hoping that the sequence will converge to a solution x. If g maps a ball B_r = {x | ‖x‖ < r} into itself, and reduces the distance between any two points in the ball by at least a factor β < 1, then the Contraction Mapping Theorem ensures that the sequence converges to a solution, the only solution in B_r, for any x_0 ∈ B_r. Here ‖·‖ can be any norm, but for analytic estimates of β a convenient choice is the maximum absolute value of components of x:

‖x‖ = max_i |x_i| .     (3)

The discretized Haïssinski system has the contractive property for sufficiently small beam current, so we have the assurance of a unique solution at low current. Numerical calculations show that the sequence diverges for the larger currents of interest, so we seek a better algorithm.
We look for a better map h(x), which ought to generate a sequence that will converge, at least for some appropriate x_0. The relaxed or damped fixed point iteration makes h(x) from g(x) in the simplest imaginable way,

h(x) = α g(x) + (1 − α) x , 0 < α ≤ 1 .     (4)

That is, if g produces too much change in x, reduce its contribution and use the current x itself for the rest of the iterate. It could be said that damping, rather than relaxation, is more descriptive of the process.
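As a concrete illustration, the damped iteration can be sketched in a few lines of Python. The toy map g(x) = cos(x) is my own choice, standing in for the Haïssinski map; the tolerance and iteration limit are likewise illustrative.

```python
import numpy as np

def relaxed_fixed_point(g, x0, alpha=0.1, tol=1e-12, max_iter=10000):
    """Damped fixed point iteration: x_{k+1} = alpha*g(x_k) + (1-alpha)*x_k."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        x_new = alpha * g(x) + (1.0 - alpha) * x
        if np.max(np.abs(x_new - x)) < tol:   # convergence test in the max norm
            return x_new, k + 1
        x = x_new
    raise RuntimeError("relaxed iteration did not converge")

# toy map: g(x) = cos(x) componentwise; fixed point x_i = 0.739085...
x, iters = relaxed_fixed_point(np.cos, np.zeros(3), alpha=0.5)
```

Reducing alpha slows convergence but enlarges the set of starting points and maps for which the sequence converges at all, which is the trade observed in the Haïssinski application.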
He et al. in [4] adopted this procedure to solve the coupled Haïssinski equations in the z-space formulation, and found it to be remarkably effective. They called it a relaxation iteration. The damping parameter α was chosen by experiment. At high values of current a relatively small value is required for convergence, say α = 0.1.
It looks as though He et al. generalized freely from the linear case, since their bibliography on the source of the method refers only to that case. When g is linear relaxation is widely used in the SOR algorithm, Successive Over-Relaxation, a variant of the Gauss-Seidel method intended to accelerate convergence. I have not seen much notice of the nonlinear application in the literature of numerical analysis, although one can find it in the context of particular problems. See for instance Section III of [5] where it is called "simple mixing".
A bit more insight into (4) accrues if we invoke a differential equation. Define f(x) = g(x) − x and consider the system of ordinary differential equations,

dx/dt = f(x(t)) .     (5)

One might hope to find a solution of f(x) = 0 as the limit of an asymptotically constant trajectory x(t) as t → ∞. Euler's method applied to (5) gives a sequence {x_k} defined by

x_{k+1} = x_k + ∆t f(x_k) = ∆t g(x_k) + (1 − ∆t) x_k .     (6)

Thus an attempt to find a constant asymptote by Euler's method is the same as trying to find a solution by the relaxed fixed point iteration, with α = ∆t.
There is no proof that a solution of f(x) = 0 is really to be found as the constant asymptote of a solution of (5). A more certain relation to a differential equation, explored in the literature, is obtained by considering a homotopy connecting an equation with known solution x_0 to the equation of interest [6], §7.5. Suppose that H : R^n × R^1 → R^n is a smooth function of both variables such that

H(x_0, t_0) = 0 , H(x, t_1) = f(x) .     (7)

Then consider the trajectory x(t) defined by H(x(t), t) = 0. Differentiating we find

H_x(x(t), t) dx/dt + H_t(x(t), t) = 0 .     (8)

As long as the inverse of the Jacobian H_x exists, we have the differential equation in explicit form,

dx/dt = −H_x(x(t), t)^{−1} H_t(x(t), t) .     (9)

If this equation has a solution extending from t_0 to t_1, we have achieved a solution of f(x) = 0 as x = x(t_1), according to (7). A procedure along these lines is called a continuation method.
There are of course myriad ways to choose H. An especially useful choice, with t_0 = 0 and t_1 = ∞, is

H(x, t) = f(x) − e^{−t} f(x_0) .     (10)

Since on the trajectory e^{−t} f(x_0) = f(x), the equation (9) becomes

dx/dt = −f_x(x)^{−1} f(x) .     (11)

When Euler's method is applied to (11) we get

x_{k+1} = x_k − ∆t f_x(x_k)^{−1} f(x_k) ,     (12)

which is the damped Newton method with damping factor ∆t. It becomes the full Newton method at ∆t = 1. If f_x^{−1} f obeys a Lipschitz condition the differential equation (11) is subject to standard existence theorems. Boggs [7] has explored the use of more sophisticated integrators of (11), with the goal of approaching the asymptote more quickly.
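A minimal sketch of the damped Newton iteration (12), on a toy root-finding problem of my own choosing (the Jacobian is supplied analytically here; the paper's method needs no Jacobian at all):

```python
import numpy as np

def damped_newton(f, jac, x0, dt=1.0, tol=1e-12, max_iter=100):
    """Euler integration of dx/dt = -J(x)^{-1} f(x):
    x_{k+1} = x_k - dt * J(x_k)^{-1} f(x_k); dt = 1 is the full Newton method."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        step = np.linalg.solve(jac(x), f(x))   # solve J(x_k) step = f(x_k)
        x = x - dt * step
        if np.max(np.abs(step)) < tol:
            return x, k + 1
    raise RuntimeError("damped Newton did not converge")

# toy problem: f(x) = x - cos(x), with diagonal Jacobian 1 + sin(x)
root, iters = damped_newton(lambda x: x - np.cos(x),
                            lambda x: np.diag(1.0 + np.sin(x)),
                            np.zeros(2))
```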
Broyden's method gives a way to approximate f_x^{−1}(x_k), starting with a value for f_x^{−1}(x_0) [9]. The update from step k to step k + 1 is obtained by adding a rank-1 matrix. With the notation G_k ≈ f_x^{−1}(x_k), ∆x_k = x_{k+1} − x_k, ∆f_k = f(x_{k+1}) − f(x_k), the update is

G_{k+1} = G_k + (∆x_k − G_k ∆f_k) ∆f_k^T / (∆f_k^T ∆f_k) ,     (13)

where the row vector v^T is the transpose of a column vector v. This is called Broyden's second method. His first method approximates the Jacobian itself in a similar way.
Sometimes it is adequate to take f x (x 0 ) = −I where I is the unit matrix, which is equivalent to using the relaxed iteration of (6) for the first step. With this reasonable choice and (13) one can carry out an approximate version of the Newton iteration (12), often to good effect. The undamped iteration would be preferred, but damping could be needed for convergence.
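Broyden's second method with the starting guess f_x(x_0) = −I can be sketched as follows; the test function is illustrative (f = g − x with g = cos), not the Haïssinski system, and the damping parameter alpha is my own knob.

```python
import numpy as np

def broyden2(f, x0, alpha=1.0, tol=1e-12, max_iter=200):
    """Broyden's second method; G_k approximates the inverse Jacobian,
    started from G_0 = -I (the choice f_x(x_0) = -I, natural for f = g - x)."""
    x = np.asarray(x0, dtype=float)
    G = -np.eye(x.size)
    fx = f(x)
    for k in range(max_iter):
        dx = -alpha * G @ fx            # (damped) quasi-Newton step
        x_new = x + dx
        f_new = f(x_new)
        if np.max(np.abs(f_new)) < tol:
            return x_new, k + 1
        df = f_new - fx
        # rank-1 update: G <- G + (dx - G df) df^T / (df^T df)
        G = G + np.outer(dx - G @ df, df) / (df @ df)
        x, fx = x_new, f_new
    raise RuntimeError("Broyden iteration did not converge")

# toy problem: f(x) = g(x) - x with g = cos
sol, iters = broyden2(lambda x: np.cos(x) - x, np.zeros(2))
```

With G_0 = −I the first step is exactly the fixed point step x_1 = x_0 + α f(x_0); subsequent rank-1 updates build up curvature information without ever forming the Jacobian.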

III. ANDERSON ACCELERATION
Again we wish to solve x = g(x).
Step k of Anderson's iteration makes use of the current and previous evaluations of the map, g(x_j), j = k, k − 1, · · · . These evaluations contain valuable information. The update x_{k+1} is formed from a favorable linear combination of the g(x_j). With a given start x_0 we employ the following notations for k = 0, 1, · · · :

g_j = g(x_j) , f_j = g_j − x_j .

Also choose an integer m ≥ 1 and define

m_k = min(m, k) ,

which will be the number of previous map evaluations used at the k-th step, not more than m.
To find a good linear combination of the g_j, Anderson finds the coefficients in a minimal linear combination of the f_j. That is, he solves the constrained linear least-squares problem

min over α^k of ‖ Σ_{j=k−m_k}^{k} α_j^k f_j ‖ , subject to Σ_{j=k−m_k}^{k} α_j^k = 1 .

Then the next iterate is taken to be

x_{k+1} = Σ_{j=k−m_k}^{k} α_j^k g_j .

For k = 0 the constraint alone determines the minimum, so that α_0^0 = 1 and x_1 = g(x_0). Anderson allowed extra flexibility by introducing a relaxation parameter β_k, with a corresponding update

x_{k+1} = β_k Σ_j α_j^k g_j + (1 − β_k) Σ_j α_j^k x_j .

In this scheme x_1 = β_0 g(x_0) + (1 − β_0) x_0, which is to say that the iteration starts with simple mixing. In view of the partial success of simple mixing, this would seem to be a good choice, at least for the first step. At later iterations one might put β_k = 1.
It is convenient, both for the calculation and for some steps in analysis, to recast the minimization problem without constraints. That is accomplished merely by a linear change of variables; see [12], Eq.(3.1)ff. Define new constants γ_i^k such that

α_{k−m_k}^k = γ_0^k , α_{k−m_k+i}^k = γ_i^k − γ_{i−1}^k , i = 1, · · · , m_k − 1 , α_k^k = 1 − γ_{m_k−1}^k .

Now the sum of the α_j^k is 1 for any choice of the γ_i^k, and the unconstrained minimization takes the form

min over γ^k of ‖ f_k − Σ_{i=0}^{m_k−1} γ_i^k (f_{k−m_k+i+1} − f_{k−m_k+i}) ‖ .

Correspondingly, the next iterate is

x_{k+1} = x_k + f_k − Σ_{i=0}^{m_k−1} γ_i^k [ (x_{k−m_k+i+1} − x_{k−m_k+i}) + (f_{k−m_k+i+1} − f_{k−m_k+i}) ] .

Of course, with relaxation this becomes

x_{k+1} = x_k + β_k f_k − Σ_{i=0}^{m_k−1} γ_i^k [ (x_{k−m_k+i+1} − x_{k−m_k+i}) + β_k (f_{k−m_k+i+1} − f_{k−m_k+i}) ] .

The relation of Anderson's method to Broyden's algorithm is discussed in Section V.
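The unconstrained form lends itself to a compact implementation: the difference vectors ∆f and ∆x become the columns of two matrices, and the γ coefficients come from one small dense least-squares solve per iteration. The following Python sketch is my own minimal version, with a toy map in place of the Haïssinski system.

```python
import numpy as np

def anderson(g, x0, m=5, beta=1.0, tol=1e-12, max_iter=200):
    """Anderson acceleration of x = g(x), in the unconstrained (gamma) form."""
    x = np.asarray(x0, dtype=float)
    X, F = [], []                        # histories of x_k and f_k = g(x_k) - x_k
    for k in range(max_iter):
        fx = g(x) - x
        if np.max(np.abs(fx)) < tol:
            return x, k
        X.append(x.copy()); F.append(fx.copy())
        mk = min(m, len(F) - 1)          # number of previous evaluations used
        if mk == 0:
            x = x + beta * fx            # first step: simple mixing
            continue
        # columns are the differences Delta f_i and Delta x_i over the window
        dF = np.column_stack([F[-mk + i] - F[-mk + i - 1] for i in range(mk)])
        dX = np.column_stack([X[-mk + i] - X[-mk + i - 1] for i in range(mk)])
        gamma, *_ = np.linalg.lstsq(dF, fx, rcond=None)   # least-squares step
        x = x + beta * fx - (dX + beta * dF) @ gamma      # accelerated update
    raise RuntimeError("Anderson iteration did not converge")

# toy map: g = cos, accelerated from the zero vector
x, iters = anderson(np.cos, np.zeros(4), m=3)
```

In the application, g is the discretized Haïssinski map and this small least-squares solve is essentially the only linear algebra required.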

IV. RESULTS WITH ALS-U PARAMETERS
Parameters considered in the preliminary design report for ALS-U [16] of October 2020 are listed in Table 1. These parameters differ considerably from those adopted in references [1], [2], and [4], so part of the motivation for this report is to bring the study up to date.
Two choices for the harmonic cavity parameters are contemplated, called the high R/Q and low R/Q options, which are alleged to have different implications for stability issues. The stability is of course important, but does not concern us here.
At last notice the coupling coefficient β for the main rf cavity was still an undetermined feature of the design. The present ALS cavities might be used if their coupling could be increased enough to control the dc Robinson instability. For consistency with the calculations of [16] we take the "optimum" value of Table 1, β = 9.983. Appropriate values of impedance and quality factor for the calculation are the loaded values, R_sL = R_s/(1 + β), Q_L = Q/(1 + β). We use the main cavity detuning from the table, which realizes the "compensated condition" given by Eqs. (3.79) and (3.80) in [16].
The table in [16] gives U 0 = 330 keV with insertion devices, but the reported calculations to be compared to ours have U 0 = 315 keV, so we choose the latter.
The coupled Haïssinski equations to be solved are

f_i(z_j, ρ) = exp(−µ U_i(z_j, ρ))/A_i − ρ_i(z_j) = 0 ,     (26)

where the denominator A_i is a normalization factor, the discretized integral of the numerator. In the exponent, U_i(z_j, ρ) is the potential well seen by the i-th bunch, and µ is the constant of Eq.(49) in [1]. In the following, U_i consists of the expression defined in Eqs. (51), (57), and (60) of [1], plus the integral of the short range wake potential convolved with ρ_i. The wake potential W(z), from detailed modeling by Dan Wang, is plotted in Fig.13 of [2]. It is zero for z < ζ_0, where ζ_0 is a small fraction of the bunch length, arising from the non-zero length of the drive bunch in the wake field simulation.
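To make the structure of the map concrete, here is a single-bunch toy version of this fixed point problem, solved by simple mixing. The mesh, the constant µ, the current strength, and the exponential causal wake are all invented for illustration; they are not the ALS-U parameters, and the toy wake is not the measured W(z).

```python
import numpy as np

# Toy single-bunch Haissinski map on a mesh (all parameters illustrative).
z = np.linspace(-6.0, 6.0, 201)           # mesh, in units of natural bunch length
dz = z[1] - z[0]
mu = 1.0                                  # stands in for the constant mu
I_b = 0.5                                 # toy current strength
W = np.where(z >= 0.0, np.exp(-z), 0.0)   # toy causal wake, zero ahead of source
mid = (z.size - 1) // 2                   # index of offset s = 0 in the W sample

def g_map(rho):
    """The map rho -> exp(-mu*U)/A, with U = z^2/2 plus the wake convolution."""
    V = I_b * np.convolve(W, rho)[mid:mid + z.size] * dz  # sum_j W(z_k - z_j) rho_j dz
    U = 0.5 * z**2 + V
    p = np.exp(-mu * U)
    return p / (np.sum(p) * dz)           # A = discretized integral of the numerator

# solve rho = g(rho) by simple mixing with alpha = 0.3
rho = np.exp(-0.5 * z**2)
rho /= np.sum(rho) * dz
for _ in range(500):
    rho = 0.3 * g_map(rho) + 0.7 * rho
```

Since every output of g_map is normalized, the mixing step preserves the normalization of ρ exactly, and the converged ρ is a discrete fixed point of the map.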
The solution vector x of the previous section is identified with ρ and the map vector g is from the first term in (26); that is g = f + ρ.
The least-squares step in the Anderson algorithm is done with the code dgels from the Intel Math Kernel Library, which solves the least-squares problem by QR decomposition rather than through the normal equations; this is the approach recommended in [12]. We take m_k = min(k, ∞) = k, so that at the k-th iterate the current evaluation and all previous evaluations of the map g are employed. About the same results are obtained with a sufficiently large limit on the number used, say m_k = min(k, 8), but this is bothersome to verify and gives no appreciable saving in computation time. The time for the least-squares step is negligible.
All results and CPU times are for a serial code in Fortran, running on a laptop. The code is arranged so that the result of any run can be taken as an initial guess for the next run.
A result for a complete fill can then be used to initiate a run with a partial fill, or with a smaller detuning, or with a new wake component included, or with smaller error tolerances, etc.
In contrast to the algorithms used in [1] and [2], no continuation in current from small initial values is needed to achieve convergence. In spite of strong nonlinearities convergence is found immediately at the design current and even at much higher values.
A. The case of high R/Q

The first step is to consider the complete fill, with all 328 buckets filled with the same charge, and the entire wake coming from the HHC. Then every bunch comes out to have the same profile, even though that is not put in as a constraint. In Fig.1 we show that profile for a decreasing sequence of detunings. The legend gives the detuning δf in kHz and σ/σ_0, the ratio of the rms bunch length to the natural bunch length of 3.9 mm. This plot agrees with Fig.(3.255) in [16]. With 201 mesh points per bunch, the CPU time is 6 seconds.

The next step, again for a complete fill, is to see the effect of the main cavity beam loading, which is to be compensated by adjustment of the generator voltage. As expected, the compensation is essentially perfect and the bunch profile is the same to graphical accuracy.
It is given by the blue curve in Fig.2 for the nominal detuning of 317.8 kHz from Table 1.
Here the increase in bunch length is σ/σ_0 = 3.79. The compensation algorithm of [2] did not converge with the desired energy loss of U_0 = 315 keV, although it did converge in the work of [1], which had a smaller value of U_0. The compensation algorithm is the standard Gauss-Newton method for nonlinear least squares, although it was not recognized as such in [2]. There are other algorithms, such as the Levenberg-Marquardt method, which can be more robust concerning the starting guess.
Perhaps such a method could give a least squares solution directly for the desired U 0 , but not necessarily in a shorter time.
Turning on the short range wake we get the red curve in Fig.2. The short range force reduces the asymmetry of the bunch, and increases its rms length by 3%. This is different from the effect of the short range wake in the broad band resonator model [2], and perhaps more reasonable.
Additional bunch lengthening through a decrease in detuning is a possibility for the machine, discussed in [16]. The results of two smaller values are shown in Fig.3. The transition to overstretching, when two peaks appear, occurs between the two.
The partial fill anticipated for the machine, which has been called Fill C2 in [13], has distributed gaps of 4 buckets each, with a total of 284 bunches. There are 11 sub-trains, 9 with 27 bunches and 2 with 26, the latter two on opposite sides of the ring. All bunches have the same charge, chosen to give the desired average current of 500 mA. Taking this case with the nominal detuning δf = 317.8 kHz of Table 1, and including HHC, MC, and SR, we get the densities shown in Fig.4. There are 6 bunches in the plot out of a typical subtrain of 27 bunches. The one with maximum farthest to the right is nearest the front of the subtrain.
As was discovered in [2], the main cavity has a large influence when there are gaps in the train. The bunches near the front of the sub-train resemble that of the complete fill, whereas those at the middle and back are broader and flatter. The distributions of bunch length increase and centroid displacement along the full train are shown in Fig.5 and Fig.6.
The bunch length increase and the centroid displacement are both largest at the back of a sub-train.
The most interesting figure of merit is the increase in the Touschek lifetime over the case without a harmonic cavity. This is plotted for a typical sub-train in Fig.7. The factor of increase, τ/τ_0, is not far from the length increase σ/σ_0. The strong variation along the sub-train should be an issue in determining the average beam lifetime, but that matter is beyond the scope of this work.

Fig.4 is to be compared with Fig.(3.256) in [16], generated from a macro-particle simulation with rather severe noise. This plot is for a case different from ours in that it does not include the short range wake and most probably has a different account of the main cavity beam loading, which is mentioned but not described. Also, the detuning is not specified exactly but is said to be "right below the onset of the overstretching instability".

B. High R/Q with overstretching
What is the effect of overstretching with the partial fill, as compared to the result of Fig.4? It turns out that the threshold for overstretching is at larger δf than for the complete fill. This shows that there is no profit in overstretching beyond a certain point: Fig.12 displays more undesirable bunch distortion than Fig.11, without much increase in the average bunch length.
It took only 40 seconds to produce Fig.9 starting with the solution of Fig.4, and another 40 seconds to make Fig.10 starting with Fig.9. Through Anderson acceleration we have gained a remarkable advance in technique compared to Ref. [2], in which these solutions could not be produced at all. The preliminary design report [16] expresses an interest in running this case with 20% overstretching, which is illustrated with a macro-particle simulation in Fig.(3.257). We obtain the closely similar result of Fig.14 with a detuning of 140 kHz, reduced from 164.7 kHz.
The corresponding outcomes for bunch lengthening, centroid distribution, and Touschek lifetime increase are plotted in Fig.16, Fig.17, and Fig.18.

V. THE RELATION OF ANDERSON ACCELERATION TO BROYDEN'S METHOD
By an argument of Eyert [14], revisited by Fang and Saad [15], Anderson's method is equivalent to a generalized form of Broyden's second method for updating an approximation to the inverse Jacobian as in (13). The usual Broyden method imposes the secant condition,

G_{k+1} ∆f_k = ∆x_k ,     (28)

which is motivated by the linear approximation to f. A second condition requires that the Frobenius squared norm ‖G_{k+1} − G_k‖_F^2 be minimum with respect to G_{k+1}, subject to condition (28). These two conditions lead uniquely to (13).
A generalized Broyden method imposes multi-secant conditions,

G_k ∆f_p = ∆x_p , p = k − m_k , · · · , k − 1 .

VI. CONCLUSIONS AND OUTLOOK

As an example, the parameter set for ALS-U in the Preliminary Design Report was adopted. Results similar to those obtained by macro-particle simulations in the report could be obtained with a reasonable choice of detuning parameters. The report does not specify detuning exactly, and the physical model is different in not including the short range wake, and may have a different treatment of the MC beam loading.
This paper introduces the Anderson iterative method that is probably new to accelerator physics, and which seems very promising for further applications in the field. It is especially interesting for problems falling under rubrics such as "self-consistency" or "phase space matching", often formulated in terms of nonlinear integral or differential equations.
There are large-scale applications of Anderson's method and related ideas in the literature of ab initio quantum mechanical calculations of material and molecular properties [5,14,15,18,19]. These are based on the Kohn-Sham density functional formalism [20], which is similar in spirit if not in specifics to systems arising from the nonlinear Vlasov equation. It may be profitable to keep an eye on this work to see if there are any lessons to be learned for accelerator physics. Also, ongoing efforts by numerical analysts are interesting, specifically for Anderson acceleration [21].