Multiobjective optimization of dynamic aperture

Dynamic aperture (DA) is one of the key nonlinear properties for a storage ring. Although there have been both analytical and numerical methods to ﬁnd the aperture, the reverse problem of how to optimize it is still a challenging problem. A general and ﬂexible way of optimizing the DA is highly demanded in accelerator design and operation. In this paper, we discuss the use of multiobjective optimization for DA. First we consider using objective functions based only on numerical tracking results. Data mining of these results demonstrated a correlation between DA and low-order nonlinear driving terms. Next we considered using objective functions which included both numerical tracking results and analytical estimates of low-order nonlinear driving terms. This resulted in faster convergence. The National Synchrotron Light Source II (NSLS-II) lattice was taken as an example to illustrate this method. This multiobjective approach is not limited by particular linear or nonlinear lattice settings, and can also be applied for optimizing other properties of a storage ring.


I. INTRODUCTION
Dynamic aperture (DA) is one of the key nonlinear properties of a storage ring lattice in both design and operation stages. It strongly connects to beam lifetime and injection efficiency. Without any exception, the new design or upgrade of a storage ring has to consider this quantitatively [1][2][3][4][5][6]. The DA is limited by the nonlinear forces on circulating beams. The forces can be systematic or random nonlinear fields due to sextupoles used for adjusting the chromaticity, octupoles for stabilizing the particles collective motion, wigglers and undulators in synchrotron light sources, or beam-beam interaction in colliders. There have been both analytical methods which estimate the DA size [7,8] and direct tracking methods for precise DA calculation [9,10], but the reverse problem, i.e., how to optimize it, is still challenging for accelerator physicists.
For the optimization of DA, previous work has been focused on correcting the multipole errors and minimizing the nonlinear driving terms [11][12][13], where usually partial realistic random errors are included. Detailed particle tracking which has the effects from random misalignment errors, multipole errors, insertion devices, and small apertures is then applied afterward to validate the optimization. Tools like frequency map analysis (FMA) can be used to examine the tune diffusion and resonance structures [14].
In this paper, we present multiobjective approaches to DA optimization. First, we consider using objective functions based only on numerical tracking results. Data mining of these results demonstrated a correlation between DA and low-order nonlinear driving terms. Next we considered using objective functions which included both numerical tracking results and analytical estimates of low-order nonlinear driving terms. This resulted in faster convergence. These approaches have been applied with success to the National Synchrotron Light Source II (NSLS-II) lattice [15] and similar methods and results are also recognized in works from the Advanced Photon Source [16]. The NSLS-II lattice was taken only as an example to illustrate this method and this paper does not present the final results of the NSLS-II DA optimization.
Section II is a short introduction to nonlinear driving terms. Section III introduces the conceptual basis for multiobjective optimizer, the objective functions, and constraints we are using for DA optimization. In Sec. IVA we discuss constraints that we impose in order to make the optimization robust. In Sec. IV B we outline two approaches that we have studied. The first considers only numerical tracking results and optimizes both the onmomentum and the off-momentum DA area. The second approach optimizes the on-and off-momentum DA and in addition analytic expressions for tune shift with amplitude. It is found that including the tune shifts in the optimizer improves convergence speed. Both approaches are illustrated using the NSLS-II lattice as an example. Results of the first approach are discussed in Sec. IV C. In Sec. IV D, we exhibit the correlation between DA and the low-order nonlinear driving terms, including the magnitude of tune shift with amplitude, by performing a data mining on the results of Sec. IV C. In Sec. IV E we illustrate the use of an optimizer using both numerical tracking data and tune shift with amplitude as objective functions. Conclusions are presented in Sec. V. The NSLS-II lattice properties are explained in more detail in Appendix A, and Appendix B gives more algorithm details on generating new children in the optimization iterations.

II. NONLINEAR DRIVING TERMS
The single-particle dynamics of a circulating beam are governed by the one turn map. In the normal form analysis, the invariant or effective Hamiltonian of the whole ring is normalized in resonance bases [17]. The nonlinear driving terms are those that can excite certain resonances in the Hamiltonian. Minimization of the driving terms has been used for improving the DA [13,18]. In this paper we call this method the analytical approach.
For a ring with n elements, we can normalize the one turn map M 1!n as [13] where R is a rotation, e :h: is the nonlinear Lie map, A 1 is a normalizing map, and subscripts 1 and n are the indices of ring magnets. For linear lattice without coupling, we can apply perturbation theory and approximate the effects of the normalizing map A i at the ith element as where x;i and x;i are the horizontal beta function and dispersion at the ith magnet, and is the momentum deviation. Similar results also apply for the y plane. In the resonance basis h AE x ffiffiffiffiffiffiffi 2J x p e AEi x ¼ x Ç ip x , R i!j is merely a rotation of the phase advance between the ith and the jth element, ðJ; Þ are action-angle variables, and we have where i!j;x is the phase advance of i ! j. Therefore the potentials Vðx; yÞ of a multipole magnetic field, e.g. Vðx; yÞ ¼ K 2 ðx 3 À 3xy 2 Þ=6 for a sextupole with strength K 2 , can also be expanded in the resonance bases h abcde : The nonlinear driving terms h can be grouped by their order n ¼ a þ b þ c þ d þ e and each h abcde is an explicit analytical function of magnet strengths, beta functions, and dispersions [13]. Each of the h abcde drives a certain quantity or resonance, and some of the low-order driving terms are shown in Table I. Each of these terms has a closed analytical expression. By minimizing the driving terms h abcde and tune shift with amplitude, the resonances and nonlinear effects get suppressed. Compared with the particle tracking for the DA, this analytical approach is indirect but requires less computing. Depending on the working point and magnet type, different lattices may have different combinations of strong resonances; therefore tools like frequency map or tune footprint are needed to identify them and a minimization routine is then applied to suppress related driving terms [9,13,18]. In the following sections, we introduce the multiobjective optimization which combines driving terms and direct tracking to optimize the DA.

III. MULTIOBJECTIVE OPTIMIZATION
In the case of DA optimization, the free parameters x are magnet strengths, objective functions f m are DA properties, and constraints can be DA shape, chromaticities, or nonlinear driving terms.
The solutions to Eq. (5) are not unique, but comprise a set, if the f i conflict with each other. Increasing one f i but decreasing the other will produce a new solution which is neither better nor worse than the original one. They are equally good but only with different trade-offs. We say that one solution dominates the other when it is better in at least one objective function f i but not worse in any other objective function. A solution that meets the constraints g j and h k dominates any solution which does not. The solutions to Eq. (5) form a set called Pareto optimal set, in which any solution is not dominated by any possible solution in the whole space [19].
We use a population based evolution algorithm to find the Pareto optimum iteratively [19][20][21]. Each iteration is called one generation. At the first step, a fixed number of candidates are initialized as the first generation, and in our case they are uniformly distributed in ½x L 1 ; x U 1 Â ½x L 2 ; x U 2 Â Á Á Á Â ½x L N ; x U N . Then one pair of them is randomly chosen as parents to generate two new children according to a certain probability density function (see Appendix B). This process is called crossover and is repeated until the population is doubled. The third step is called mutation, where new children are perturbed slightly (see Appendix B). The objective functions f i ðxÞ, constraints g j ðxÞ, and h k ðxÞ are evaluated for each of these new children. The whole population, including the parents, is then sorted or ranked according to their dominance relations. Candidates not dominated by anyone are in the first rank. The second rank candidates are only dominated by the rank-one candidates, and the third ranks are only dominated by the first and second ranks, etc. The final step is a population control process, where only half of the better candidates are kept. This is done by dropping candidates with larger rank. Within same rank, candidates in a high population density region have lower priority to be kept. Up to this point, the population is kept the same but the overall qualities in terms of objective functions and constraints are evolved not worse than the previous population before crossover. This is called one generation or one iteration. The population is evolved generation by generation until it converges or reaches the maximum number of iterations. Figure 1 shows the last generation of objective functions, where we are maximizing both the on-momentum DA (horizontal axis) and the off-momentum DA (vertical axis). The trend of the optimization is that the whole population moves toward upper right, and converges to lower rank. By this generation, the Pareto optimal has only three candidates at the upper right corner. In a realistic optimization, the full convergence may take too long to achieve, while stopping at a set of reasonably good solutions which are sufficient for further applications is desirable. In Fig. 1, although the solutions are not fully converged, they have many candidates with DA >120 mm 2 . It is a good starting point for detailed simulations that include various magnet errors and misalignment.
We are all familiar with the method of minimizing a weighted sum fðxÞ ¼ 1 f 2 1 ½ðxÞ þ 2 f 2 2 ½ðxÞ þ Á Á Á þ M f 2 M ½ðxÞ, when dealing with multiobjective functions. In this case, solution x depends on the weight i , i ¼ 1; 2; . . . ; M, and the true solutions, i.e., the Pareto optimal set, need all combinations of weight i . Unlike the weighted sum minimization, the multiobjective optimizer searches for an optimal set of solutions and leaves the choice of weight to the decision maker. The choice is made upon the global view of trade-off between optimal solutions instead of initial guesses of relative importance of each objective function. In this method, the sacrifice and gain are more clearly illustrated.

IV. DA OPTIMIZATION ON NSLS-II LATTICE
A. Robust dynamic aperture tracking and quantification DA is usually defined as the maximum stable area in transverse plane at injection point. Particles with initial condition within this area will survive after a certain number of turns of tracking.
Obviously, the area of this 2D bounded region alone cannot represent the quality of DA. For different momentum deviation ¼ ðp À p 0 Þ=p 0 , the stable area or DA may be different. Larger on-momentum DA may help the injection, while off-momentum DA helps Touschek lifetime.
One common way to calculate DA is particle tracking along several radial lines (see Fig. 2) with fourth-order or even higher symplectic integrator [22]. An ideal solution of DA would have an elliptical type of shape with no cut-in unstable area (as the two red points shown in Fig. 2). Because of betatron oscillation, particles are crossing the transverse plan at different ðx; yÞ coordinate, and after long enough time, they will be lost at the cut-in unstable area. A good searching method should have good precision to detect these cut-in unstable areas while not requiring too much trial tracking near the DA boundary, since the tracking is expensive in computing time. Based on this, we introduce a look-back feature in our tracking code. Along each searching line, if the current point failed to survive, we go inward one step, and continue the tracking. This robustness may not be necessary for a code running only once with human interpretation, but is critical for an automatic optimizer which is less tolerant.
In maximizing the DA area, we put a constraint for the DA boundary requiring that it should cover an ellipse defined by two half-axes lengths A x and A y . On each radial line, the boundary stable point ðx; yÞ has x 2 =A 2 x þ y 2 =A 2 y ! 1, i.e., outside of the ellipse. This makes the maximization of DA area meaningful. The semiaxes A x and A y can be different for on-momentum and offmomentum DA. This flexibility does not increase any difficulty in our optimization method.

B. Optimization strategies
We shall discuss two different strategies to optimize DA. The first strategy uses the set of objective functions f 1 and f 2 characterizing the DA area of on-momentum and off-momentum particles: where SðÞ is the DA area of momentum deviation . As shown in Fig. 3, the DA is tracked by seven radial lines for momentum deviation ¼ À2:5%, 0, and 2.5%. A constraint ellipse with half axes A x ¼ 18 mm and A y ¼ 2 mm is set up to illustrate that the DA should cover this ellipse fully to prevent cut-in resonance structures. The second strategy uses as objective functions f 1 the sum of the on-and off-momentum DA and f 2 the geometric sum of tune shifts with amplitude: where Sð; y ¼ 1mÞ is the DA area at certain momentum deviation and vertical offset y ¼ 1m. In Fig. 4, the top part shows the DA in the x-y plane for 13 equally spaced in ½À3%; 3%, and the bottom figure only shows the radial lines close to x axis but plotted in the -x plane with vertical offset y % 1 m and 1 mm. The DA in the -x plane decreases slowly with increased vertical offset. In this approach to DA optimization, an extra constraint shown as a green dashed line in Fig. 4 is applied. A preferred solution would have stable area lie outside the rectangle in the -x plane. However, in this particular solution, the constraint is not fully met near ¼ À2:5%. The DA is tracked by a fourth-order symplectic integrator. To increase accuracy, we use the exact solution for rectangular dipole [23]. This makes it unnecessary to slice a magnet into hundreds of pieces and has significant improvement on tracking speed. The damping wiggler is modeled by a kick-map with aperture size ½À30 mm; 30 mm Â ½À5 mm; 5 mm. As shown in Figs. 2 and 3, the searching is along several radial lines on the upper x-y plane. A small offset is introduced for the radial lines close to x axis, since zero y for the sextupole potential V ¼ k 2 ðx 3 À 3xy 2 Þ=6 will not have vertical kick (this is not necessary when the lattice has misalignment errors). Several points on each line are tracked simultaneously, and the most inward survival will be taken as the starting point for the next iteration. The look-back feature will pick a more inward point and track points between starting point and the first lost point along the same radial line. The searching stops when the grid is fine enough. In the optimization iteration, where speed is very important, we track only 64-128 turns. From our experience this is good enough for the overall size of DA, but not the detailed resonance structures. Frequency map analysis (FMA) comes as a filter to hand-pick the good candidates, where 2048 turns of tracking is done.
In the following sections, we will illustrate two different approaches by using the NSLS-II lattice (see Appendix A for the NSLS-II linear lattice) as an example. The first case is optimizing DA with the objective functions of Eq. (6) by varying six geometric sextupoles in straight sections. This linear lattice has tunes (33.43, 16.35) and chromaticities (1.5, 0.1) (see Sec. IV C). The second case uses the objective functions of Eq. (7) and has more independent variables (see Sec. IV E).

C. DA Optimization based solely on numerical tracking
In this example, we optimize the DA of a lattice with ( x ¼ 33:43, y ¼ 16:35). The bare lattice has 15 superperiods with high-low-beta DBA cells (and long-short straight sections, see Appendix A, Fig. 13). The chromaticity is adjusted from natural value ( x ¼ À103, y ¼ À40) to (1.5, 0.1) by three chromatic sextupoles between dipoles in each DBA cell. The other six geometric sextupoles labeled as SH1, SH3, SH4 in the high-beta section and SL1, SL2, SL3 in the low-beta section are used to optimize the DA. Three damping wigglers modeled by kickmap are included in the 2nd, 7th, and 12th long straight.
For the two objective functions of Eq. (6), i.e., DA of onmomentum and off-momentum particles, we use a population of 6000, and run for 300 generations. With 96 xeon 2.33 GHz CPUs in a Sun Grid Engine cluster , it takes less than a week to get a final population shown in Fig. 1. During the optimization, random errors such as multipole errors, 30 m misalignment, and 5 rad rotation errors of sextupoles are included. These errors are also introduced to all the quadrupoles at a smaller level to produce 20 m-40 m orbit distortion and x-y coupling.
In the optimization, when evaluating new children, the misalignment and rotation errors are changing from candidate to candidate, and each candidate does not use the same set of errors. Compared with optimization on a fixed set of random errors, this could reduce the speed of convergence; however, we benefit from the robustness of the solutions which are more independent of the choice of random seeds.
A postprocessing script is applied as a filter to pick the better solutions after more turns of tracking and averaging DA over more random seeds. As an example shown in , where d x =dt ¼ ð x;1025!2048 À x;1!1024 Þ=1024 and d y =dt ¼ ð y;1025!2048 À y;1!1024 Þ=1024 are tunes drift per turn after 1024 turns. For high precision of frequency drift, e.g. 10 À7 , Numerical Analysis of Fundamental Frequencies is used on 1024 turns of orbit [24].
The tune footprint is shown in Fig. 7. A subset with x 2 ½À15 mm; 15 mm and 2 ½À2:5%; 2:5%, i.e., the points inside the green dashed rectangle in Fig. 6, are plotted in bigger blue points.

D. Correlation of DA and driving terms
Intuitively, we expect there is a correlation between DA and driving terms, i.e., the direct tracking and analytical approach. This is the reason that nonlinear driving terms can be used for DA optimization [13,18]. In this section, we investigate their correlation.
We performed a data mining on the results of Sec. IV C, where the optimization was done by direct particle tracking using constraints of Eq. (6) where no driving terms were calculated or used as a guide for the optimizer. For one of the early generations, we calculated the nonlinear driving terms up to the 2nd order, 33 terms in total and 24 terms after removing the degeneracy, for all candidates. To bring different terms to a same amplitude scale, a normalization is done within the whole generation for each driving term: where N p is the number of individuals in all generations, h abcde the nonlinear driving terms. Their sum sðhÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi P ðh n abcde Þ 2 q is plotted against the DA area in Fig. 8. As shown in Fig. 8, there is a strong correlation between DA area and the driving terms h abcde . Similar correlations are observed also for sðhÞ versus off-momentum DA area. We found that the optimal candidates always have small sðhÞ, while a small sðhÞ does not always imply a good DA. This leads to a conclusion that, in the optimization process, small driving terms is a necessary but not sufficient condition for large DA. In Fig. 8, for many cases with DA area less than 40 mm 3 , sðhÞ % 5 does not always give the largest DA area; while for solutions near 140 mm 2 , all candidates have small sðhÞ % 5. This correlation can be used in the optimization as one of the objective functions or constraints. Depending on the lattice, contributions from each driving term varies. For example, the chromaticity related driving terms have no correlation to the DA, since this optimization is done with fixed chromaticity. In Sec. IV E, we set the constraints only on the most significant terms, i.e., tune shifts with amplitude, to be smaller than certain level. The exact value varies with lattice and we just estimate it from the first few generations. We have found that by using this correlation, the speed of convergence can be increased.

E. DA optimization based on numerical tracking and analytical tune shift with amplitude
Since many of the 3rd generation light source are operating at positive chromaticity to suppress the collective effects, we would like to optimize a nonlinear lattice of chromaticity ðþ5; þ5Þ. The multiobjective evolution algorithm provides the flexibility to treat problems with many independent variables and constraints.
In this section we use the constraint functions of Eq. (7), i.e., including tune shift with amplitude, to optimize the DA in the xplane with y ¼ 1 m. The particle tracking is done on a regular grid. All nine sextupoles are used as parameters, six for geometric optimization, two for fixing linear chromaticity at (5, 5), and one for varying the nonlinear horizontal chromaticity. Two groups of quadrupoles, in long and short straight sections respectively, are also used as parameters to adjust tunes. Each group has three quadrupoles, two for boundary condition matching and one for varying tunes. In this case, 15 parameters are tuned and among them nine are free parameters. q , up to the second order. Small driving terms is a necessary but not sufficient condition for large DA area. Similar correlations are observed for off-momentum DA.
With two free quadrupoles and four matching quadrupoles, the tunes are varied within a certain range in the optimization (see Fig. 9 for the footprint of tune variation). This method varies the tunes by adjusting the beta functions only in the straight sections (not including the three damping wiggler sections). Therefore, the dispersion and emittance are not varied in our optimization.
In each DBA cell, i.e., half of the mirror symmetric superperiod, there are three quadrupoles in high-beta straight section and low-beta straight section respectively (see Fig. 13). By varying QH1, QH2, and QH3, we make the beta functions x and y change only locally between QH3 and its mirror symmetric one on the left, i.e., within long straight sections. Two variables, QH1 and QH2, are used for the boundary conditions of x ¼ y ¼ 0 at the center of long straight section. The third quadrupole is a free variable that can vary the tunes. The short straight section can also do similar matching with QL1, QL2, and QL3 to have a second free variable for tunes. The matching can be done for each one of the 30 straight sections independently but the three long straight sections with damping wigglers are excluded in our optimization intentionally. As one matching example, the overall changes of beta functions are shown in Fig. 10 for 1=3 of the ring. It shows the relative lattice functions change before and after the matching. After the matching, the tunes vary by ð x ; y Þ ¼ ð0:016; À0:042Þ from (33. 15, 16.27). When matching long and short straight sections independently, the full range of tune variation is shown in Fig. 9. This plot compares the coverage of tune variation from the initial population, final population, and optimal candidates. Because of the random errors involved, the ranking of candidates are random-seed dependent. These dependencies could be reduced by a postprocessing analysis.
In the postprocessing, larger errors and longer tracking runs are applied while the rule for selecting best ones are same as the objective functions in the optimizer. After the postprocessing, we found the example shown in Fig. 11 where the working point is (33. 24, 16.36). This new working point guarantees that at chromaticity (5, 5) the particles with ¼ À0:03 stay far enough from the integer tune x % 33 and survive in the tracking. The FMA of this same candidate is shown in Fig. 11 and the surviving turns for particle tracking is shown in Fig. 12. In the survival plot, classical radiation damping and rf acceleration are included. The rectangle in the green dashed line in Fig. 11 is the constraint area that a candidate should cover (see Sec. IV B), however it is not fully met by this particular candidate.
By varying quadrupoles as well as sextupoles, we see that the multiobjective optimization provides an efficient method for exploring potential working points.

V. CONCLUSION
Taking the NSLS-II lattice as an example, we have shown that multiobjective optimization is an effective way of optimizing nonlinear properties, mainly DA area for both on-momentum and off-momentum particles. Two different examples are discussed for different tunes, chromaticities and adjustable magnets. The first case only uses geometric sextupoles at fixed tunes and chromaticities and objective functions of Eq. (6) based only on numerical tracking data. The data mining on this case shows strong correlations of DA area with normalized nonlinear driving terms. We observed that small low-order nonlinear driving terms are a necessary but not sufficient condition for a large DA area. The second case combines DA direct searching and nonlinear driving terms minimization. It varies both nonlinear chromaticities and tunes in the iterations of optimization. By checking with FMA, the results are proved to be satisfactory and the approaches we have been using are found to be effective and robust. Our conclusion is that multiobjective optimization provides a complementary approach which can be used either together with driving terms optimization or independently using only tracking data.
The approaches we have investigated have been illustrated using the NSLS-II lattice, but the applications are not limited to NSLS-II. Similar optimization techniques can be applied to other nonlinear properties in accelerator physics.

ACKNOWLEDGMENTS
We thank J. Bengtsson  NSLS-II lattice has 30 double bend achromatic (DBA) cells. Two DBA cells with mirror symmetry inside have low-and high-beta functions at short and long straight sections (see Fig. 13). The whole bare lattice has 15-fold symmetry. Each DBA cell has nine sextupoles, three sitting in between the dipoles have nonzero dispersion, and are used for chromaticity adjustment. The other six, three on each side, are at the straight section where linear dispersion is zero. Three damping wigglers are included in the 2nd, 7th, and 12th long straight sections. The threefold linear lattice is matched to be approximately 15-fold symmetric.

APPENDIX B: CROSSOVER AND MUTATION
Crossover is an operation applied to two parent candidates to generate two children. These two parents are randomly chosen from the survivals of the last generation. In the x-y plane ¼ 0 and in the -x plane y ¼ 1 um. Realistic misalignment errors, apertures, classical radiation, and rf cavity are included.
FIG. 13. Lattice functions and magnets layout of one DBA cell. It is half of the mirror symmetric superperiod. Each DBA cell has nine sextupoles in total, three for chromaticity adjustment and six for nonlinear lattice optimization.
In our case, we use simulated binary crossover method where the children are generated following a polynomial probability density function (PDF) and every candidate is chosen as a parent once and only once [19]. It is intuitive to have children generated with higher probability around the parents and lower when further away from the parents, i.e., the PDF peaks at parents. The polynomial form will reduce the complexity in simulation when we use inversion method to generate the random children based on a uniform distribution, since the polynomial form is invertible. This PDF is controlled by a single parameter c which is the order of the polynomial PDF [see Fig. 14(a)]: where x is the independent variable, x U and x L are the range of x, p 1 and p 1 two parents, and ¼ 2 À ð1 þ 2ðp 1 À x L Þ=ðp 2 À p 1 ÞÞ À c À1 the normalization factor. For simplicity we assume x L p 1 p 2 x U . The procedures of generating can be found in [25] [in Eq (A.4), we have correct ¼ ðx À x l Þ=ðx u À x l Þ for u 0:5 and ¼ ðx u À xÞ=ðx u À x l Þ otherwise]. The whole population is doubled after the crossover operation, but later the sorting process will find out the best half and keep them as the next generation.
After the crossover, mutation operations are followed to bring perturbations into the new children. Following similar method in crossover, we use polynomial PDF peaked at the new child. The perturbed value has lower probability when they are further away from the current value [see Fig. 14(b), one of the children from crossover is now the parent in the mutation]. The shape of PDF can also be controlled by a parameter m .
Both the crossover and mutation can be done within some probability. When outside of this probability, in crossover we simply copy the parents to children and in mutation we do nothing.