Fast symplectic map tracking for the CERN Large Hadron Collider

Tracking simulations remain the essential tool for evaluating how multipolar imperfections in ring magnets restrict the domain of stable phase-space motion. In the Large Hadron Collider (LHC) at CERN, particles circulate at the injection energy, when multipole errors are most signiﬁcant, for more than 10 7 turns, but systematic tracking studies are limited to a small fraction of this total time—even on modern computers. A considerable speedup is expected by replacing element-by-element tracking with the use of a symplectiﬁed one-turn map. We have applied this method to the realistic LHC lattice, version 6, and report here our results for various map orders, with special emphasis on precision and speed.


I. INTRODUCTION
Since 1982, when maps were first introduced to the accelerator community [1], the question has been, ''Can they be used to replace element-by-element tracking?'' It became clear that in the presence of strong multipolar fields, maps can be used only if one applies some kind of symplectification scheme [2]. One such scheme was proposed by Irwin [3] in 1989 and later tested for the CERN Large Hadron Collider (LHC) [4]. Unfortunately, it was found to be (i) only a factor of 2 faster and (ii) insufficiently precise for determining the dynamic aperture. The application of this technique was therefore discontinued, until recently, when a significant improvement was proposed [5,6]. This improved scheme produces a symplectic map, called a Cremona map, that is guaranteed to agree with element-by-element tracking through a given Taylor order with minimal additional terms.
Here we apply Cremona map tracking to a realistic model of the LHC to see if this new technique can yield a sufficiently precise determination of the dynamic aperture, but with a tenfold gain in speed compared to direct tracking. To this end we study 60 different configurations (called ''seeds'' in the following) of randomly distributed multipole errors.
A sketch of the underlying theory, with references to more detailed literature, appears in Sec. II. Then, in Secs. III and IV, we describe the performance and optimization of the code CTRACK and the production of the required Cremona maps. The results obtained for a model of the current LHC optics (now under construction at CERN) are reported in Sec. V. We conclude the paper with a brief summary, Sec. VI.

A. The symplectic condition
The study of dynamical systems leads naturally to the concept of transfer maps M, which describe a system's change in phase space from initial conditions z i to final conditions z f during a given fixed interval of time (or the timelike variable): Mz i z f . For Hamiltonian systems, transfer maps obey the very special symplectic condition. In a 2n-dimensional phase space, with points z z 1 z 2n q 1 q n ; p 1 p n , where q i and p i are the particle's positions and canonical momenta, this condition reads [1,7,8]M MJM J: (1) Here M denotes the Jacobian matrix, which has entries M ab @z f a =@z i b ;M M denotes the transpose of M; and J denotes the fundamental 2n 2n symplectic matrix 0 ÿI I 0 , where I is the n n identity matrix. If throughout phase space M obeys Eq. (1), then M is called a symplectic map.
For most systems of interest we cannot write the transfer map in closed form; hence we usually resort to the use of approximations -often in the form of a Taylor map: each phase-space variable in the final state is expressed as a (truncated) Taylor series expanded in terms of the initial conditions. When applied to the Taylor series representation of a map, the symplectic condition Eq. (1) comprises a set of nonlinear relations between Taylor series coefficients of different orders. Hence the unavoidable truncation of the Taylor series (almost always) violates the symplectic condition. For some purposes this (one hopes small) violation will have little effect; but for others, e.g., long-term tracking using many iterations of the map, the consequences can be severe. Moreover, numerical studies have shown that these undesirable consequences often do not derive from the loss of high-order information in the truncated Taylor map but from the symplectic violation itself [9].

B. The factorization theorem
Dragt and Finn have shown that one can cast the Taylor map derived from a Hamiltonian system into a form that maintains its symplectic nature. To do this they use the Poisson bracket ; to define the Lie operator :f: associated with any function f: for any function g. And the corresponding Lie transformation is the operator (Here :f: 2 g f; f; g, and similarly for higher powers.) These operators have the very special property that for any function f, the Lie transformation e :f: denotes a symplectic map [1,10,11]. The Dragt-Finn factorization theorem shows that for any symplectic map M which has a convergent Taylor series representation, one can use an order-by-order procedure to convert that representation to the form M e :f 1 : e :f 2 : e :f 3 : e :f 4 : ; (4) where each f k denotes a homogeneous polynomial of degree k in the initial phase-space variables z i [7,10]. The f k are called the Lie generators of the map M, and the Taylor series terms through order k ÿ 1 determine uniquely the Lie generators f 1 ; . . . ; f k . Among the virtues of this representation are that one can relate the coefficients of the f k to the various optical properties of the dynamical system; each factor corresponds to successively higher-order information about the dynamical system; and truncating the product does not violate the symplectic condition Eq. (1). Hence the factorization theorem gives us a method for symplectifying Taylor maps. The one drawback of the representation Eq. (4) is the time required to compute its action. The first two factors pose no difficulty: e :f 1 : generates the constant terms in the Taylor map and e :f 2 : generates the linear terms (the matrix part). It is the remaining, nonlinear factors N e :f 3 : e :f 4 : that present a problem: each e :f k : leads to Taylor map terms of degrees k ÿ 1 and higher. A straightforward use of Eq. (3) can be time-consuming to compute to machine precision, and truncating that power series again (usually) violates the symplectic condition. See [6] for a more complete discussion. One approach to addressing the difficulty just described uses the concepts of Cremona maps, the jolt representation, and jolt decomposition.

C. Cremona maps
A Cremona map is a map that is both symplectic and polynomial. The set of all such maps obviously includes all linear symplectic maps, but it also includes many nonlinear maps: Consider a Lie generator g that depends only on the coordinates q. Then :gq: 2 z a 0 for all a, and e :gq: z a z a :g:z a z a g; z a (6) is a polynomial. Hence the Lie transformation of any polynomial in the coordinates alone is a Cremona map. As these affect only the momenta p, they are known as kick maps. Now let L denote any linear symplectic map. One can show that :Lgq: 2 z a also vanishes for all a, and hence any map of the form exp:Lgq: is also a Cremona map. These are called jolt maps [5,6].

D. The jolt representation
The jolt representation of a nonlinear symplectic map seeks to approximate the truncated version of Eq.
in such a way that the Taylor expansions of N P and J P agree through terms of order P. (The number of jolts N will, of course, depend on P.) If one starts from a Taylor map of order P and extracts N P using the factorization theorem, then J P constitutes an easy-to-evaluate symplectification of the nonlinear part of the original Taylor map. The particular form of the polynomial generators g j is described in Sec. II F. Here we simply note that they are not homogeneous: each contains terms of degrees 3 through P 1, and each factor in Eq. (8) is therefore a nonlinear polynomial (symplectic) map of order P. The polynomial expansion of Eq. (8) will, of course, contain many higher-order terms, terms that serve to symplectify the original Taylor map. Assuming Taylor map terms of degrees P 1 and higher do not contribute to the dynamics, then those high-order terms in the polynomial expansion of Eq. (8) ought to be small. Since those highorder terms involve products of various coefficients in the generators g j , we ask that the coefficients of the g j 's be as small as possible.
Irwin first proposed the representation Eq. (8) in 1989 [3], using L j 's that perform (uncoupled) rotations in the PRST-AB 6 FAST SYMPLECTIC MAP TRACKING FOR THE CERN . . .

(2003)
064001-2 064001-2 transverse planes, but numerical tests yielded poor results [12]. Later research has shown the results were better when selecting the L j 's from a broader class of linear symplectic maps. Moreover, that research showed that the quality of the representation Eq. (8) depends very sensitively on the choice of L j 's, and showed how to recognize and construct optimal sets of L j 's [5,6]. The L j 's so constructed are the ones used in the work reported here.

Notation
We use a special normalization for monomials of degree ' in z: where r 1 r 2n '. On the space spanned by these monomials, the USp2n inner product h ; i is defined to act according to the rule [5,7] hG ' for any scalars and . In addition to the general monomials G ' r , we define the q monomials where k 1 k n '. In terms of these monomials, we define two further quantities: (1) The sensitivity vectors r with elements Here the superscript r labels the different vectors, and the subscript jk is treated as a single index labeling the vector elements.
(2) The Gram matrix ÿ' with elements Here denotes the number of monomials of degree ' in n variables, and the w j denote a set of N weights associated with the N linear symplectic maps L j . (Equally weighted L j 's of course have w j 1=N.)

Decomposition
The conversion of a symplectic map from the factored product representation Eq. (7) to the jolt representation Eq. (8) relies on the solution to the following problem: Given a homogeneous polynomial of degree ' in z, and a set of N linear symplectic maps L 1 ; . . . ; L N , with associated weights w 1 ; . . . ; w N , find the smallest possible jolt coefficients a ' jk such that The solution, derived elsewhere [5,6], is The above way of writing h ' is called a jolt decomposition because each term in Eq. (16a) can act as the Lie generator of a jolt map.

Special case
When one variable is a constant of the motion, the jolt decomposition can be done in 1 less degree of freedom. Suppose, for example, that p n is the given constant of the motion; then its conjugate variable q n will not appear in the Lie generators, and hence the polynomials h ' have the form Note that the general monomials G 'ÿ r in this expression have the form Eq. (9), but in n ÿ 1 degrees of freedom. Consider the coefficient of p n : A jolt decomposition of this polynomial will be a sum of terms of the form L j Q 'ÿ k in n ÿ 1 degrees of freedom. Now observe that L j Q 'ÿ k p n can also act as the Lie generator of a jolt map. It follows that a jolt decomposition of h '; yields directly a jolt decomposition of h ' . Since jolt decomposition is much easier in fewer degrees of freedom, we gain a considerable computational savings whenever we can use this technique.

F. Conversion to Cremona map
The process for converting a map from the factored product representation Eq. (7) to the jolt representation Eq. (8) has been derived elsewhere [3,6]. Here we simply indicate the algorithm: (1) For the lowest Lie order, 3, set h 3 f 3 , and jolt decompose h 3 using Eqs. (16a) and (16b): PRST-AB 6 DAN T. ABELL, ERIC MCINTOSH, AND FRANK SCHMIDT 064001 (2003) 064001-3 064001-3 Then write J 2 exp:L 1 g 1 3: exp:L N g N 3:: (2) Now suppose one has computed the jolt representation through Lie order m to obtain J mÿ1 exp:L 1 g 1 : exp:L N g N :: (3) To go to the next order, expand J mÿ1 as a Taylor series and then refactor it as and write J m exp:L 1 g 1 : exp:L N g N :: Steps (3) - (5) can be repeated for successively higher orders until one obtains J P .

G. Numerical evaluation
For rapid evaluation of the Cremona map J P , we make use of a slight rewriting of Eq. (8) We may therefore rewrite Eq. (8) in the form where N all denote linear symplectic transformations. This rewriting of Eq. (8) says we may compute the action of J P as a sequence of matrices and kicks. Moreover, evaluating a given kick, e :g j : , requires, according to Eqs. (2) and (6), simply calculating the associated gradient of g j with respect to its variables, and this can be computed as the product of a matrix of coefficients (determined and stored in advance) with a corresponding vector of monomials. Fast numerical evaluation of the Cremona map J P therefore requires efficient algorithms for just two essential routines: (1) multiplying a vector by a matrix, and (2) computing a vector of monomials (in two or three variables for our case and through the Taylor order of interest).
Before leaving this section, we note that the procedure described in Sec. II F is not actually applied to the map N P of Eq. (7) but to a linearly normalized version of N P . In other words, the similarity transformation used to bring the linear part of M to normal form is applied also to N P , thus putting the coordinates and momenta on an equal footing. The linear maps thus ''removed'' from N P -along with the exp:f 1 : exp:f 2 : from Eq. (4)are then folded in with the maps A 0 and A N of Eq. (17) to build our Cremona approximation to the complete map M.

III. CREMONA MAP PRODUCTION
The LHC optics database, version 6, comprises a design (ideal) lattice together with 60 different seeds representing the range of imperfections expected to exist in the real machine. To build Cremona maps for each of these rings, we first used the available SIXTRACK [13] tools to construct tenth-order Taylor maps representing one turn around each of these 60 rings. Then the code CREMONA was used to construct the corresponding Cremona maps of (Taylor) orders 2, 4, 6, 8, and 10. In the six-dimensional case (with the rf cavity on) the map constructed carried particles around the ring from just after the rf cavity to just before the cavity. The rf cavity was then treated as an exact kick. The virtue of this approach was that the ''oneturn'' map did not change the momentum deviation p ÿ p o =p o , and we could therefore use the technique described in Sec. II E 3 to greatly simplify the Cremona symplectification.

IV. CTRACK PERFORMANCE AND OPTIMIZATION
The implementation of Cremona map tracking requires three principal tools: (1) a means for constructing the Taylor map for a given machine lattice; (2) a routine for  [14]; and tool 3 was implemented in the program CTRACK. Because tracking consumes so much more time than map production (see Fig. 1), only the second of the two new tools, CTRACK, was subjected to intensive optimization, as described in this section.
Code development and optimization were initially carried out on a Digital Equipment Corporation (DEC) computer with an Alpha EV5 processor. Performance and portability tests were then run on an IBM Power PC, and on a Linux Pentium III PC with both the GNU and Portland Group compilers. The intention was to find the best overall source code, hopefully the same for each platform and compiler, and not to compare systems with significantly different price and performance. Once the best compiler switches had been chosen, an executiontime analysis was performed. The results for a standard test case, ten particles for 50 000 turns in 6D mode, are shown in Table I. Based on our observations at the end of Sec. II G, it should come as no surprise that CTRACK spends almost all its time ( 90 % or more) in the subroutines computing the dot product of a matrix with a vector and evaluating the required vector of monomials in the routines named MdotV and evalMonoms, respectively; see Table I.
In code where a loop has a fixed, and not too large, number of iterations, it sometimes pays to ''unroll'' the loop. This technique involves writing explicitly each iteration of the loop, thus eliminating the overhead. For example, the (trivial) code do i=1,2 a(i)=b(i)+c(i) enddo would become simply a(1)=b(1)+c(1) a(2)=b(2)+c (2) eliminating the time spent initializing, incrementing, and testing the loop index. Even on systems with vector or pipelined architecture, this can still be worthwhile, especially for small numbers of loop iterations.
Since the matrices and vectors have fixed dimensions, typically 6 or smaller, all the loops in MdotV and evalMonoms were unrolled in the new routines mydotu and myMonoms. In a further version of MdotV for longer vectors, called mydotv, the innermost loop was not unrolled in order to make use of pipelining.
Finally, the new, C-optimized, subroutines were translated to FORTRAN. (The bulk of the code remained in C.) The results in Table II show a further improvement in   Table III the most time-consuming procedure is now the matrix by vector product, mydotv, with three rows and 165 columns in the standard test case (6D, Taylor order 8, 32 jolts=turn, for 5 10 5 turns). The procedure is called 16 000 000 times to perform 3 165 multiplies and adds, using about 67 s on the DEC system, giving a respectable 236 mega-floating point operations per second on a 350MHz processor. Figure 1 shows our final measure of performance: a comparison of the total computer time used in tracking with Cremona maps (as optimized above) against the time required by direct tracking. To make realistic tracking speed comparisons between direct and map tracking, one must include the time required to produce the necessary maps; but this time becomes relatively less important as one tracks for more and more turns. Since our typical dynamic aperture study for one LHC lattice involved tracking particles at four different amplitudes and five initial angles, each 6 10 6 times around the ring-a total of 120 10 6 turns-we used this total amount of tracking as our basis for comparison.
The principal results of our timing study are given in Fig. 1  The overhead cost of invoking CTRACK was not optimized-indeed CTRACK was invoked for each of the twenty particles tracked-and it constitutes the principal difference between gain and net gain. One could easily improve this behavior by tracking particles simultaneously.

V. RESULTS AND ANALYSIS FOR THE LHC OPTICS, VERSION 6
The LHC is being constructed in the tunnel of the former Large Electron-Positron Collider and will accelerate protons (bidirectional) to 7 TeV using superconducting 2-in-1 dipole and quadrupole magnets for bending and focusing purposes, respectively. The LHC parameters relevant for these studies are shown in Table IV.
To assess the utility of Cremona maps for long-term tracking studies of the LHC, one might compare phasespace portraits generated by direct (element-by-element) and by Cremona map tracking. Figure 2 shows just such a comparison of horizontal phase-space portraits at a very large amplitude (10) for several different map orders. (In what follows the order of a map will always mean the corresponding Taylor order.) This figure shows only very subtle differences in the phase-space portraits and suggests that for our present purpose we must use more detailed and more quantitative measures to assess the utility of Cremona maps.

A. Quantitative measures
For more quantitative comparisons between Cremona map tracking and direct tracking, we looked at the following measures: (1) differences in the amplitudedependent tune, or detuning error; (2) differences in the predicted dynamic aperture (DA); (3) one-turn tracking errors; and (4) multiturn phase errors. We describe each of these in turn.
From particle tracking data one can determine the tunes of a given particle. Figure 3 shows the results of just such an analysis on particles launched at five different angles (in x-y space) and at ten different amplitudes (from 1 to 10), and followed using direct tracking around one particular LHC lattice (seed 39). Using symbols, we say, for example, that each point of the left-hand plot in Fig. 3 represents a value for the horizontal tune Q h A; ; r; X of a particle launched with amplitude A and angle , and tracked around the LHC lattice, seed r, using direct tracking, indicated here by the ''exact'' map X. Figure 4 shows how the results based on Cremona map tracking differ from those based on direct tracking. In, say, the upper plot of that figure, each point represents an average -over the 60 different seeds r and the five different angles -of the absolute error jQ h A; ; r; C n ÿ Q h A; ; r; Xj; where C n denotes the Cremona map of (Taylor) order n. The lower plot shows corresponding results for the vertical plane. The error bars on the points indicate 1 standard deviation above the mean values, and the dashed lines indicate the maximum values.
We define the DA for a given lattice as the largest amplitude below which all simulated particles remain in the accelerator for the duration of the simulation [15]. Figure 5 compares DAs computed using direct tracking with those computed using Cremona map tracking at orders 4, 6, 8, and 10. Each plot shows, for a given order and a given angle (in the x-y plane), the DAs computed by direct tracking for each of the 60 realistic LHC lattices (red curves), and the corresponding DAs computed by Cremona map tracking (blue curves). The green hatching serves simply to highlight the differences between the two curves.
The two measures described so far-detuning errors and differences in predicted DA-might be called ''highlevel'' measures: they report how well Cremona map tracking performs the tasks we want it to do. By contrast, one might describe the next two measures as ''low level'': they report how well Cremona map tracking performs on a turn-by-turn basis.
Recall, for a moment, the origin of our Cremona maps: from a tracking code, corresponding to some exact map X, we extract a truncated Taylor series map T m containing terms through order m, so that T m X Oz m1 . This Taylor map is then symplectified by converting it to a Cremona map C m that agrees with the Taylor map through the same order m; hence C m T m Oz m1 . It follows that the three maps X, T m , and C m all differ from one another by terms that scale as z m1 ; but we would like to know just how big those differences are. To that end we define the one-turn errors where k k denotes an appropriate vector norm on phase space. To determine in some meaningful way how the average errors vary with amplitude, we require at each amplitude a set of points over which to average. To generate a given set of points, we launch a single particle at the given amplitude (between 3 and 12 , where is defined in terms of the local lattice function values); and we use direct tracking to follow it for 1000 turns, thus generating the phase-space points fz 0 ; z 1 ; z 2 ; . . . ; z 1000 g fz 0 ; Xz 0 ; Xz 1 ; . . . ; Xz 999 g: The first 1000 of these, 0 -999, constitute the desired set of points. Applying also the maps T m and C m to these points, we can compute the one-turn errors defined in  Our last quantitative measure for comparing direct and Cremona map tracking is the horizontal phase error h . It measures the phase difference in (normalized) horizontal phase space between a particle tracked element by element and the same particle tracked using a Cremona map. Figure 8 shows the evolution of h for map orders 6, 8, and 10 at two different amplitudes, 8 and 12.

B. Analysis
The detuning error, shown in Fig. 4, behaves more or less as expected: the error grows with amplitude, and higher-order maps have significantly smaller detuning errors. Note, however, that the improvement seen in the detuning error as one goes to higher order becomes much smaller at the larger amplitudes. In particular, going from order 6 to order 8 shrinks the average error by more than a decade at amplitude 1 but by less than a factor of 2 at amplitude 10.
The predicted DA, however, does not show the same consistent improvement as one goes to higher order: As seen in Fig. 5, Cremona map tracking at order 6 agrees very satisfactorily with direct tracking. But at higher orders Cremona map tracking tends to underestimate the DA-and by as much as several sigma for the larger angles. This fact, puzzling at first sight, becomes more understandable in light of the one-turn errors shown in Figs. 6 and 7.  Figure 6 shows that the average one-turn error has the correct scaling with amplitude: Consider, for example, " TX m of Eq. (18a). The mth order Taylor map T m makes errors of order z m1 , hence " TX m z Oz m1 , and similarly for the other one-turn errors, " CX m and " CT m . And in the log-log plots of Fig. 6 the errors labeled order m do indeed lie on lines of slope roughly m 1.
The three plots in Fig. 6 show that the differences between the three maps -Cremona, Taylor, and exactare all comparable, but closer inspection reveals more. A comparison of plots 6(a) and 6(b) shows that the error made by the Cremona map, " CX m , is roughly 2 to 3 times that made by the Taylor map, " TX m . And a comparison of plots 6(b) and 6(c) shows that " CX m roughly equals the difference between the Cremona and the Taylor maps, " CT m . In other words, the difference between the Cremona and the exact maps is due mostly to the process of symplectification. Observe also that the power-law scaling of the various orders implies that at some amplitude higher-order map tracking must lead to larger errors than lower-order map tracking.
The plots in Fig. 7 correspond to those in Fig. 6, except that here all the particles have a relatively large momentum deviation of 7:5 10 ÿ4 , which masks the Oz m1 scaling at amplitudes below about 8 or 9. One may here make comparisons similar to those made above for Fig. 6. But it suffices to note [see plot 6(b)] that at this nonzero value of , relevant for the tracking studies, the amplitude at which higher-order tracking begins to give poorer results is less than 12.
The results shown in Fig. 8 reinforce the observations made above for Fig. 7: At some amplitude between 8 and 12 tracking at orders higher than 6 no longer produces smaller errors.
We conclude that the difficulty higher-order Cremona map tracking has (see Fig. 5) in predicting the dynamic aperture results from the small-amplitude loss of precision introduced by the symplectification process coupled with the steep power-law scaling of the one-turn errors. Ironically, one can expect better results when applying Cremona map tracking to a machine with larger errors and, hence, a smaller dynamic aperture. Indeed, in preliminary studies for an earlier version of the LHC lattice, we successfully used Cremona map orders higher than 6.

VI. SUMMARY
By replacing direct tracking of the LHC with Cremona map tracking at order 6 (which includes magnetic multipole components up through order 7), one can obtain a net gain in speed greater than ten; and one can predict DAs that agree very well with those of direct tracking. Moreover, for the amplitudes of interest, Cremona map tracking produces errors that are smaller at order 6 than at higher orders. We conclude that for the current version of the LHC, where multipole components beyond order 7 are less relevant, Cremona map tracking at order 6 is an attractive alternative for doing rapid systematic investigations.
To the potential client at other future accelerators, we recommend examining the one-turn errors " m , which can be done rapidly. From plots such as those in Fig. 7 one can determine whether or not Cremona map tracking will be useful.

ACKNOWLEDGMENTS
One of us (D. T. A.) would like to thank the AP group of the CERN SL division for their generous hospitality and support.