Analysis of Nonlinear Dynamics by Square Matrix Method

The nonlinear dynamics of a system with periodic structure can be analyzed using a square matrix. We show that because the special property of the square matrix constructed for nonlinear dynamics, we can reduce the dimension of the matrix from the original large number for high order calculation to low dimension in the first step of the analysis. Then a stable Jordan decomposition is obtained with much lower dimension. The Jordan decomposition leads to a transformation to a new variable, which is an accurate action-angle variable, in good agreement with trajectories and tune obtained from tracking. And more importantly, the deviation from constancy of the new action-angle variable provides a measure of the stability of the phase space trajectories and tune fluctuation. Thus the square matrix theory shows a good potential in theoretical understanding of a complicated dynamical system to guide the optimization of dynamical apertures. The method is illustrated by many examples of comparison between theory and numerical simulation. In particular, we show that the square matrix method can be used for fast optimization to reduce the nonlinearity of a system.


Introduction
The question of the long term behavior of charged particles in storage rings has a long history. To gain understanding, one would like to analyze particle behavior under many iterations of the one turn map. The most reliable numerical approach is the use of a tracking code with appropriate integration methods. For analysis, however, one would like a more compact representation of the one turn map out of which to extract relevant information. Among the many approaches to this issue we may mention canonical perturbation theory, Lie operators, power series, and normal form [1][2][3][4][5][6][7][8][9][10][11], etc.
Here, we would like to look at this problem from a somewhat different perspective: we shall analyze the problem by the method of square matrix [12,13], constructed out of the power series map [6]. In this paper, we first outline the concept of the square matrix method, then we discuss its application, and then explain the mathematical details of the method, about how and why it is suitable for high order calculation. In the following introduction, we first outline the basic concept of the square matrix analysis in subsection 1A-1D, then in subsection 1E we present the outline of the paper.

1A. Representation of nonlinear maps using square matrices
We consider the equations of motion of a nonlinear dynamic system with periodic structure such as Hills equation, it can be expressed by a square matrix.
If we use the complex Courant-Snyder variable z = x − ip, its conjugate and powers z, z * , z 2 , ... as a column Z, the one turn map can be represented by a large square matrix M using Z = M Z 0 .
For example, for a simple Henon map [8], x = x 0 cos µ + p 0 sin µ + x 2 0 sin µ p = −x 0 sin µ + p 0 cos µ + x 2 0 cos µ , (1.1) we use the variables z = x − ip and z * = x + ip to write z, z * and their higher power monomials after one turn of rotation, or after one element in an accelerator lattice, as a truncated power series expansion of the initial z 0 = x 0 − ip 0 and z * 0 = x 0 + ip 0 . For example, up to 3rd order, we have: In general, there are constant terms in the expansion, even though in this example, the offset of x is zero, so the constant terms are also zeros. We may now write this in the matrix form: where to 3rd order, we define the 10×1 monomial arraỹ Z = (1, z, z * , z 2 , zz * , z * 2 , z 3 , z 2 z * , zz * 2 , z * 3 ). (1.4) The rowZ represents the matrix transposition of the column Z. The vector Z spans a 10 dimensional linear space. The matrix M , when operated on Z 0 , represents a rotation to Z in this space. We remark here that even though we mostly use M to represent one turn map for a storage ring, it can be used to represent one element in the storage ring dynamics or other nonlinear dynamics problem.

1B. Eigenvectors of Jordan blocks
We would like to extract the spectral structure from the matrix M. The first inclination may be to try to diagonalize and find the eigenvalues. It turns out, however, that for non-linear maps M is not diagonalizable. All square matrices may be transformed into Jordan form, however. Doing so, we find a transformation matrix U and a Jordan matrix τ so that every row of the matrix U is a (generalized) left eigenvector of M satisfying U M = e iµI+τ U (1.7) In the above example, for the case of two variables z, z * at 3rd roder, the matrix M is a 10×10 matrix, I is a 2×2 identity matrix, the matrix U is a 2 × 10 transformtion matrix, while the Jordan matrix (1.9) In the example for the case of 4 variables x, p x , y, p y at 7'th order, as we shall explain later, the matrix M is a 330 × 330 matrix, I is a 4 × 4 identity matrix, the matrix U is a 4 × 330 transformation matrix, while the Jordan matrix τ also has dimension 4.
As Z = M Z 0 (see Eq. (1.3)), Eq. (1. 7) gives (1.10) Now we define a transformation W represents the projection of the vector Z onto the invariant subspace spanned by the left eigenvectors u j given by the rows of the matrix U , such that each row of W is w j = u j Z, a polynomial of z, z * . Then Eq. (1.10) implies the operation of one turn map Z = M Z 0 , corresponds to a rotation in the invariant subspace represented by W = e iµI+τ W 0 . (1.12) 1C. Multi-turns behavior, coherent state, and frequency fluctuation The new vector after n turns becomes W = e n(iµI+τ ) W 0 = e inµ e nτ W 0 .
(1. 13) KAM theory states that the invariant tori are stable under small perturbation (See, for example, [1,8,19]). In our examples, for sufficiently small amplitude of oscillation in z, the invariant tori are deformed and survive, i.e., the motion is quasiperiodic. So the system has a nearly stable frequency, and when the amplitude is small, the fluctuation of the frequency is also small. Thus for a specific initial condition described by Z 0 , the rotation in the eigenspace should be represented by a phase factor e i(µ+φ) so that after n turns W = e n(iµI+τ ) W 0 ∼ = e in(µ+φ) W 0 . (1.14) We remark that if W 0 is an eigenvector of τ with eigenvalue of iφ, i.e., (1.15) then Eq.(1.14) is satisfied. We may rephrase this as: if W 0 is a coherent state [20,21] of τ with eigenvalue iφ, then φ is the amplitude dependent tune shift.
In Eq. (1.14) and Eq. (1.15) we use the approximate equal sign because for a matrix τ of finite dimension m, there is only approximate coherent state, the coherent state exists only when the dimension m approaches infinity. To see this we write the Eq. (1.14) explicitly using the property of the Jordan matrix τ given by Eq. (1.9) as a lowering operator: (1. 16) where w j 's are the rows of W 0 . Compare the two sides we find It is obvious that the equal sign holds only as the dimension m approaches infinity. In the study of truncated power series, m is finite, hence Eq. (1.17) is an approximation. As we shall show later, the polynomials w m−2 , w m−1 have only high order terms, and as m increases, when the amplitude of z is sufficiently small, the last term in Eq. (1.17) becomes the ratio of two negligibly small numbers, and is less accurate. In addition to the condition Eq.1.17 for a stable motion, obviously, another condition is The Eq. (1.17) also shows that w 2 1 ∼ = w 0 w 2 , and hence we have Using Eq.(1.13), we can derive the multi-turn behavior of w 0 as an expansion of number of turns (See Section 7) w 0 (n) = e inµ+inφ+ n 2 2 ∆+... w 0 (n = 0). (1.20) The contribution to phase advance from ∆ is proportional to n 2 instead of n. Hence the deviation of ∆ from zero gives the information about the fluctuation of the frequency (tune variation) during the motion and the loss of stability of the trajectory. This seems to be related to the Liapunov exponent [1]( p.298), and hence we use it to find the dynamic aperture in the examples in the later sections. The Eq. (1.14), derived for x, y planes separately, leads to a set of action-angle variables w 0x ; w 0y , with its action amplitude and phase advance angle nearly constant of motion up to near the border of the dynamic aperture or resonance lines. In addition, we find that near this border the deviation of these actions from constancy provides a measure of the destruction of invariant tori, or a measure of the stability of trajectories and tunes.
We consider the functions w 0x (z x , z y ), w 0y (z x , z y ) as a definition for a variable transformation. These functions and their inverse functions z x (w 0x , w 0y ), z y (w 0x , w 0y ) provide a one to one correspondence between the z x , z y planes and the w 0x , w 0y planes. During the motion |w 0x |, |w 0y | are only approximately constant, we do not use w 0x , w 0y for long term tracking of particles. However,the deviation of their amplitude from constant, described by the function ∆ of Eq.1.19 in the w 0x , w 0y planes, provides a provides a measure of nonlinearity.
To clarify the relation between this variable transformation and the one turn map, we discuss the relation between four maps. The one turn map in z x , z y (the first map) is given by the first two rows of Eq.(1.2). The one turn map of w 0x , w 0y (the second map) is exactly equivalent to the one turn map in z x , z y planes, as long as the inverse function z x (w 0x , w 0y ), z y (w 0x , w 0y ) exists, because it is the same map expressed in terms of another set of variables. For sufficient high order, the non-existence of the inverse function in a region indicates the motion is unstable in that region, i.e, it is at the border of dynamic aperture. We do not have explicit expression for this second map, we only know it as an implicit function through the first map. The third map Eq.(1.12) is equivalent to Z ∼ = M Z 0 , only provides an approximation to the first two exact maps. It is a truncated map, not symplectic. The fourth map is given by W ∼ = e in(µ+φ) W 0 . When we take φ to be a real constant determined from Eq.(1.17) by W 0 , this map is a twist map [1,8].
The advantage of using the variables w 0x , w 0y is that the exact map in w 0x , w 0y only has a small deviation from the twist map (the four'th map). Hence the deviation is considered as a perturbation to the twist map. The deviation is embodied by the fact that, for the second map, |w 0 | and φ are not constant after one turn, and ∆ = 0, Im(φ) = 0. Because the deviation is small even close to the dynamic aperture, the perturbed twist map can be used for analysis of long term behaviour, for example, as used in Poincare-Birkhoff theorem [8]. Similarly, the third map is also a perturbed twist map when we use w 0x , w 0y variables. The difference from the second map is that, it is not symplectic, but it approaches the second and the first original map when we increase the order of the square matrix, and the perturbation becomes smaller and smaller.
The central topic of this work is to study the differences between the second map (hence the first map, obtained from tracking simulation) and the four'th map, an idealized integral system, using the characteristic functions ∆, Im(φ), obtained from the square matrix analysis.

1D. Comparison with Normal Form
There is apparently a similarity between the square matrix method and the normal form. However the similarity is only superficial. The differences between the two methods are very obvious, we name a few here: 1. We emphasize that we only need one straight forward step of Jordan decomposition to derive the high order result while the normal form requires order by order iteration, thus making the procedure very complicated.
2. The tune expression is a rational function of the variables for the square matrix method while the normal form for the tune is a polynomial, so it is obviously very different.
3. We obtain the expression for the tune fluctuation and amplitude fluctuation, which is essential in its application to predict the dynamical aperture in a practical way while the normal form does not have such kind of expression because the frequency variation is intrinsically associated with the non-integrability of the system.
The essential feature of the square matrix method is that it is easy to reach very high order with high precision. Hence a detailed comparison requires works on both square matrix and normal form. Obviously this requires a significant work. Clearly the best way is to compare with the exact answer from particle tracking. The purpose of this paper is to provide examples of this comparison as outlined in the following.

1E. Outline
First, in Section 2, as an illustration for the basic principle, we give one simple example of the solution of a nonlinear differential equation. We apply the square matrix method given above to solve this equation to lowest nonlinear order of 3, and compare the method with the well known result given in the text book by Landau and Lifshitz [22].
In Section 3 we apply the square matrix method to the one turn map of storage ring lattice to compare the square matrix analysis with simulation. In Section 4 we present one example of the application, the manipulation of phase space trajectory.
The applications of the square matrix analysis in Section 2 to 4 are all based on the variable transformation from z to the action-angle variable w 0 in Eq.(1.11), obtained from the left eigenvectors U in Eq.(1.7). This matrix U is obtained by Jordan decomposition of the mapping matrix M in Eq.(1.3). For high order, the dimension of M is large. Hence the applications described in Section 3 and 4 depend on the efficient Jordan decomposition of very large dimension matrix M with high stability and precision.
In Section 5,6, we explain how to achieve the efficient Jordan decomposition. Since the details of this are often technical and involve some abstract mathematics, we only present the outline and leave the details in the appendixes.
In Section 7 we analyze the multi-turn behavior of the action-angle variable w 0 , and show that the phase advance has a term ∆n 2 /2, as given in Eq. (1.20). This section is an important section because it gives a more clear physical meaning to the quantities such as w 0 , φ, and ∆, used in the applications in Section 3 and 4. The fact that we present these after Section 5 and 6, which involve more mathematics, is because it requires some of the basic concepts such as the chain structure of the Jordan decomposition given in Section 5,6.
In Section 8 we show how to improve the stability and precision while ensuring the uniqueness of the Jordan decomposition, which is crucial for the square matrix method. Finally, Section 9 gives a summary.
2. One example: The solution of a nonlinear differential equation As an illustration of the square matrix method, we consider the differential equation given in the text book by Landau and Lifshitz [22]: In order to apply the square matrix method, we first transform to normalized coordinates z ≡x − ip, z * ≡ x + ip, wherex ≡ ω 0 x,p ≡ẋ. The differential equation becomeṡ Following the steps in Section 1A, we can write a square matrix equationŻ = M Z. We remark that this equation is different from the Eq(1.3), where the left hand side of the equation is the new column Z after one turn, while here we have the time derivative of the columnŻ because in this example, we discuss a differential equation rather than a one turn map. But the basic principle for solution is the same. For example, the third row inŻ = M Z is Expand the polynomial and continue to derive the derivative of the monomials in Z, truncated at 3rd order, the last row is Combining these results, we findŻ = M Z, with The 9×9 matrix M has 9 left eigenvectors corresponding to each of the 9 diagonal elements of M . Two of them have eigenvalue iω 0 . We arrange these two eigenvectors as a column of two rows U = u 0 u 1 = The solution of this matrix equation is where W 0 is the initial value of W . Since this solution is based on truncated power series of z, z * , it is an approximation, valid for a finite time interval. Within this approximation, the matrix τ is to be replaced by a number iφ so that φ represents a frequency shift. That is, W 0 should be an eigenvector of the matrix τ with eigenvalue iφ, and every row of W 0 should be an action-angle variable, up to the order of the expansion. The two rows of the matrix equation Eq.(2.7) arė From the top row, we identify φ = −i w1 w0 as the frequency shift. Using Eq.(2.6), we have To the lowest order approximation, in deriving the frequency shift, we only keep the first term z in w 0 of Eq.(2.10), substitute the expression of a, b in eq.(2.2) and the relation between z and x, takeẋ = 0, get the amplitude dependent frequency shift by Landau [22]: As action-angle variable, |w 0 |, |w 1 |, |φ| = |w 1 /w 0 | are invariant. In Fig.1 we plot the contours of |w 0 | in the x,ẋ plane for a case where ω 0 = 1, α = 0.2, β = 0.1, in comparison with the solutions of Eq.(2.1). It is clear that even when initial amplitude is as large as x=2, the trajectory of the solution, even though significantly deviates from a circle, is still in good agreement with the constant contour of |w 0 |, which corresponds to a circle in w 0 complex plane. Hence the terms in w 0 with order higher than 1, provide correct information about the nonlinear distortion of the trajectory. When we examine Eq.(2.10), we observe that w 1 has only a 3rd order term, hence this term is the approximation of the lowest order for w 1 . Actually, up to 3rd order the term |z(zz * )| = |z| 3 represents a circle in the complex plane of z, the higher order terms needed to represent the To calculate the frequency shift more accurately, we take the definition of w 0 as a variable transformation from z, z * to w 0 , w * 0 , then the top row of Eq.(2.9) can be taken as a differential equation for w 0 . To clarify this point, we remark that φ(z) = −iw 1 (z)/w 0 (z) is a function of z in Eq.(2.9), and we take z(w 0 ) as the inverse function of w 0 (z) given by Eq.(2.10), then we consider φ in Eq.(2.9) as an implicit function of w 0 . Hence even φ(w 0 ) = φ(z(w 0 )) is an invariant approximately, it evolves with time and has small deviation from constant.
We take w 0 = re iθ . Based on the discussion following Eq.(2.11), r is nearly a constant, up to 3rd order. Hence φ is a function of θ. When substituted into Eq.(2.9), we get We can solve Eq.(2.12) along a circle as a function of θ in the w 0 plane and find the period T : From the period T we calculate the frequency shift as ∆ω = 2π/T − ω 0 = 1/ < dt dθ > −ω 0 , where < dt dθ > is an average over a period. In Fig.2 we plot the frequency shift obtained from this result, and compare with the Landau formula Eq.(2.10), and the frequency calculated from the direct numerical solution of Eq.(2.1), for the case of ω 0 = 1, α = 0.2, β = 0.1, showing that indeed this higher order calculation provides a much better agreement at larger amplitude of x=2 than the Landau formula. It is clear that at 3rd order the agreement with exact solution is excellent. We remark that in the square matrix method, the calculation is carried out in one step to 3rd order without iteration steps. For first and second order, there is only one diagonal element in M equal to iω 0 and the eigenspace for iω 0 has only dimension 1, hence we directly start the calculation at 3rd order. For the traditional canonical perturbation method, the calculation is carried out order by order from low to high by complicated iteration procedure. Hence it seems these two methods are not completely equivalent. The advantage of the square matrix approach is that it only need one step to reach high order, and its procedure is simple and straight forward. In the next section we apply this method to the analysis of the one turn map of the storage ring lattices, and compare with simulation.

Comparison with Simulation
3A. Summary about application of the theory When we study the nonlinear dynamic equations such as Hill equations, in order to have the square matrix to be triangular, we always first convert the variables x, p into normalized Courant-Snyder variablesx,p using the betatron amplitude matrix B −1 (see, e.g., S.Y. Lee, p.49 [23]). Then they are converted to scaled variables using a scaling parameter s, as will be given in Section 8: For 4 variables such as x, p x , y, p y , it is similar, with both B −1 and K −1 replaced by 4×4 matrixes. But we must first carry out linear decoupling between x and y. As we shall point out in Section 8, we select the scaling factor s for the variable to minimize the range of the coefficients in the square matrix, in order to ensure the stability and precision of the Jordan decomposition. During our calculation we use the Taylor expansion provided by the well-known program of TPSA (truncated power series algorithm) [6] as our starting point of the one turn map corresponding to Eq. (1.1). We construct the square matrix M up to a certain order. Then we apply Jordan decomposition for the invariant subspaces to obtain the transformation matrix U for the specific eigenvalues e iµx , e iµy . As described in Section 1B, we use the first row of U to define the variable transformation w 0 = u 0 Z. This serves as an accurate action-angle variable (for 4 variables case we have w 0x = u 0x Z, w 0y = u 0y Z). Then the 2nd and 3rd row of U are used to calculate the functions φ and ∆ using Eq. (1.17) and Eq. (1.19). We remark here that these are no longer polynomials, they are rational functions. The equations define the set of variables w x , w y , w * x , w * y as functions of z x , z y , z * x , z * y . Notice that because w 0 , representing the  Fig.4 with constant |w x | at 5,7,9th order (blue, green, red) inz x phase space compared with result of tracking 512 turns(magenta) first row of W , is used very often, we simply use w to represent it, and from now on often we use w x0 to represent its initial value when we specify it. When the coherence conditions Eq. (1.17), Eq. (1.18) and Eq. (1.19) are satisfied, these two equations provide a transformation from z x , z y to the new variables as a set of accurate action-angle variables.
The inverse function z x (w x , w y ), z y (w x , w y ) of Eq. is very useful because a set of constant r x ≡ |w x |,r y ≡ |w y | describes the motion of the particles. Appendix D shows that the inverse function can be calculated using the inverse of a upper-triangular matrix in a simple way. In Fig.3, for a lattice "EPU"of NSLSII ("National Synchrotron Light Source II") with all insertion devices included, for the points around a circle in w-plane (i.e., a blue circle with r=constant), we use the inverse function of w x (z x0 ) to find a set of initial x 0 , p x0 (y 0 , p y0 are set zero), then after tracking these particles for one turn, calculate w x (z x ) and plot the red curves. We can see that when x approaches the dynamic aperture at x=25mm, w x (red) gradually deviates from the circles (blue).
In Fig.4, for another one of the lattices of NSLSII, we plot onz x -plane a circle which passes through the point corresponding to initial x 0 = 25mm, p x0 = 0, and plot the tracking result over 512 turns as the magenta curve. Let w x = re iθ and initial r 0 ≡ |w x0 | we also plot all the points calculated according to a constant |w x | = r 0 . The calculation is up to 5 (blue), 7(green), and 9(red) order, respectively.
It is clear the agreement is excellent. To see the errors of different order, we plot the same set of data in Fig.5 with fine details, showing |z x | as function of angle ψ x (the phase angle ofz x = J x e iψx ). We can see that as the order for the constant |w x | increases, the agreement with the tracking result (magenta curve) converges slowly, with the 9th order (red) more close to the tracking.
In Fig 6 we show the the phase advance ∆θ after one turn as functions of initial phase θ 0 for various initial r 0 , which corresponds to different initial x 0 . The dotted curves are from tracking, solid curves are calculated from Reφ + Im∆/2. It is clear from Fig.6 that as r increases, ∆θ have larger and larger variation. For large amplitude x=22mm, we can see that the two curves do not agree with each other, even though the trend of increased variation is obvious. The theoretical prediction on ∆θ is given by the variation of Reφ + Im∆/2 as function of θ. These variations are an indication of a deviation from coherence, i.e., a violation of the condition given by Eq. (1.17). Thus when this condition is violated, the calculation φ lost precision, hence they deviate from the tracking results. However, even though the fluctuation of φ does not accurately predict the deviation, it still provides information about the deviation from coherence.
Similarly, the fluctuation of r after one turn starting from a circle of constant r 0 also provides information about the deviation from coherence, approximately agrees with the prediction of Imφ. In Fig. 7, we plot the peak to peak deviation of |∆w/w| as function of initial x. We see that Im(∆φ) does not give accurate |∆w/w|, so that we need to multiply Im(∆φ) by a factor 6 to obtain agreement with |∆w/w| found from tracking. This is because Im(∆φ) = 0 itself implies the theory lost its precision. But it is seen from this plot that the deviation from coherence is predicted by Im(∆φ) correctly, and it does serve as an index for the proximity to the destruction of invariant tori.
Next, we check the cases of 4 variables x, p x , y, p y . We study a lattice named nsls2sr supercell ch77. In Fig.8, the top row is the Poincare surface of section [1,2] expressed by the Courant-Snyder variablez x =x − ip x and z y =ȳ − ip y . The horizontal axes are their phase angles ψ x , ψ y respectively. The vertical axes are their amplitude |z x |, |z y |. The left plot is for the amplitude |z x |, the right one is for |z y |. For the case of initial x=10mm, y=2mm, we track the particle for 512 turns. Every point on these plots is obtained from the coordinates for a specific turn. For the same set of data, when we convertz x ,z y to w x , w y and plot the Poincare sections for the corresponding variable θ x , θ y (the phase angle of w x = r x e iθx , w y = r y e iθy ) as the transverse axes, and r x = |w x |, r y = |w y | as the vertical axes, we obtain the bottom row of Fig.8. Clearly, these new variables now move on two separate flat planes in the two Poincare sections, representing two independent rotations. Thus the transformation to new variables w x , w y reduces the very complicated motion expressed by z x , z y to two very simple uniform independent rotations.

3C. Amplitude Dependent Tune shift and Tune footprint
In Fig.9 we plot tune ν x as function of initial x for various y position (initial p x = p y = 0), compare tune from tracking result (green) with tune calculated from µ + Reφ (red) using Eq. (1.17). There is an excellent agreement up to near the dynamic aperture. We see that at y=6mm and y=-6mm, when the x passes x=-1mm, there is a resonance. We can see the green curve (tracking) has a discontinuity, and the red curve (the square matrix derived tune) also has a jump. Near this point ν x ≈ 2ν y , we have two frequencies dominate the spectrum of x motion: ν x , 2ν y . The single frequency condition is no longer valid. Hence the coherence condition Eq. (1.17) is violated. Even though the red curves seem to exaggerate the discontinuity, it does show the resonance clearly. This suggests that the square matrix analysis may provide more detailed understanding about resonances. However, we will not discuss about resonances in general in this paper.
In Fig.10, we plot the tune footprint calculated from tracking (green) and from µ+Reφ . Clearly this shows we can calculate tune footprint approximately from square matrix without the very time consuming tracking particles for various initial x and y.

3D. Coherence Region and Dynamic Aperture
We are interested in the range of the region where our coherence conditions Eq. (1.17) to Eq. (1.19) are valid. We can find this range by tracking particles with different initial conditions and find the tune variation such as in the calculation for a frequency map [14]. However, it is possible to find this range without tracking particles for many turns. For this we need to calculate φ and ∆ for a set of points where |w x |,|w y | are constants.
For a given set of points on the θ x , θ y planes, as shown in the two Poincare sections of constant |w x |,|w y | of Fig.8, we find their coordinates z x , z y using the inverse function of Eq. (3.3). From this set of z x , z y we use Eq. (1.11) to calculate w 0 , w 1 , w 2 . Then we use Eq. (1.18) and Eq. (1.19) to calculate φ and ∆. These results are used to calculate the standard deviation of Reφ, Imφ and ∆ .
In Fig.11 we use color scale to represent the RMS value of ∆ x in xy plane. For every point on this plane, we find the corresponding |w x |,|w y | assuming initial p x = p y = 0. Then, for a set of azimuthal angles θ x ,θ y of the corresponding w x ,w y we find the inverse function solution for Eq. (3.3) and use the resultz x ,z y to calculate the standard deviation for ∆ x for both x and y motion respec- tively. At x=-1mm, y=5.5-9mm, we see the resonance behavior discussed regarding to Fig.9. For x between 20mm and 25mm and for y from 0 to 8mm, we can see the color changes from dark blue to light blue, passing through yellow to red, reaching dark brown. This is the region where we see |w x | gradually deviates from being a constant during the motion. In Fig.12, we plot the frequency diagram obtained from tracking using elegant [24] for the same lattice setting. When compared with Fig.11, we see that it gives a crude picture about the dynamic aperture. Even though, without multi-turn tracking, the plot does not give the detailed structured frequency map, the light blue area gives information about the area of larger tune variation. Fig.11 confirms the expectation that the function ∆ is related to the coherence condition or stability condition. The analysis by the square matrix method given in the previous sections can be used for nonlinear optimization of storage ring lattices. In the following we give an example of phase space trajectory optimization by this method. A separate paper to discuss using the square matrix method to optimize several nonlinear lattices is in preparation [25]. As described in Fig.4 and Fig.8, the FIG. 13: Trajectories in y-y phase space for 5 particles before (top) and after optimization (bottom) by square matrix action defined from Courant-Snyder variable J x ≡ |z x | and J y ≡ |z y |, as calculated from Eq. (3.1), is no longer constant when nonlinearity dominates over linear dynamics. There is a significant distortion from flat planes in the Poincare section. We characterize this distortion by ∆J/J = (J max − J min )/J mean . When the distortion is large, the particles receive much larger nonlinear kicks from the higher order multipoles when x reaches maximum, and hence the system becomes more nonlinear. The goal of nonlinear optimization is to reduce the nonlinear distortion, and hence increase the dynamic aperture. In the 1-D case, this is equivalent to optimize the trajectories in the space of Courant-Snyder variablesx,p so that they are as close as possible to circles with constant radius. From our previous analysis in Section 3B, we see that the invariant tori are given by constant |w x |, |w y | . Hence, to minimize ∆J/J , we need to calculate ∆J/J for contours with constant |w x |, |w y | and vary the sextupoles to minimize ∆J/J on these contours. In other word, for given pair of r x = |w x |, r y = |w y |, and for a set of θ x ,θ y we need to calculate the corresponding set of z x ,z y , calculate ∆J/J , then, based on these steps to minimize ∆J/J. It is clear from this we need to use the inverse function solution of Eq. (3.3). However, even though the inverse function calculation is made easy and fast by the use of the inverse matrix mentioned in Appendix D, we would like to carry out this optimization without the inverse function calculation at all. Therefore we remark here that the task of minimizing ∆J/J is equivalent to optimize the system so that flat planes in the Poincare sections in w x ,w y space (as shown in Fig.8) are mapped to approximate flat planes in the Poincare sections in thez x ,z y space, and vice versa. Because w x , w y have been derived as polynomials ofz x ,z y already, the optimization can be carried out by minimization of |∆w/w| in the Poincare sections in w x w y space instead. This is as shown in Fig.8 but with the rolls of w x ,w y exchanged withz x ,z y . Thus for a pair of constants J x ≡ |z x | and J y ≡ |z y |, and a set of ψ x ψ y , we calculate w x w y , then minimize |∆w/w|.
We applied this optimization for the lattice "nsls2srsepercell ch77" which we have discussed in regard of Fig.8. In Figure 13 we compare the trajectories of several particles in phase space y − y before (top) and after (bottom) the optimization [25]. Different color represents different initial x, x , y, and y . In these 5 pairs of x and y, the initial y is chosen to be proportional to the initial x. The maximum initial x is 20mm, so the x-motion is nonlinearly coupled into y-motion, generating complicated motion in y − y plane. It is obvious that even though the lattice of the top plot has been optimized for NSLSII operation with nonlinear driving terms minimized already, the further optimization by square matrix method clearly further reduces the nonlinearity of the system significantly. For this specific example, 3 sets of Poincare sections are selected to minimize |∆w/w|. The J x , J y for these Poincare sections are derived from the following 3 pairs of initial conditions x 0 , y 0 = {2.5e-2, 5e-3};{1e-2,2e-3};{3.5e-2,3e-3}, respectively. This choice is not unique. Actually, the question about how many Poincare sections should be used, and how many points in each section are taken, is open for future exploration of very fast optimization method.

Square matrix, its structure and its invariant subspace
All the applications described in Section 3 and 4 are based on the use of the left eigenvectors equation Eq.(1.7), so that we can apply U for the variable transformation from z to the action-angel variable w 0 by W = U Z in Eq.(1.11). In Section 5 and 6, we shall outline the construction of U by Jordan decomposition of the square matrix M . For high order, M has very large dimension. Hence we need very efficient way to calculate U from M .
The square matrix M has a special property that it is upper-triangular and all its eigenvalues are its diagonal elements, and hence are precisely known. In this section, we first show that because this special property, in a first crucial step, we can reduce the analysis of the very high dimension matrix M into the analysis of its eigenspace of eigenvalue e iµ with much lower dimension. In Section 6, we show that the final Jordan decomposition is carried out in this eigenspace, and the dimension of the final matrix U is further lower than this eigenspace. Hence the analysis is greatly simplified.
The first step of the analysis of the square matrix is the counting of the number of terms. For two variables such as x and p, for terms of order k, all the terms has a form of z m z * (k−m) , with 0 ≤ m ≤ k, so the number of terms of order k is k+1. So for order n, we need to count the number of of terms from 1 to n+1 (we count k from 0 to n). The sum is (n+1)(n+2)/2 [7]. Thus for 3rd order it is 10, as shown in the example.
For 4 variables such as x, p x , y, p y , the numbers of terms in order of n is (n+1)(n+2)(n+3)/6. Hence for order 1, 2, 3, ..., n the number of terms are 4,10,20, ...,(n+1)(n+2)(n+3)/6, respectively. For the square matrix dimension, after summing up these numbers of terms, we find it is (n+1)(n+2)(n+3)(n+4)/24. To save space, we shall not give the derivation of the summation here. But it is very easy to test this result. As an example, when truncated to 7th order, the matrix dimension for 4 variables is 330. This rapid increase of matrix dimension as the order increases seems to indicate a very fast increased complexity of the problem.
However, as we will show in the following, the point of interest is not the dimension of the full matrix but one of its invariant subspaces with much lower dimension. Here "invariance" means that a vector in a subspace of the full space spanned by Z, after multiplied by the matrix M , remains in the same subspace. In the example represented by Eq. (1.5), for 2 variables and at 3rd order, even though the dimension of M is 10, there are only two independent (generalized) eigenvectors with eigenvalue of e iµ , among the 10 eigenvalues in the list {1, e iµ , e −iµ , e 2iµ , 1, e −2iµ , e 3iµ , e iµ , e −iµ , e −3iµ }. These two eigenvectors span an invariant subspace of dimension 2 (see Appendix A). Thus the rotation generated by the matrix is represented by a 2×2 matrix in this subspace, representing the nonlinear dynamics.
To count the number of diagonal elements for M with value as e iµ , when we examine how these elements are generated from Eq.(1.2) to Eq.(1.3), we note that all these elements must come from the monomials in Z in the form of z(zz * ) m so that its coefficient is e iµ (e iµ e −iµ ) m = e iµ . For a given maximum number m ≥ 0, the order n must satisfy n ≥ 2m + 1, hence either n = 2m + 1, or n = 2m + 2. For example, if m = 0, the order n is 1 or 2. If m=1, then n is 3 or 4. Hence for order n, the number of diagonal elements with eigenvalue e iµ is m+1, i.e, the integer part of (n+1)/2. For order 1,2,3,4,5,6,7 and for the case of two variables, this gives the dimension of the invariant subspace is 1,1,2,2,3,3,4, respectively.
Same way, in the case of 4 variables x, p x , y, p y , all the elements with value e iµx must come from the monomials in Z in the form of z x (z x z * x ) mx (z y z * y ) my . If within order n, the maximum number of m x + m y = m, then we must have n ≥ 2m+1, hence either n = 2m+1, or n = 2m+2. If m y = 0 then m x can take value of 0, 1, 2, . . . , m, i.e., there are m + 1 terms. If m y = 1, then m x can only takes value from 0 up to m − 1, i.e., there are m terms. We continue this counting until for m y = m, then m x = 0 so there is only one term. Thus we count all the terms including all possible m y , by summing up from 1 up to m+1. The sum is (m + 1)(m + 2)/2. As an example, let m=3, then n is either 2m + 1 = 7, or 2m + 2 = 8. Hence for order 7 or 8, the total number of diagonal terms with value e iµx is (m + 1)(m + 2)/2 = 10. Thus the eigenspace has dimension 10, as compared with the full dimension 330 of the square matrix M .
Notice that among these 10 elements, 4 of them have m y = 0, 3 of them have m y = 1, 2 of them have m y = 2, 1 of them has m y = 3. Later in the next section, we find that the Jordan decomposition of this eigenspace separates it into 4 invariant subspace with dimension 4,3,2,1 respectively, adding up to 10. The fact that they have the same structure seems not to be a coincidence, even though we do not have a general proof so far.
More importantly, a great simplification comes from the fact that the matrix is upper-triangular with all its eigenvalues given by its diagonal elements precisely determined by the tune µ as long as we use the variables z, z * instead of x, p. As is well known and explained in the appendix A, for triangular matrix, the generalized eigenvectors can be calculated in a simple straight forward way.

Invariant subspaces and Jordan decomposition
For a n×n matrix M , using Jordan decomposition [15](Golub, p.354), we can find a n×n non-singular matrix U such that where the m j × m j matrix N j = λ j I j + τ j with j=1,2,...,k is the Jordan block with eigenvalue λ j , corresponding to the invariant subspace j of dimension m j in the n dimensional space of vector Z. I j is the identity matrix of dimension m j , while τ j is a superdiagonal matrix of dimension m j : (Notice the distinction: I and I j are identity matrixes for full space Z and the subspace respectively.) It is seen from these equations that when we study the nonlinear dynamics of the system, we can decompose the motion in the full space Z into the separate motion in many invariant subspaces. In particular, as we showed in Section 3 and 4, the motion in the eigenspace with eigenvalue e iµ provides a wealth of information about the dynamics. Actually, all other eigenvalues also provide this information. However, for the dynamics of the system of xy motion, we can concentrate on the two simplest invariant subspaces with eigenvalues e iµx , e iµy only, hence we drop the index j from Eq.(6.3) from now on.
The Jordan decomposition Eq.(6.1) appears to be complicated because the large dimension of the square matrix M . However as pointed out in Section 5, the left eigenspace for one eigenvalue can be separated in one simple step. If we label these left eigenvectors by e i , we find they satisfy the following equation (Appendix A): They form an invariant subspace, that is, after multiplied by (M − λI), any one of them remains to be a linear combination of them represented by the matrix t. We used the Einstein convention: the repeated k implies a sum over k. The index i and k run from 1 to m, where m is the null space dimension of the matrix (M − λI). Here t is the matrix derived in Appendix A.
For the example of 4 variables at 7th order, the eigenspace of eigenvalue λ = e iµx has dimension 10, and we can find the 10 generalized eigenvectors. t is a 10 × 10 upper triangular matrix. Hence our main issue is greatly simplified to finding the Jordan decomposition of t. This involves a much smaller amount of work when compared to finding the Jordan form of the matrix (M − λI) itself.
We observe that Eq.(6.4) is very similar in structure to the left eigenvector Eq.(6.3), except that the matrix t is not in Jordan form. As explained in Appendix B, it is easy to carry out Jordan decomposition for a uppertriangular matrix, particularly for the low dimension matrix t with all its diagonal elements zero.
Since all the diagonal elements are zero, once we find the Jordan decomposition g, so that t = g −1 τ g with τ in Jordan form, we find g hi e i (M − λI) = τ hn g ni e i . Now we can see that the new basis u h ≡ g hi e i satisfy the left eigenvector equation u h (M − λI) = τ hn u n . Since τ is in Jordan form with eigenvalue zero only, when we take u j as the rows of the matrix U , this equation is just Eq. (6.3).
Hence the eigenspace itself is again separated into several invariant subspaces, all of them have the same eigenvalue zero. About how to find these invariant subspaces, please see Appendix B. For the example mentioned above, the 10 dimensional invariant subspace is separated into 4 subspaces again, with dimension 4, 3, 2, 1 respectively, each is spanned by a chain of generalized eigenvectors. Thus we find the solution Eq. The structure of chains in the invariant subspace of one eigenvalue, with each chain corresponds to one Jordan block, serves as the basis of one of the method of Jordan decomposition. There are many programs available for Jordan decomposition, including some of them providing analytical solution. But occasionally the result is unstable. To ensure stable result, we adopt the method by Axel Ruhe (1970), Ruhe (1980a, 1980b) [16][17][18], which is referred to by the text book "Matrix Computations, 4th Edition" in page 402 of Golub [15]. For the convenience of the readers, in Appendix B, we outline the crucial steps of the method without the detailed derivation and the proof of the stability of the decomposition, which is given in these papers.
In order to study the dynamics of the system, as we shall show, the most important information is obtained from the Jordan decomposition of lnM rather than M itself. As explained in the Appendix C, we can take a logarithm of Eq. (6.1), then, we can carry out a second Jordan decomposition of lnN up to the same order easily and arrive at an equation similar to Eq.(6.1): Here we redefined the transformation matrix and Jordan form as U and N again to avoid cluttering of notations. After we replace Eq. (6.1) by Eq. (6.5), the formulas following Eq. (6.1) remain the same except the eigenvalue e iµ is replaced by iµ so that now N j = iµ j I j + τ j if we are interested in the Jordan block j with tune µ j = m x µ x + m y µ y .
Since we are mostly only interested in the analysis of the Jordan blocks with tune either µ x or µ y , we drop the index j in U j and similar to Eq. (6.3) we get U lnM = (iµI + τ )U (6.6) Notice now we use U and U to represent the submatrix of the transformation matrix. Use U U = I, we get U lnM U = iµI + τ, and U M U = e iµI+τ . For the example of 4 variables x, p x , y, p y , at 7th order, the subspace of eigenvalue e iµx is 4 for the longest chain, the matrix U is a 4 × 330 matrix, as we described in Section 1B. For the other 3 shorter chains, the dimension is 3, 2, 1 respectively. We will concentrate on studying the longest chain because it has most detailed information about the nonlinear dynamics. About why the longest chain is important, it will become clear at the end of Section 8 and Appendix E after we explain the structure of the chains.

Multi-turns, Tune and Amplitude Fluctuation
As described in Section 1B-1C, with the definition W = U Z and its initial W 0 = U Z 0 , within the region where the invariant tori remain stable, W 0 must satisfy the "coherence" condition, and Eq.(6.8) leads to W = e iµI+τ W 0 ∼ = e i(µ+φ) W 0 .
(7.1) with tune shift φ = −iw 1 /w 0 . That is, after each turn, every row w j = u j Z in W 0 rotates by a factor e i(µ+φ) in their separate complex planes like in a perturbed twist map (see, e.g., [1,8]), or, behaves like an action-angle variable. For example, let w 0 = re iθ , then r = |w 0 | remains the same like an action variable, while θ → θ + µ + φ after each turn like the angle variable. Near the border of the survival invariant tori, for example, if the system is near its dynamic aperture or near a major resonance, the condition of Eq. (1.17) and the condition that φ is real Eq. (1.18) are violated slightly, and the Eq. (1.14) also has increased errors. For convenience we call this situation as a deviation from a coherent state. Hence these conditions provide information about dynamic aperture and resonances.
We now consider the map after n turns. Applying Eq. (7.1) n times, we obtain W (n) = e inµI+nτ W 0 = e inµ e nτ W 0 , with W (n = 0) ≡ W 0 , and we have moved the constant e inµ to the front, dropped the identity matrix I to remind us that it is a constant. Before expanding Eq. (7.2), we follow the Dirac notation, let 3) Using the expression Eq. (6.2) for τ , we find chain τ |0 >= 0, τ |1 >= |0 >, τ |2 >= |1 >, τ |3 >= |2 >, . . . . And,  Compare this with the definition of W (n) , we find w 0 (n) = e inµ (w 0 + nw 1 + n 2 2 w 2 + . . . ) = e inµ w 0 (1 + n w 1 w 0 + n 2 2 Here we used φ and ∆ given in Eq.(1.17) and Eq. (1.19), To avoid cluttering of symbols, all w j without specification of n here represent w j (n = 0). When compare with Eq. (7.1), we identify Reφ as the amplitude dependent tune shift. In the region in the phase space where the invariant tori survive with stable frequency, we recognize that φ is real and remains to be a constant along a circle with radius r = |w 0 |. In addition, the term with ∆ in the exponent of Eq.(7.7) is proportional to n 2 instead of n. Hence it represents the fluctuation of frequency from turn to turn. ∆ should be nearly zero in order for the system to have a stable frequency. For convenience, this was referred to as coherence condition: We remark that we pay attention only to the first few terms in the exponent of the right hand side of Eq.(7.7) because the terms of higher power of n have factors of w m with increased m, while as m increases w m in Eq.(7.7) becomes small and lost information.
The analysis given by Eq. (7.7)-Eq. (7.8) paints a physical picture about why and how a chain represents a rotation in the phase space: we need not only w 0 to represent a rotation, we also need w 1 and w 2 to provide information about the phase advance and how stable the frequency is. The phase space is divided into many invariant subspaces, each represents a rotation through the phase shift generated by a chain. For each eigenvalue, the longest chain provides the most detailed information about the rotation while the shorter chains and their sub-chains represent approximation with only high power terms and less information. The invariant subspaces of different eigenvalues represent the rotation of different harmonics of the system.
Clearly w 0 , φ, and ∆ are all functions of initial value of z, z * . Or, if we use inverse function of w 0 (z) to represent z as function of w 0 , then φ and ∆ both are functions of w 0 . Thus, given initial value of w 0 , we can examine whether φ is constant along a circle with radius r = |w 0 |, whether it is a real function, and whether ∆ is nearly zero, and obtain the information about whether the initial state is close to the border of the survival invariant tori or near resonance. The deviation of the real part of φ from a constant is the tune fluctuation, while the imaginary part of φ gives amplitude fluctuation, i.e., the variation of r = |w 0 | after many turns. The non-zero ∆ indicates a deviation from coherent state, seems to be related to the Liapunov exponents [1]( p.298). All of these has been checked by a comparison with simulation, as we discussed in detail in Section 3 and 4.
However, all of these are based on the stability, precision, and uniqueness of the square matrix Jordan decomposition. As this is an issue often raised whenever one starts to talk about Jordan decomposition, we address it in the next section.

Stability, Precision, and Uniqueness of the Square Matrix Jordan Decomposition
There are still two issues remain to be addressed about the Jordan decomposition. The first is its stability and precision. The second is about its uniqueness. Both issues involve some details of Jordan decomposition. Hence in this section we only briefly explain how these two is-sues are resolved, the details are given in Appendix E.
First, about the stability and precision issue of Jordan decomposition, when we want to achieve high precision near the dynamic aperture, we need to use high order square matrix M . During the construction and Jordan decomposition of M , the coefficients of high power terms may become many orders of magnitude larger or smaller than the low order terms. When the ratio of these high power terms and the linear terms becomes so large that the small terms in the coefficients are approaching the machine precision, we lost information and cannot distinguish small terms near machine zero from true zero, then the Jordan decomposition breaks down. This problem is resolved by the scaling z =z/s mentioned in Section 3A, wherez is the Courant-Snyder variable. This scaling is used to reduce the ratio between the maximum coefficients of high power terms and linear terms. In the example given in the Appendix E, for a typical 7 order square matrix with 4 variables, the range of the coefficients is reduced from 18 orders of magnitude to between 0.03 and 35. As result the Jordan decomposition achieves stability and high precision.
Second, the Jordan decomposition we discussed so far is not uniquely determined. One obvious example of this is that when we add u 1 to u 0 in Eq.(2.6), the left eigenvector equation remains correct. We can check that u 0 (M − iω 0 I) = u 1 ; u 1 (M − iω 0 I) = 0, so they satisfy the left eigenvector equation U (M − iω 0 I) = τ U . We immediately see that when u 0 is replaced by u 0 + au 1 with any arbitrary number a, the left eigenvector equation still holds. As we can see in Section 2, this corresponds to add to w 0 a term proportional to w 1 .
The polynomial w 1 represents a circle in the complex plane of z, it does not carry any information about the nonlinear distortion, while w 0 has all the terms of order 1 to 3 and carries much information about the distortion. If |a| is very large, the nonlinear distortion we see in w 0 will be dominated by aw 1 , which carries no information about the nonlinear distortion. This blurs the nonlinear distortion details given by other terms in w 0 . Hence we need to choose the coefficient a so that this invariant term is zero in u 0 . As shown in Eq.(2.6), this term, i.e., the column 7 of the first row of U is indeed already 0.
In Appendix E, we study how to remove from w 0x the high power terms of form z x (z x z * x ) mx (z y z * y ) my , i.e., the terms of form of z x times an invariant monomial. For the case of 4 variables at 7 order, we find in the longest chain, w 0x = u 0x Z is a polynomial with power from 1 to 7, w 1x = u 1x Z has power from 3 to 7, ..., and w 3x = u 3x Z has only terms of power 7. Table II of the Appendix E shows the structure of of different chains by showing the terms of the lowest power. From the table, we see that the only polynomial with power from 1 to 7 is w 0x in the longest chain. All the other w jx do not have linear term. We also show why those high power invariant terms destroy the information about nonlinear distortion. The contribution from these terms can always be systematically removed. In doing so, the Jordan decomposition becomes uniquely defined, and the blur caused by these terms is minimized.

Conclusions and Future Work
We showed that for a nonlinear dynamic system, we can construct a square matrix. Using linear algebra the Jordan decomposition of the square matrix leads to a variable transformation from z to w 0 . The one turn map expressed in terms of the new variable w 0 is exactly equivalent to the original map expressed in terms of z whenever the inverse function z(w 0 ) exists. However, the map expressed in terms of w 0 has only small deviation from a twist map, i.e, it is a perturbed twist map. The deviation is characterized by the fact that on a circle of constant |w 0 | in the w 0 plane, φ is not a constant, and ∆ = 0; Im(φ) = 0. Hence these quantities provide a tool to study the frequency fluctuation, the stability of the trajectories, and dynamic aperture. Thus the analysis of nonlinear dynamic system can be greatly simplified using linear algebra.
The main feature of the new method is we can achieve high order in one step. This is a significant advantage when compared with canonical perturbation theory and normal form where the calculation is carried out order to order by complicated iteration. We also showed that the stability and precision of the Jordan decomposition is ensured by scaling the variables, and by removing the high power invariant monomial terms.
In Section 3 and 4, we demonstrated that the actionangle variable remains nearly constant up to near the boundary of the dynamic aperture and resonance lines. They successfully reproduce both the correct phase space structure and tune shift with amplitude. In addition, we tested the several measures of the stability of the trajectories and their tunes such as the criterions Imφ ∼ = 0; ∆ ∼ = 0. For sufficient high order, we compared the region where these criterions satisfied with the dynamic aperture for realistic lattices in Section 3, showing these measures can be used to find the dynamic aperture.
In summary, the presented theory shows a good potential in theoretical understanding of complicated dynamical system to guide the optimization of dynamical apertures. Using analysis of one turn map to narrow down the searching range of parameter space before the final confirmation by tracking, the new method can significantly speed up the optimization.
The examples given here are limited to 2 to 4 variables. However, the method developed here is general. Hence the application to 6 variables or more should be possible. The inclusion of bunch length and energy spread into consideration of this method may be interesting for high energy accelerator physics. The analysis given here is general, and hopefully may be applied to other areas, for example, nonlinear dynamics in physics and astronomy.

Acknowledgments
The author would like to thank Dr. Yongjun Li for his many comments, suggestions and discussion on this paper, in particular, for his contribution to the optimization result used in Section 4. We also would like to thank Dr. Lingyun Yang for his support on the use of his program of TPSA to construct the square matrixes. We also would like to thank Dr. Yue Hao for discussion and comments on the manuscript, and for providing TPSA programs to construct the square matrixes. We also thank Dr. G. Stupakov for a discussion on applying the method to nonlinear differential equation.
We also thank Dr. Boaz Nash for the collaboration on the start and early development in the direction of square matrix analysis of nonlinear dynamics.
This research used resources of the National Synchrotron Light Source II, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under Contract No. de-sc0012704. As pointed out in Section 5, for triangular matrix, the generalized eigenvectors can be calculated in a simple straight forward way. For a specific eigenvalue, the eigenvectors span an invariant subspace. Here we give a brief description of the method to solve for a set of basis for this subspace using an example.

List of Appendixes
For an eigenvalue λ, we need to find the non-trivial solutions of the equation M Z = λZ. As an example, we study the eigenspace of the matrix M of Eq. (1.5) for λ = e iµ .
Let m ≡ M − λI with I the identity matrix. Given the set of eigenvalues of M {1, e iµ , e −iµ , e 2iµ , 1, e −2iµ , e 3iµ , e iµ , e −iµ , e −3iµ }, we see that m is a triangular matrix with its 2nd and 8th diagonal element equal to zero, the other 8 diagonal elements are non-zeros. Hence there are only two eigenvectors which span the invariant subspace for this eigenvalue.
To save space for this paper and avoid writing large matrix, let us assume that the 2nd and the 5th diagonal element are zero instead of 2nd and 8th. Thus to find the first eigenvector, we write Here we have chosen X 1 only has first 2 rows nonzero, thus clearly we only need to choose to satisfy the first row of the matrix equation (we use * to represent a certain number which there is no need to specify). That is, we have ax + b = 0, and hence x = −b/a. Thus we have solved for the first eigenvector X 1 .
Next we find the 2nd eigenvector X 2 . Since the 2nd zero diagonal element is the 5th, we let only the first 5 rows of X 2 to be nonzero, and let 2) t 21 is a certain number to be determined. Again we let the 5th row of X 2 to be 1. Clearly we only need to find the first 4 rows of X 2 to satisfy the equation. The 4th row is cx 4 + * = 0 , hence x 4 = − * /c. This process is repeated to find x 3 . When we proceed to the 2nd row, the diagonal element becomes zero, hence the situation is different. And we find the equation * x 3 + * x 4 + * = t 21 , where x 2 is absent and can be set to zero. Since x 3 and x 4 are already determined, this equation now is used to determine t 21 . This process also explains why in Eq. (A.2) we cannot set t 21 = 0 and hence X 2 is not a proper eigenvector, but a generalized eigenvector: when it satisfies Eq. (A.2), we have mX 2 = 0 , but m 2 X 2 = t 21 mX 1 = 0.
Once t 21 is determined, since x 2 through x 4 are already determined, we can proceed to the first row to solve for x 1 : ax 1 + * x 2 + * x 3 + * x 4 + * = t 21 * because a is nonzero.
All the rows in X 1 ,X 2 are rational functions of elements in the matrix M , this is general.
We can generalize this procedure to the case with more than two zero diagonal elements in m. Without further details, we summarize the result as follows. As long as the system is sufficiently far away from resonance, for example, the minimum value of |λ − e inµ | for all n within the specified order is several orders of magnitude larger than machine zero, there are well-defined zeros in m matrix. Let the number of zeros to be n λ , we can find a set of n λ (generalized) eigenvectors X j such that mX j = Σ 0<k<j t jk X k with 1 ≤ j ≤ n λ . Because mX j remains to be a linear combination of X k , these X j s serve as basis for the invariant subspace of eigenvalue λ of dimension n λ . Using the Einstein convention (the repeated k implies a sum over k), we have mX j = t jk X k , where t is a lower triangular matrix with all diagonal elements equal to zero.
The example here is for the right eigenvectors. To calculate left eigenvectors, we can simply transpose the matrix M and find its right eigenvectors as columns as discussed here, and transpose them back to rows.
The result is the set of left eigenvectors e i such that e i (M − λI) = t ik e k , as given in Eq. (6.4).

Appendix B: Outline of a Method of Jordan Decomposition
As pointed out in Section 6, the structure of chains in the invariant subspace of one eigenvalue, with each chain corresponds to one Jordan block, serves as the basis of the method of Jordan decomposition we outline here. Our main issue has been reduced to the Jordan decomposition of the subspace of matrix m in Appendix A, which is represented by the much lower dimension matrix t with eigenvalue zero.
As pointed out in Section 3, this subspace is spanned by vectors of several chains. Each chain has a proper eigenvector at its end. These proper eigenvectors form the null space of m . Hence the dimension of the null space of m is equal to the number of chains. In the example following Eq. (6.4), because there are 4 chains with lengths 4,3,2,1 respectively, when multiplied by m, the 4 proper eigenvectors become zero, hence the null space N 1 for m has dimension 4. The chain of length 1 is removed by m, so after multiplied by m only 3 chains left, and the null space N 2 for m 2 has dimension 4+3=7. Continue this way we find the dimension of the null space N k for m k is m k =4, 7, 9, 10 for k=1, 2, 3, 4 respectively. Thus if for every k we can find the basis of null space N k for m k , we can identify all the eigenvectors as follows. Since m 4 =10, m 3 =9 means there is one vector in N 4 independent from the basis of N 3 , if we can find this vector u which satisfies m 4 u = 0 but m 3 u = 0, we identify this as the first generalized eigenvector in the longest chain because this is the last remaining vector to become zero when we apply the matrix m to all the basis in the subspace in succession. Thus u, mu, m 2 u, m 3 u are the basis of the longest chain of length 4. Once we separate these 4 eigenvectors from the subspace of dimension 10, and find the remaining 6 independent 6 vectors, we can repeat this process to find the vector u such that m 3 u = 0 but m 2 u = 0 and then find and separate the chain of length 3. Clearly this process can be continued until all (generalized) eigenvectors are separated, thus the Jordan basis is solved. If there are two chains with same length, before the last operation of m to nullify the full subspace, there will be two independent vectors left. We can choose any of them to form a chain, and the another to form another chain. So the process described here is general.
Thus the Jordan decomposition in this method requires us to find the null space of the powers of m , and also requires us to separate independent vectors which are in one null space but not in the other. Since the t matrix we would like to decompose into Jordan form, described in Section 3 and in Appendix A, is a triangular matrix with all diagonal element equal to zero and with low dimension, there seems to be a simpler way to carry out the Jordan decomposition than what we shall outline in the following. However, before we can systematically find this simpler way, we just take the method given by Käström and Ruhe [16][17][18], where the steps we mentioned above are carried out by repeated application of singular value decomposition (SVD). We will give a very brief outline of the steps by an example.
Find the dimension of the null spaces of the powers of a matrix and its chain structure Let us assume a matrix The dimension m k of the null space of A k can be found by finding the rank of A k using SVD because the dimension of the matrix, subtracted by the number of zero singular value, is equal to its rank. In numerical calculation, we have to specify a lower limit of singular value which is taken to be zero. If our system is such that the minimum nonzero singular value is many orders of magnitude larger than machine zero, this can be carried without ambiguity. As long as the system is not exactly on resonance, this is easily satisfied. By SVD we find the Table I, where n k = m k − m k−1 is the number of chains left after multiplying by A k−1 . From this table, we can see that there are 2 chains of length 2 and 4.

B.2.
Find the null space N k of A k This is carried out by SVD as follows. By SVD we have where H represents Hermitian conjugate. If we choose SVD such that the singular values are arranged in increasing order, then the first 2 (see Table I:m 1 = n 1 = 2) singular values are zeros. Then following [16][17][18], we define , which has all zero as its first m 1 = 2 columns, corresponding to the null space N 1 . To find the null space N 2 , we repeat this procedure for the (n − m 1 ) × (n − m 1 ) = (6 − 2) ×(6 − 2) = 4 × 4 submatrix A 2 , which is the lower right corner of A (2) , as shown in Fig.14.
The sequence of unitary transform V k like this leads to a set of submatrixes A k each is at the lower right corner of the previous one. The result is a unitary transform Here all the zeros represent blocks of zeros with the left column represent m 1 columns of zeros. B ij is a n i × n j submatrix. According to the table I, the widths of the blocks are 2,2,1,1. Indeed our calculation result agrees with this form. The next step is to carry out unitary transform such that each of B k,k+1 blocks is transformed to uppertriangular form. Let ) be unitary matrixes with only diagonal blocks nonzero, and the dimension of blocks are n k as in Table I. Let T = U BU H . Then we find = R is uppertriangular. We then proceed to T 23 = U 2 B 23 U H 3 . Now since U 3 is known, we can carry out another QR decomposition B 23 U H 3 = QR and choose U 2 = Q H so that T 23 = R. Here to avoid cluttering of notation we have repeated the use of the notation Q and R for different matrixes by redefining them each time we use QR decomposition. This procedure continues until U 1 is determined to make T 12 upper-trangular. Then T has the form in Fig.15 (left), with only upper-triangular blocks nonzero and also with all submatrix T k,k+1 upper-triangular.
For the example, the result agrees with this form: simplified and the chain structure of the subspace will be revealed, as will become clear in the next subsection.

B.4. Gauss Elimination by Similar Transformation
We now proceed to eliminate those elements in T represented by crosses in Fig.15 by Gauss elimination. The procedure is to eliminate all elements other than coupling elements column by column, start from right to left. Consider the similar transform of T we find ij = a, and the transform simply add the column i multiplied by a to column j, and subtract the row j multiplied by a from row i. Thus if we start from i=4, j=5, and a = T 46 /T 56 , because the T matrix is upper triangular, T 46 is eliminated. Column 5 is affected during this transformation, but this does not prevent our elimination process. In particular, the transformed matrix still remains to have the form of the T matrix. When we repeat this procedure with j=5 but let i=3,2,1, the column 6 is eliminated except the coupling element T 56 .
Next we proceed to column 5, but start from i=3, j=4. This eliminates column 5 except the coupling element T 45 . Then, from right to left, the procedure continues to column 4, 3, 2 in the same way. The result is a matrix J in the form shown in Fig.15(right), with only the coupling elements r ij nonzero.
For the example discussed from section B1 to B4, we find This confirms the table in B1 that there are two chains of lengths 4 and 2. To transform to standard Jordan form with all the coupling elements equal to 1, we redefine the norm of the basis e j , and reorder them according to the chain structure, which amounts to a transformation of renormalization followed by a permutation, and we neglect the details here. The result is the Jordan form (we redefine U here to include the renormalization and permutation, and choose to have the longest chain first).
For one block with eigenvalue λ j = e iµj we have U j lnM U j = ln(e iµj I j + τ j ) = ln(e iµj (I j + e −iµj τ j )). Because we are only interested in this invariant subspace, we drop the index j when it is not needed, to write Here now U represents U j , i.e., we redefine its submatrix using the same notation to simplify writing. If the dimension of the subspace j is n j , the series terminates at n j − 1 because τ nj = 0. The right hand side is not in Jordan form, but it is very simple, and can be transformed by another matrix V (we are excused to redefine the notation V here, not to be confused with the V matrix used in Appendix B4, or Appendix D) into Jordan form V U lnM U V −1 = iµI + τ . Now we redefine V U as U , U V −1 as U , get Notice that since we concentrate only on the subspace of the longest chain in the eigenspace e iµj , compared with the discussion in Appendix B, the Jordan decomposition process by V here involves only one chain only.

Appendix D: Inverse of an Upper-Triangular Matrix
The variable transformation of w 0x (z x , z y ), w 0y (z x , z y ) is defined by the first row w 0x = u 0x Z, w 0y = u 0y Z of Eq.(1.11). As explained in Section 3B, very often we need to find the inverse function z x (w x , w y ), z y (w x , w y ) by solving the equation For this purpose we construct a column Φ with the rows given by w x , w y , w * x , w * y and the monomials constructed from them so its transposition Φ is similar to the row defined by Eq. (1.4). Now instead of Eq. (1.3), we can construct a square matrix V where V is also a triangular matrix. It is well-known that the inverse matrix of a triangular matrix is easy to calculate as long as its diagonal elements have no zeros. The linear part of the polynomial is very simple because as z x , z y approach zero, they are proportional to w x ,w y . We can always choose to multiply U by a constant and divide U by the same constant (there are 2 transformation matrixes U for x and y separately but here we are not specific about this point) so that w x ,w y approach z x , z y respectively as they approach zero. Thus from Section 1A we find that V has all its diagonal elements equal to 1. Thus it is easy to calculate the inverse matrix V −1 which can now be used to calculate z x , z y approximately when w x ,w y are given. The result can be used as a set of initial trial values for a more accurate solution of the inverse function of Eq.(D.1). In most our applications the triangular inverse matrix V −1 gives an excellent solution already, and there is no need to further improve the precision by solving the inverse function more precisely. Suppose we want to find the inverse of an uppertriangular matrix V of dimension n. In order to have inverse, all the diagonal elements of V are nonzero. Let us find a matrix L consists of column y k L = (y 1 , y 2 , y 3 , ...y k , ..., y n ) (D.3) And we first find y k such that where e k has all rows zero except the k'th row equal to 1. Solving the equation starting from the last row, we find x n = x n−1 = ... = 0 until we reach the k'th row, where we have ax k = 1. Hence, x k = 1/a. Then we can find x k−1 and continue up to x 1 . Thus we find y k has only nonzero rows above and including k'th row. Once we find all the y k , clearly we have the inverse matrix V −1 = L = (y 1 , y 2 , y 3 , ...y k , ..., y n ) (D.5) as an upper-triangular matrix too. Very often, Jordan decomposition of a matrix is considered to be ill conditioned. However, due to the upper triangular property and because its eigenvalues are precisely known on the unit circle, the matrix M is already in the form of a stable Schur decomposition [15], (Schur decomposition is the step to find the eigenvalues and triagularize the matrix), and its Jordan decomposition is stable except when we are very close to resonance where some eigenvalues are nearly degenerate. When we are sufficiently far from resonance, all the eigenvalues of the matrix (M − λI) other than zeros are sufficiently far from zero so the null space, i.e., the invariant subspace with eigenvalue λ can be solved for its eigenvectors to very high precision.
However, we may lose precision during the Jordan decomposition of the matrix t in the invariant subspace as given in Eq. (6.4) when we use high order square matrixes. Eq. (6.4) tells us that when (M − λI) acts on the vectors in the invariant subspace of eigenvalue λ from right, it is equivalent to the much lower dimension matrix t acts on the vector from the left. Hence in the following discussion we speak of them (t or (M − λI)) as if we were talking about the same thing.
The procedure of Jordan decomposition is based on the fact that the invariant subspace is spanned by several chains of eigenvectors, and the end of each chain contributes to the null space of the matrix: u m−1 (M −λI) = 0, as explained in Section 6 and in the second paragraph of Appendix B in particular. Using Eq. (6.4), we see that the last right eigenvector (the proper right eigenvector) of t for each chain contributes to the null space of t. When the null spaces of different powers of t are calculated, it then becomes easy to build the vectors in each chain and establish the Jordan decomposition, as explained in Appendix B. More details about this procedure are given in Appendix B, and we refer to references [16][17][18].
Here we only point out that our method of Jordan decomposition is based on finding the null spaces of different powers of t. To find the null space correctly we need to distinguish small singular values from zero using singular value decomposition. The high power of t may have very small singular values almost reaching the machine precision. In this case we cannot correctly separate the null space anymore, and the Jordan decomposition fails. Hence we need narrow the range of the singular values so that we can distinguish the minimum singular value from zero clearly.
We found that the range of singular values depends on the range of the absolute value of the coefficients of monomial terms at different orders in the square matrix (M − λI). By scaling the variables z x = x − ip x , z y = y − ip y by a factor s, we can reduce the ratio of the maximum and the minimum absolute value of the coefficients above zero. For example, we found for a specific lattice the square matrix has its maximum of the absolute value of the coefficients at 7th order as 1.38 × 10 18 , and the minimum of those other than zeros is found to be the first order as 1 (the coefficients of order of machine zero are excluded). Then let s 7 × 1.38 × 10 18 = 1 × s, we find s = (1.38 × 10 18 ) − 1 6 = 0.00115. Now we let sz x = x − ip x , sz y = y − ip y to make the coefficients of these two terms equal, then we find that the range of the coefficients in the new square matrix using new variables has the coefficients span much smaller range. And the range of singular values is reduced from 18 orders of magnitude to between 0.03 and 35 after the scaling. In Fig.16 we show the spectrum of the singular values before (red) and after the scaling (blue). Therefore, the null space of the invariant subspace is clearly identified; the Jordan decomposition is very stable and accurate.
FIG. 16: The range of singular value of the square matrix before (red) and after scaling (blue) Another possible loss of precision comes from the construction of the square matrix because the coefficients of the high power terms in the Taylor series expansion in equations similar to Eq. (1.1) are either larger or smaller by many orders of magnitudes than the linear terms. Thus when we sum over these terms, the small terms lost precision (the effective number of digits) because the limitation on the number of decimal points. This problem can also be solved by scaling method same as mentioned above, the only difference is instead of consider the coefficients of the square matrix, we consider the coefficients in the Taylor expansion in Eq. (1.1). The result not only reduced the error of the square matrix, it also reduces the range of the singular values of the matrix (M − λI). The improvement of the scaling before the construction of the square matrix is so significant that in all the cases we studied, there is very little improvement from the second scaling based on the coefficients of the full square matrix. However, since the second scaling is very simple, and does provide some improvement over the maximum-minimum singular value ratio, we always carry out a second scaling.
When on resonance, the eigenvalue e iµx , e iµy is degenerate to the eigenvalue of other harmonic of the tune, in the form of e i(mµx+nµy) . The structure of the Jordan block is different, and the resonance issue should be treated separately, and is not discussed in this paper. However, as our numerical examples show, in Section 3, our present analysis of nonlinear dynamics by square ma-trix method is valid even when it is quite close to a resonance.

E.2. Non-uniqueness of Jordan decomposition
Another issue of Jordan decomposition is whether it is not unique, and whether we need to impose other conditions based on physics to make it unique. The nonuniqueness of Jordan decomposition is due to its chain structure. A simple example is a chain of two vectors u 0 and u 1 , with M u 0 = u 1 and M u 1 = 0. Then it is obvious that another two vectors u 0 = u 0 + au 1 and u 1 = u 1 also satisfy the same chain relation: with M u 0 = u 1 and M u 1 = 0, where a is an arbitrary constant. Hence the basis of Jordan decomposition is not uniquely defined, and the question now is what kind of conditions based on physics will allow us to determine the choice of a. To solve this problem, we must first understand the specific structure of the chains. In the case of two variables x and p, we find only one chain for one eigenvalue e iµ . For example, for 7th order, the invariant subspace we found has 4 (generalized) eigenvectors u 0 , u 1 , u 2 , u 3 as the basis of the space. We find that w 0 = u 0 Z is a polynomial with powers from 1 to 7, w 1 = u 1 Z has powers from 3 to 7, w 2 = u 2 Z has powers from 5 to 7, w 3 = u 3 Z has only a term of power 7, which is proportional to z(zz * ) 3 . When the matrix (lnM − iµI) operates from right on the invariant subspace of Z spanned by u 0 , u 1 , u 2 , u 3 , its operation in the representation using u 0 , u 1 , u 2 , u 3 as basis, is given by the simple matrix τ (see Eq. (6.6)). Using the equations of left eigenvectors U (lnM − iµI) = τ U , we have u 0 (lnM − iµI) = u 1 , u 1 (lnM − iµI) = u 2 , . . . , u 3 (lnM − iµI) = 0, i.e., they form a chain. Intuitively, the following helps to understand why when multiplied by the matrix (lnM −iµI), the lowest power terms of the eigenvectors increase progressively by power of 2: the off-diagonal terms in the matrix has at least power of 2.
This is for two variables x and p. For 4 variables such as x, p x , y, p y , as in the case we studied for a storage ring, there are more independent chains than one for the invariant subspaces of both eigenvalues e iµx , e iµy . For example, at 7th order, for the eigenvalue e iµx , in addition to the chain u 0 , u 1 , u 2 , u 3 , there is a linear independent chain u 0 , u 1 , u 2 , another chain u 0 , u 1 and a chain with only one element u 0 , forming an invariant subspace of dimension 10. There is only one lowest power term in u 0 : z x = x − ip x . We list the lowest power terms of the chains in this example in Table II. Thus the pattern appears: the lowest power terms in the first vector of a chain or sub-chain are always terms with z x times the powers of the invariant monomial z x z * x or z y z * y so that for small amplitude, it represents a simple rotation as z x . The first vectors of the shorter chains have terms with z x multiplied by higher powers of these invariants. This feature helps us to understand its effects on the non-uniqueness of the Jordan decomposition.

E.4. Physical meaning of non-uniqueness
To understand the meaning of the non-uniqueness, let us assume w 0 = z + z 3 = z(1 + z 2 ), w 1 = z(zz * ). That is the simplest case where the end of the chain is z multiplied by an invariant factor of zz * . When |z| << 1, for a circular motion in z-plane, both w 0 and w 1 correspond to a circular motion in w-plane. But as z increases, |w 0 | no longer remains constant because as the phase of z 2 changes the factor 1 + z 2 has interference between its two terms. This modulation of amplitude gives distortion of the trajectory, because a constant |w 0 | does not correspond to a circular trajectory in z-plane any more. This means w 0 carries information about the distortion of the trajectory, while w 1 does not carry this information. When w 1 is multiplied by a large number and added to w 0 , it dominates over w 0 , and the information about distortion lost.
It is clear now that those terms of z times the powers of the invariant of form zz * represent pure circular motion in phase space and they do not present any information about nonlinearity. Therefore when they are mixed into the first vector of the longest chain, they blur the distortion generated by the interference between the linear term and terms of other harmonics such as z 3 , and the result is the non-uniqueness. E.5. The need to minimize high power terms from the first vector of the longest chain In Table II, terms such as z 3 x can appear in u 0 but it is not in the lowest power terms in u 0 because if it is then it will create a tune 3µ x rather than µ x even for small amplitude. These terms such as z 3 x interferes with the dominating first order term z x in u 0 to generate distortion. A mixture of long chain with short chains in Table II such as u (1) 0 ≡ u 0 + au 1 + bu 0 , u (1) 1 ≡ u 1 + au 2 + bu 1 , u (1) 2 ≡ u 2 +au 3 +bu 2 , u (1) over by other high power terms and the trajectory becomes more close to a circle with lost information. Hence we need to choose a or b to minimize the high power terms. It is obvious that a polynomial with many terms should be able to describe much more detailed complicated curve or surface than a single monomial. Also, it is easy to understand that we need to minimize high power terms to increase the convergence radius of a Taylor series.
Therefore, it becomes clear, to extract more detailed information from u 0 we need to find a way to minimize higher power terms while maintain a chain satisfying the left eigenvector relation in the invariant subspace. Thus in the Appendix F, we study and find the linear combination of sub-chains such that the higher power terms are minimized. For example, the sub-chain u 1 , u 2 , u 3 is used to minimize the 3rd power terms in u 0 , the sub-chain u 2 , u 3 is used to minimize the 5th power terms in u 0 , and the sub-chain u 3 is used to minimize the 7th power terms in u 0 .
We remark here it is observed that in the Table II the number of chains for each order happen to be equal to the number of lowest power terms, so that the linear combination of these chains can be used to remove these terms from u 0 completely. Hence when all these higher power terms are minimized, it amounts to removing all the terms of form of z x times the powers of the invariants, this is equivalent to separate the shorter chains from the longest chain. The linear combination is uniquely determined, and hence the Jordan decomposition is stable, accurate, and unique. The procedure is described in Appendix F.
The terms we discussed here are among the nonlinear driving terms, the number of which is much more than we list in the table. With limited number of sextupoles, it is impossible to make them all zero, so they can only be minimized in the process of optimization of the dynamic aperture. However, in our description of the chain structure, these terms are all connected in the longest chain. In other words they are correlated. Hence the required number of parameters to be varied is largely reduced.

Appendix F: Minimize Higher Power Terms
As explained in Section 8 and Appendix E, we would like to minimize the higher power terms in the first vector of the longest chain by adding linear combination of subchains to the longest chain. This is a change of basis in the invariant subspace of a specific eigenvalue, i.e., a linear coordinate transformation. We use an example to illustrate the transformation.
Let us assume the matrix N in Eq.(6.1) has two chains of length 4 and 3. We study a similar transformation t which changes the basis of the eigenspace but keeps the Jordan matrix N invariant. Thus we write where A and B are of form Eq.(6.2) with only superdiagonal elements equal to 1 and all other elements equal to zero. A and I 1 have dimension 4 , B and I 2 have dimension 3 respectively. The matrix t is in a general form to represent a change of basis so that the longest chain after the transformation becomes a linear combination of the long chain from the 4×4 matrix T 1 and the short chain from the 4×3 matrix T 2 . In Appendix E, we mentioned that the Jordan form is not unique. But from Eq. (F.1) we can obtain all the generally possible Jordan basis for the longest chain. Our goal is to choose T 1 and T 2 to minimize the higher power terms in the first vector in the basis for matrix N , as we explained in Appendix E. Eq. (F.1) leads to N t = tN , thus we have i.e., AT 1 = T 1 A and T 2 B = AT 2 . To determine the form of the matrix T , we examine the effect of Jordan form on column and row. Let Hence when acted from left by the Jordan matrix, the column shifts up, leaving the last row zero; when acted from right, a row shifts to the right, leaving the left column zero. Thus AT 1 = T 1 A means the matrix T 1 after shifted up should be the same as it is shifted to the right. Examine this pattern, we see that T 1 must be upper triangular, and all the elements on the same superdiagonal must be equal to each other. Also, T 2 is of this form. So let