Bayesian Approach for Linear Optics Correction

With a Bayesian approach, the linear optics correction algorithm for storage rings is revisited. Starting from the Bayes' theorem, a complete linear optics model is simplified as"likelihood functions"and"prior probability distributions". Under some assumptions, the least square algorithm and then the Jacobian matrix approach can be re-derived. The coherence of the correction algorithm is ensured through specifying a self-consistent regularization coefficient to prevent overfitting. Optimal weights for different correction objectives are obtained based on their measurement noise level. A new technique has been developed to resolve degenerated quadrupole errors when observed at a few select BPMs. A necessary condition of being distinguishable is that their optics response vectors seen at these specific BPMs should be near-orthogonal.


I. INTRODUCTION
At modern particle accelerator facilities, advanced beam diagnostics instruments with high acquisition rate can generate copious amounts of data within a short time period. The data can be used in a wide variety of applications such as characterizing machine parameters, monitoring machine performance, realizing real-time correction, feedbacks, etc. A specific example would be obtaining beam turn-by-turn (TbT) data from beam position monitors (BPM) after the beam is disturbed. The linear optics functions, such as, the envelope function β of betatron oscillation and its phase φ [1], can be extracted [2][3][4]. Due to various measurement noise, accurately identifying quadrupole error sources is important for optics correction. One can average over repetitive measurements, then use the mean values directly. Distributions of measurement noise, which are usually ignored, however, can provide rich information for identifying error sources precisely. Using a Bayesian approach and the information provided by the error analysis, the linear optics correction problem presented by accelerators can be approached from the viewpoint of probability.
Lattice measurement noise and quadrupole excitations errors are usually randomly distributed around their expectation values. Overfitting quadrupole errors must be avoided. Specifically, the optics functions β and φ can be measured with BPMs at many locations s i , where i = 0, 1, · · · , N − 1, and N is the total number of BPMs. Given a set of measured data with noise, fitting the actual quadrupole errors ∆K, is a typical nonlinear regression problem since the dependence of β and φ on K is nonlinear. In regression problems, overfitting is a modeling error which occurs when a function is too closely fit to a limited set of data points [5,6]. There are two reasons of revisiting this problem with a Bayesian approach. First, the Bayesian approach is a proven technique in preventing overfitting. Second, several optics distortions caused by quadrupole errors need to be corrected simultaneously, but measured in different units and scales. With the Bayesian approach, the coherence of the correction algorithm, which is capable of dealing with multi-objective regression problems, can be established.
In some scenarios, an optics distortion pattern is indeed caused by a single quadrupole error rather than normally distributed errors. However, the goal of the Bayesian approach is to distribute the error to multiple sources. It can sometimes fail to distinguish the single source from its highly degenerated neighbors. A new technique has been developed where only a few specific BPMs are selected to address the degeneracy. One necessary condition for being distinguishable is that the optics response vectors of those specific BPMs should be nearorthogonal.
To further explain this approach, the remaining sections are outlined as follows: Sect. II briefly review the well-known correction scheme, i.e., use of Jacobian matrix as well as a discussion of the difficulties of this method. In Sect. III, we start from the Bayesian theorem to re-derive the least square algorithm. It will become clear that using Jacobian matrix is a simplified version of Bayesian approach with a flat prior probability. Sect. IV introduces a new technique for resolving the degeneracy of neighboring quadrupoles. Both simulation and experimental data taken from the National Synchrotron Light Source II (NSLS-II) storage ring are used to illustrate this technique. A brief summary is given in Sect. V.

II. JACOBIAN MATRIX APPROACH
A linear optics model of a storage ring can be represented by a set of s-dependent optics functions. For simplicity's sake, the envelope function β(s) is used as an example. Other lattice functions such as betatron phase, φ(s) will be covered later. Given a fixed magnetic lattice layout, an optics model reads as here K is a vector composed of all normalized quadrupole focusing strengths, and s is the longitudinal coordinate. Bold symbols, such as "X", are used to denote vectors and matrices throughout this paper. The design lattice model is represented as where K 0 represents the quadrupoles' nominal setting, β 0 is the nominal envelope function along s. If quadrupoles have some errors ∆K on top of K 0 , the linear lattice is distorted as ∆β is often referred to as β-beat. Given the total distribution of N BPMs, a ring's optics model can be simply represented as an N -dimensional vector. Once the linear optics is measured at these BPMs, the lattice correction needs to identify (account for) the actual quadrupole errors. A well-known and straightforward correction method is to use the linear dependence of the β-function on each quadrupole, i.e., its Jacobian matrix [4], where s i is the i th BPM's longitudinal location, and j is the index of quadrupoles. M can be constructed based on either a design model or a beam-based measurement. By solving the following equation the error sources ∆K can be identified approximately.
Here ∆β is a vector of the β-beat seen at the BPMs. Since the dependence isn't linear, iteratively applying Eq. (5) is needed. It turns out that, due to measurement noise, highly degenerated quadrupoles and even bad BPMs, the solution to Eq. (5) is spoiled by overfitting. It is wellknown that overfitting can be prevented by a regularization technique [5][6][7]. But how to choose a reasonable regularization coefficient to cut off measurement noise isn't obvious here. At the same time, multiple optics functions are measured, but in different scales and units. We can stack their response matrices vertically with some weights. However, the strategy for specifying appropriate weights isn't clear. An inappropriate regularization and weight specification could degrade the performance of correction. For example, some functions are overfitted and can sacrifice others. Another concern is that the quadrupole errors obtained with Eq. (5) usually reproduce the optics distortions at the location of the BPMs rather than the whole ring. This might be a critical issue for collider rings, in which no BPM can sit exactly at their interaction points (IP). Minimizing the β-beat at IP's neighboring BPMs doesn't ensure that the optics at IPs are optimized. Therefore, it is important to precisely identify errors in order to correct lattice distortion globally, rather than limited to the locations of BPMs. Some of these difficulties can be mitigated by using a complete accelerator model instead of its Jacobian matrix. Another method is to validate the obtained quadrupole errors at a few testing BPMs, which are intentionally left out from the fitting. However, iterative fitting and validation with a complete optics model is time-consuming, especially for large scale storage rings. In the next section, the lattice correction problem will be addressed using the Bayesian approach.

III. BAYESIAN APPROACH
From the viewpoint of probability, identifying quadrupole errors from repetitive and independent measurements can be achieved by computing a posterior conditional probability distribution and determining its maxima. Consider a simple case of β function in the horizontal plane. Based on the Bayes' theorem, the conditional probability of having an error ∆K with a measured β reads as [5,8] Eq. (6) can be interpreted as, given a measured optics distortion β = β 0 + ∆β, the probability of it being the error source of ∆K is proportional to the product of a likelihood function p(β|∆K) and a probability distribution of error ∆K. The likelihood function can be recognized as being related to the dependence of β on K, i.e., the Jacobian matrix of Eq. (4). p(∆K) is known as prior probability distribution which will be covered in greater detail later. p(β) is the normalizing constant. By maximizing the probability in Eq. (6), the most likely quadrupole error distribution can be obtained. In general, we can assume that both β measurement noise and quadrupole excitation errors are normally distributed. For example, at a particular BPM, repetitive measurement of βs are distributed around an expectation value E(β) =β with a variance σ β .
Eq. (6) thus can be re-written as Maximizing the probability of Eq. (8) is equivalent to minimizing its negative logarithm, Here • is the Euclidean norm of a vector. Eq. (9) can be recognized as the least-square algorithm but with some well-defined weights. It is important to note that, since we assume a normal distribution for quadrupole errors, the solution to Eq. (9) is intended to allocate errors according to a normal distribution, even if they are not. If there is a systematic calibration error on the quadrupole excitations, the second distribution in Eq. (8) has an nonzero mean. But after the first several iterations, the nonzero mean value should be able to be filtered out. A more detailed discussion on a single outlier of quadrupole error will be addressed in Sect. IV. Now we take a look at the first term on the righthand side of Eq. (9). By expanding β with respect to quadrupole errors ∆K at β 0 and keeping the linear components, it reads as where ∆β(s i ) =β 0 (s i ) − β 0 (s i ). After differentiating every term with respect to ∆K and expressing it in the format of a matrix, we obtain the well-known Eq. (5). The solution to Eq. (5) is given as M T is often known as the pseudo inverse of M , because M is usually non-invertible.
Thus far, the measurement noise σ β has been ignored. The solution to Eq. (11) often overfits quadrupole errors from either noisy BPM data, or even bad BPMs if they are present. The overfitting can be mitigated by taking the second term into account, which is known as regularization technique. By adding an additional penalty term to the sum of squares in Eq. (9), one can prevent the fitted quadrupole errors from deviating from a reasonable normal distribution. In other words, a complete linear optics model provides not only a likelihood function but an informative prior probability distribution of quadrupole errors as well. The solution to the leastsquares problem with regularization is It is important to note that the optimal regularization is well-defined here. More specifically, the variance σ k (∆β) of the quadrupole error distribution p(∆K) should be determined by the measured β-beat level using the designed lattice model. Fig. 1 illustrates that the horizontal β-beat is linearly proportional to the variance of quadrupole error distribution at the NSLS-II ring. After averaging repetitive β x measurements and comparing against the nominal β x,0 , the variance of quadrupole error probability distribution σ k (∆β) can be determined with Fig. 1. During an iterative correction, β-beat reduces gradually, as do the corresponding quadrupole errors. Therefore the regularization coefficient should be dynamically adjusted to speed up the convergence as well. The p(∆K) is named as the prior probability because it can be estimated analytically [1] or numerically in advance. In the previous section, one can still use the regularization technique to avoid overfitting, but the coefficient is not necessarily optimal due to lack of a theoretical basis. Experimentally one can obtain this regularization coefficient based on correction performance on a trial basis [7]. However, the Bayesian approach can explicitly give its statistic and physics interpretation.
FIG. 1. Statistic illustration of the horizontal β-beat due to the random quadrupole errors. This linear correlation with gradually increasing variance are calculated with the NSLS-II ring lattice model in advance. Once an averaged β-beat is measured, its corresponding variance of quadrupole error distribution can be used as the prior probability to prevent overfitting Now consider the case of having multiple correction objectives. With precisely aligned TbT data, the betatron oscillation envelope function β and its phase φ can be obtained simultaneously. The strategy for balancing the correction among multiple objectives is straightforward in the Bayesian approach. The probability of having an error ∆K with given measured β and φ is a product of two conditional probability distributions Maximizing the probability yields After minimizing and linearizing the first two sums of squares in Eq. (14), two response matrices with different weighted blocks can be vertically stacked as where the weight for β-block has been normalized to 1.
The significance of the weight coefficient is to allow for correcting β and φ coherently. In other words, no objectives are over-emphasized by sacrificing others. The objective, which can be measured more precisely (with a smaller variance), plays more important role in the process of correction, automatically, as it should. The overfitting of Eq. (15) can be mitigated with the same regularization technique as Eq. (12), in which M needs to be replaced by the stacked matrix M. The product of two conditional probability distributions can be extended to cover multiple distributions, for example, β, φ and dispersion η on two separate planes. Strictly speaking, in Eq. (13), the probability distributions, p(β, φ|∆K) can be expressed as the products of p(β|∆K) and p(φ|∆K) only when they are completely independent. In reality they are correlated by Numerically we can use a lattice model to compute the correlation distribution of ∆β and ∆φ for given quadrupole error distributions. Both their mean and variance have a linear dependence on quadrupole errors as shown in Fig. 2 for the NSLS-II ring. If measured data are within this range they can be treated approximately as two independent probability distributions. If not, the measurement data is not reliable, and should be discarded. Both β and φ-functions at different locations have different sensitivities to the quadrupole errors [1]. With a designed lattice model, one can compute another prior probability to specify different weights for each term in Eq. (9) and (14) to further optimize the algorithm. This process might be necessary for collider rings because the variation of β is extremely large around interaction points, i.e., the final focus sections.

IV. RESOLVING DEGENERACY
In the previous section, we discussed the case in which the lattice distortion is due to normally distributed quadrupole errors. Once a real error is localized in a particular quadrupole, it may require us to identify which quadrupole is the root cause. This is nontrivial because quadrupoles are closely packed in modern storage rings, the NSLS-II being no exception. Therefore, their lattice response vectors (corresponding columns in M ) are often highly degenerated, especially between neighboring FIG. 2. The correlation of βx and φx distortion distribution for the NSLS-II ring. This correlated distribution is calculated with the lattice model and could be used as another prior probability to eliminate incoherent measurement data. The farther away from the line, the less reliable the measured data is. Once a data point is not within the range of the error bars, it should be removed from the data pool.
quadrupoles. The degeneracy between the i th and j th quadrupole is defined by the correlation coefficient [7] where m i is the i th column of M , which has N elements. If |C i,j | approaches 1, it is difficult to distinguish which one is the actual error source with a full Jacobian matrix. It was found that rather than using all BPMs, and instead selecting a few specific BPMs among them, the highly degenerated quadrupoles were distinguishable. Consider that there are N BPMs. The β-beats seen by these BPMs are N -dimension vectors. Among them, n (n N ) components can be selected to form two much shorter sub-vectors v i,j in a such way that v i,j have much less correlation between them. This means they should be as near-orthogonal as possible in an n dimensional vector space. There are N !/(n!(N − n)!) different permutations to select from. We found that it is not difficult to distinguish 5-6 BPMs out of 180 BPMs in the NSLS-II ring even if the correlation between some neighboring quadrupoles is above 0.98. Experimentally, we repetitively measure the lattice functions. Then we compute the correlation coefficients between v i,j and the measured lattice distortion patterns, u, as seen only at those 5-6 specific BPMs. If the quadrupole error was due to the i th quadrupole, the correlation coefficients C vi,u = vi·u vi u should be distributed close to ±1. Another one, C vj ,u = vj ·u vj u should be around zero, and vice versa. To verify this technique, an experiment and a simulation study were carried out on the NSLS-II ring. The excitation current of one quadrupole QL1G2C01A (with an index of 10) was changed by 1 Ampere. The bunch-bybunch feedback system [9] was then used to resonantly drive the beam to perform betatron oscillation at a nearly constant amplitude. Beam TbT data was acquired for 800 × 1024 turns. For every 1024 turns of data, a set of β x functions was extracted at 180 BPMs. After averaging them, an error distribution was fitted out with the Bayesian approach. It was found that the maximum error is not QL1G2C01A as it should be, but its neighbor QL2G2C01A (with an index 11) (see Fig. 3). It is not surprising because the correlation between the 10 th and 11 th columns of the Jacobian M is as high as 0.9896.

FIG. 3.
The 10 th quadrupole QL1G2C01A excitation is changed by 1 Ampere intentionally at the NSLS-II ring. By observing βx distortion at all 180 BPMs, the error source is incorrectly identified as its neighbor, the 11 th quadrupole QL2G2C01A. The reason for that is the correlation between the two magnets, as seen by all 180 BPMs, is as high as 0.9896.
Among 180 BPMs, we specifically selected 6 of them with their indices as {30, 31, 37, 62, 71, 78}. Observed at these BPMs, the unit β x functions response to these two quadrupoles is near-orthogonal with a correlation coefficient as low as 0.0612 (see Fig. 4). 800 independently measured β x -beat patterns at those specific 6 BPMs were compared against these two unit response vectors v 10,11 . The histograms of their correlation coefficients are illustrated in Fig. 5. It becomes clear that the β x distortion is likely due to the quadrupole QL1G2C01A rather than its neighbor QL2G2C01A, because the measured optics distortion pattern is highly correlated with its response vector. A simulation was also performed to reproduce the experiment observation as illustrated in Fig. 6.

V. SUMMARY
The Bayesian approach explicitly emphasizes the coherence in the existing methods for linear lattice correction. By representing the lattice models as several likelihood functions and some prior probability distributions, overfitting of the optics corrections can be prevented. Prior probabilities can be calculated based on the lattice model prior to lattice correction. At the same The unit βx responses vectors of quadrupole 10 and 11 seen at 6 selected BPMs. They are near-orthogonal in the 6-D vector space because their correlation calculated with Eq. (17) is as low as 0.0612.
FIG. 5. The probability density distribution (PDD) of the correlation coefficients between 800 measured βx distortion and two unit vectors v10,11. The independently and repeatedly measured β-beats are highly correlated with quadrupole 10's pattern, rather than its neighbor. Based on that we can conclude that the actual error source is more likely from the quadrupole 10 (QL1G2C01A) instead of quadrupole 11 (QL2G2C01A). time, the Bayesian approach gives the weights for different fitting objectives based on their measurement errors so that no objectives are over-emphasized by sacrificing others. If a distorted β pattern comes from a single error source that is highly degenerated with its neighbors, it can be difficult to address. A new technique for resolving degeneracy and identifying the real error source has been demonstrated with both simulation and experimental observation.
The Bayesian approach is general and can be incorporated into other lattice and orbit correction algorithms or online optimizations [10], such as the linear optics from closed orbits (LOCO) algorithm [11]. The distortions of The probability density distribution (PDD) of the correlation coefficients between 25,000 simulated βx-beats patterns and two unit vectors v10,11. The reproducibility further confirms our experiment observation. orbit response matrix (ORM) elements, due to each individual quadrupole error, can be calculated in advance and used to explicitly define the regularization coefficient to control overfitting [7]. Using a few specific ORM elements to compose orthogonal vectors should be able to address quadrupole degeneracy as well. One advantage of LOCO is that it unifies the β and φ as the directly measurable parameters with their intrinsic correlation. Therefore, the fitting needs to be driven by a complete lattice model, and might be time-consuming. In some scenarios, the prolonged time the LOCO method would require may be considered worth it. For example, when TbT data is polluted by the beam decoherence due to a large positive chromaticity and/or nonlinearity [12].