Physics-Based Deep Neural Networks for Beam Dynamics in Charged Particle Accelerators

This paper presents a novel approach for constructing neural networks which model charged particle beam dynamics. In our approach, the Taylor maps arising in the representation of the dynamics are mapped onto the weights of a polynomial neural network. The resulting network approximates the dynamical system with perfect accuracy prior to training and makes it possible to fine-tune the network weights on additional experimental data. We propose a symplectic regularization approach for such polynomial neural networks that restricts the trained model to Hamiltonian systems and significantly improves the training procedure. The proposed networks can be used for beam dynamics simulations or for fine-tuning of beam optics models with experimental data. The structure of the network allows for the modeling of large accelerators with a large number of magnets. We demonstrate our approach on the examples of the existing PETRA III and the planned PETRA IV storage rings at DESY.


I. INTRODUCTION
Machine Learning (ML) techniques are finding increasing usage in various aspects of particle accelerators, including fault prediction, performance optimization, and virtual diagnostics. For more information see e.g. [1]. Many applications could benefit from having a beam optics model which on the one hand accurately represents the beam dynamics of the accelerator and on the other hand can be trained or adjusted with a limited amount of experimental data, so as to serve as a model of the real machine with its various imperfections. Machine Learning methods, and neural networks (NN) in particular, hold a promise for constructing such models.
However, applying ML methods for learning the behavior of dynamical systems such as those describing linear and nonlinear dynamics in charged particle accelerators can in some cases require a prohibitively large amount of training data. There is a need for ML-based modeling approaches that reduce reliance on large data sets. A major concern is that models trained with limited observations may not generalize well beyond the training dataset and will thus exhibit poor predictive power. A significant improvement potential in the generalization ability and the training performance of ML models such as NNs lies in incorporating physics constraints either in the NN architecture or in the training process.
NNs are often employed for system learning and control [2,3], where models are trained with large measured or simulated datasets. By physics-inspired neural networks [4], authors generally mean either incorporating domain knowledge into a traditional NN or providing additional loss functions to ensure physically consistent predictions [5-7]. Most authors either incorporate numerical integrators for ordinary differential equations (ODEs) into the NN or, conversely, parametrize the equations with NNs. Various studies related to PNNs also exist in the literature [8-10]. For instance, [8,9] discuss polynomial architectures that approximate differential equations. All of these studies regard the polynomial architectures as black-box models and do not indicate the architectures' connection to the ODEs.
We propose to incorporate the physics constraints on the single-particle beam dynamics in accelerators by introducing a NN architecture which is directly derived from the Taylor maps corresponding to the accelerator components, and by additionally imposing symplectic constraints on the network. The approach presented in this paper is based on works [11-13], where the authors target building polynomial neural networks (PNN) that approximate the exact system of ODEs. In contrast to these works, we do not target solving exact equations for simplified physical problems, but focus on the practical application of fine-tuning the PNN with measured data.
In this paper, we first introduce the polynomial neural network (PNN) architecture incorporating Taylor map (TM-PNN) information. Training and symplectic regularization are then discussed, followed by the application of TM-PNNs to simulating beam dynamics in storage rings. Finally, the results of fine-tuning the TM-PNN based on experimental data at the PETRA III [14] storage ring and simulations for the proposed PETRA IV [15] ring at DESY are presented.

II. POLYNOMIAL NEURAL NETWORK INCORPORATING TAYLOR MAPS
Taylor maps are a natural and well-known approach to treating beam dynamics in particle accelerators, see e.g. [16,17]. A Taylor map M is a polynomial transformation of the phase space coordinates of a particle of the form

X = W_0 + W_1 X_0 + W_2 X_0^[2] + ... + W_k X_0^[k],   (1)

where X_0, X ∈ R^n are the n-dimensional phase space coordinates of a particle before and after the transformation, W_i are weight matrices, and X^[k] is the k-th Kronecker power of the vector X. For example, for X = (x_1, x_2), X^[2] = (x_1^2, x_1 x_2, x_2^2), and so on. The weights of (1) can be derived from the equations of particle motion, which can be described in a general form with polynomial coefficients,

dX/ds = P_0(s) + P_1(s) X + P_2(s) X^[2] + ... + P_k(s) X^[k],   (2)

where s is an independent variable, for which in accelerator physics the path length of the reference particle is usually taken. Differentiating (1) with respect to s and combining it with (2) yields a system of equations for the weight matrices,

dW_i/ds = f_i(W_0, ..., W_k, P_0, ..., P_k),   (3)

where the f_i are functions of the matrices W_i and P_i; for instance, dW_0/ds = P_0 + P_1 W_0 + P_2 W_0^[2]. Solving (3) for W_i gives the dynamics of the system for all initial conditions by means of (1). The success of this technique for modeling beam dynamics in accelerators lies in the fact that Taylor maps for individual magnets can be readily constructed to the desired order, and long sequences of magnets comprising an accelerator can be represented by concatenating the maps corresponding to the individual magnets,

M = M_N ∘ M_{N-1} ∘ ... ∘ M_1.   (4)

Various methods for efficiently concatenating such maps and extracting physics parameters such as betatron tunes or amplitude detuning have been developed [16]. An efficient way of constructing a neural network which models dynamics in such a system is to represent each map in this chain by a set of polynomial neurons (a layer) computing exactly the same Taylor map, and then link these layers into a deep network. The NN has the phase space coordinates as its input, and propagating the inputs through each layer is fully equivalent to applying the corresponding map. The general architecture of the TM-PNN is shown in Fig. 2. The network has a single input and can have an arbitrary number of outputs corresponding to observables.
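A single polynomial layer of this kind can be sketched in a few lines of NumPy (a minimal illustration of ours, not the TensorFlow implementation used in the paper; for simplicity the full Kronecker product is used rather than the reduced set of unique monomials):

```python
import numpy as np

def kron_power(x, k):
    """k-th Kronecker power of vector x: x^[1] = x, x^[2] = kron(x, x), ...
    (the full product, so x1*x2 and x2*x1 both appear)."""
    p = np.array([1.0])
    for _ in range(k):
        p = np.kron(p, x)
    return p

def taylor_map_layer(weights, x0):
    """One polynomial neuron/layer: X = W0 + W1 X0 + W2 X0^[2] + ..."""
    x = np.array(weights[0], dtype=float)
    for k, Wk in enumerate(weights[1:], start=1):
        x = x + Wk @ kron_power(x0, k)
    return x

# A drift space of length L is a purely linear Taylor map:
def drift(L):
    return [np.zeros(2), np.array([[1.0, L], [0.0, 1.0]])]

# Propagating through two consecutive layers is equivalent to applying the
# concatenated map, here a single drift of twice the length:
x0 = np.array([1e-3, 5e-4])                 # (x, x') at the entrance
x1 = taylor_map_layer(drift(1.0), x0)
x2 = taylor_map_layer(drift(1.0), x1)
assert np.allclose(x2, taylor_map_layer(drift(2.0), x0))
```

Linking many such layers, one per magnet, gives exactly the deep-network structure described above.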
So, to have a beam trajectory at the beam position monitor (BPM) locations as output, the drift space containing the monitor is split into two parts at the monitor location. Two layers for drifts upstream and downstream of the monitor are created, and the upstream layer additionally has beam position coordinates x and y as outputs. Generally, the network output can be an arbitrary function of the outputs of all neurons in the network. Moreover, any number of parameters of the accelerator lattice such as the strength of quadrupole magnets can be trivially introduced. To add one additional parameter, the network input vector dimension is extended by one and the parameter becomes an additional input to the neurons where it enters.
For example, the transfer map for horizontal motion through a focusing quadrupole of length L and strength k can be parametrized by a Taylor map and represented as a polynomial neuron with the three-dimensional input X_0 = (x_0, x'_0, k) and the two-dimensional output X = (x, x'), where the weights up to the second order give

x = x_0 + L x'_0 - (L^2/2) k x_0 - (L^3/6) k x'_0,
x' = x'_0 - L k x_0 - (L^2/2) k x'_0.   (5)

We further illustrate the concept by an example of a network that corresponds to a simple but practically relevant magnet arrangement, the FODO structure with sextupoles, which exhibits nonlinear behavior typical of circular accelerators. The magnetic lattice and the corresponding NN are shown in Fig. 1. This Taylor map-based polynomial neural network (TM-PNN) has 12 layers, one for each component in the lattice. The resulting NN fully represents the dynamics of the lattice, including the parametric dependency on the quadrupole gradients. The system is nonlinear and exhibits resonant behavior. Fig. 3 shows the phase space portraits of this lattice computed by the TM-PNN.
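The parametric quadrupole map can be cross-checked against the exact thick-lens transfer matrix. The following sketch (our own illustration, with hypothetical values of k and L) compares the map truncated at second order in the extended input (x_0, x'_0, k) with the exact solution, where the quadrupole strength enters only through the second-order monomials k·x_0 and k·x'_0:

```python
import numpy as np

def quad_exact(x0, xp0, k, L):
    """Exact horizontal map of a focusing quadrupole (k > 0)."""
    w = np.sqrt(k)
    c, s = np.cos(w * L), np.sin(w * L)
    return c * x0 + s / w * xp0, -w * s * x0 + c * xp0

def quad_taylor2(x0, xp0, k, L):
    """Taylor map truncated at second order in the input (x0, x0', k):
    first order in k, obtained by expanding cos and sin of sqrt(k)*L."""
    x  = x0 + L * xp0 - (L**2 / 2) * k * x0 - (L**3 / 6) * k * xp0
    xp = xp0 - L * k * x0 - (L**2 / 2) * k * xp0
    return x, xp

# For a weak quadrupole the truncated map is accurate:
x_e, xp_e = quad_exact(1e-3, 0.0, k=0.1, L=0.5)
x_t, xp_t = quad_taylor2(1e-3, 0.0, k=0.1, L=0.5)
assert abs(x_e - x_t) < 1e-7 and abs(xp_e - xp_t) < 1e-6
```

Because k is simply an extra input coordinate, retraining the corresponding weights later amounts to recalibrating the quadrupole strength from data.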

III. NN TRAINING AND SYMPLECTIC REGULARIZATION
A TM-PNN pre-initialized with the Taylor map coefficients extracted from the ideal magnetic lattice already provides an accurate model of the ideal accelerator and does not require training. Training may be required if it is desired to fine-tune the model based on experimental data. The training is done in the usual way by fine-tuning the network weights given a set of inputs and outputs. For training, we use the Adam [18] algorithm with gradient clipping. The examples of such training are given in the last section of this paper.
The initial network weights are derived from the maps and are guaranteed to be physically meaningful. During the network training process, we need to ensure the physical consistency of the weights, so that they are changed in such a way that the layers still correspond to Taylor maps of magnets. The Taylor maps for systems of ordinary differential equations contain weights that vary by orders of magnitude, so the often-used L1- or L2-norm-based regularizations, which try to reduce the absolute magnitude of the weights during training, cannot be applied.
It is however possible to introduce physical invariants of the underlying dynamical systems as regularization penalties on the weights of the TM-PNN. For Hamiltonian systems representing single-particle beam dynamics, such an invariant is the symplecticity [19], which means that for a map M

(∂M/∂X)^T J (∂M/∂X) = J,   J = ( 0  I ; -I  0 ),   (6)

where I is an identity matrix, J is the symplectic unit matrix, and T denotes the matrix transpose. The symplectic property (6) for a TM-PNN leads to algebraic constraints on the weights W_i = {w_i^{jk}}. The symplectic penalty can be implemented for an arbitrary Taylor map as an additional term in the loss function,

Loss = M_E + S,

where M_E is a metric that controls the accuracy on the training data, and S is the symplectic regularization penalty. For example, for a second-order Taylor map of a two-dimensional state vector, substituting the Jacobian of (1) into condition (6) and equating the coefficients of each monomial in the phase space coordinates yields a set of algebraic constraints on the weights (7). The penalty S is the sum of squares of all left-hand-side terms in (7). Since this penalty does not depend on the inputs, the Hamiltonian structure is preserved for all new inputs, which has a large impact on generalization. If the symplectic regularization is not applied during training, the tuning of the maps leads to overfitting of the model and inaccurate predictions.
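The idea can be sketched for the two-dimensional case, where symplecticity is equivalent to the Jacobian determinant of the map being identically 1. This minimal NumPy sketch (our own illustration) evaluates that determinant condition numerically instead of writing out the algebraic weight constraints:

```python
import numpy as np

def jacobian(W1, W2, X):
    """Jacobian of the map X0 -> W1 X0 + W2 X0^[2], X0^[2] = (x^2, x x', x'^2)."""
    x, xp = X
    d_kron = np.array([[2 * x, 0.0], [xp, x], [0.0, 2 * xp]])
    return W1 + W2 @ d_kron

def symplectic_penalty(W1, W2, n=5):
    """In 2D phase space, symplecticity means det(Jacobian) = 1 for all X.
    det J - 1 is a polynomial in (x, x'), so vanishing on a sufficiently
    dense grid is equivalent to the algebraic weight constraints; the
    penalty is the sum of squared residuals and involves no training data."""
    pts = np.linspace(-1.0, 1.0, n)
    res = [np.linalg.det(jacobian(W1, W2, (x, xp))) - 1.0
           for x in pts for xp in pts]
    return float(np.sum(np.square(res)))

# A thin sextupole kick x' -> x' + a x^2 is nonlinear but symplectic:
a = 0.3
W2 = np.array([[0.0, 0.0, 0.0], [a, 0.0, 0.0]])
assert symplectic_penalty(np.eye(2), W2) < 1e-12
# while a non-symplectic (beam-growing) linear map is penalized:
assert symplectic_penalty(1.1 * np.eye(2), np.zeros((2, 3))) > 0.1
```

In the actual training loop such a penalty would simply be added to the data-fit loss, keeping the fine-tuned layers Hamiltonian.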

IV. APPLICATION TO BEAM DYNAMICS SIMULATIONS
TM-PNNs can be used for fast simulations of beam dynamics [20]. We implemented a translation of beam optics models to TM-PNNs using maps of up to second order for the individual elements, computed in OCELOT [21]. From that point on, the computations necessary for particle tracking are performed entirely by standard and highly optimized NN software such as TensorFlow [22]. This makes it possible to have highly efficient particle tracking on various parallel architectures without any additional effort. The computations were cross-checked on the PETRA III and PETRA IV lattices. Fig. 4 presents the scaling of the computational speed for four-dimensional particle tracking in the PETRA III lattice. On multicore architectures, tracking of thousands of particles can be performed at the same speed as tracking of a single particle. Note that the network can deal with large lattices: the PETRA III lattice has 1519 elements, each of which was represented by a NN layer.
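The parallel speed-up arises because a whole batch of particles can be pushed through each layer with a single matrix operation. A minimal sketch with a toy chain of linear layers (our own illustration, not the PETRA III model):

```python
import numpy as np

def track(layers, particles):
    """Propagate a batch of particles, shape (N, 2), through a sequence of
    linear TM-PNN layers: one matrix multiply per layer covers the whole
    batch, so on parallel hardware tracking many particles costs nearly
    the same as tracking one."""
    X = particles
    for W1 in layers:
        X = X @ W1.T
    return X

# Toy ring of 100 identical drift layers (for the scaling argument only):
drift = np.array([[1.0, 0.5], [0.0, 1.0]])
layers = [drift] * 100
bunch = np.random.default_rng(0).normal(scale=1e-3, size=(10_000, 2))
out = track(layers, bunch)
assert out.shape == (10_000, 2)
```

With nonlinear layers the per-layer operation becomes a batched polynomial evaluation, which NN frameworks parallelize in exactly the same way.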

V. DATA-DRIVEN TUNING OF THE TM-PNN
Due to alignment errors, field imperfections, and other factors, ideal beam optics models of accelerators differ to various extents from the parameters of real machines. Adjustment of optics parameters in operation is normally performed. Usual methods of optics measurement and correction [23] exploit linear or weakly nonlinear dependency of observable parameters such as beam orbits on free model parameters such as magnet strengths and alignments. The problem of optics measurement is formulated as a least-squares minimization of free parameters given a set of observations, and then a correction is done using some of these parameters as controls to minimize the discrepancy between theoretical and observed models.
One of the general shortcomings of these methods is that the number of free parameters, such as the offsets and strength errors of magnets, is larger than the number of observables, which is ultimately limited by the number of free parameters in the phase space transformation between adjacent beam position monitors. For example, for linear coupled motion the number of such parameters is the dimension of the group of 4D symplectic matrices, which for n = 2 is n(2n+1) = 10, so that the number of possible observable parameters in the beam optics is 10 N_BPM. For the PETRA III ring at DESY, the number of magnets is N_mag = 1519 and N_BPM = 246. With six free parameters for the geometrical misplacement of each component, optics determination becomes, strictly speaking, ill-posed. The least-squares fits employed usually have a different number of parameters than is intrinsically present in the model, and additional assumptions or empirical knowledge are often required to use such fits successfully. In contrast, a TM-PNN with a minimal set of layers (i.e., one layer for the transfer map between two BPMs) does not have the shortcomings mentioned above.
A TM-PNN can serve as the backbone for both optics measurement and for beam control. The number of layers should be the smallest possible but include all observables. For example, maps for all magnets between BPMs can be concatenated. However, elements used for beam control should be preserved as separate layers.
The first step, model inference, is fitting all the weights in the network for a given training set of observations, such as orbit measurements. The second step, control, consists of giving the network a target function, such as the target orbit, as a new training dataset, and then using only the weights in the layers corresponding to control elements to fit the desired target. Since by construction of the network there is a unique correspondence between the weights and control parameters, the appropriate correction can be directly inferred. The TensorFlow engine can be used both for training of the network with available observations and for finding the control parameters. This concept is demonstrated below on several examples.

Fig. 5. The PETRA IV cell (a seven-bend achromat of ESRF-EBS type) that consists of bending magnets, shown in blue, focusing and defocusing quadrupoles, shown in red, and focusing (sf) and defocusing (sd) sextupoles, shown in green.
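The two-step procedure can be illustrated on a deliberately minimal toy model (a hypothetical one-corrector, one-BPM lattice of ours, with a least-squares fit standing in for the TensorFlow training loop):

```python
import numpy as np

# Toy model: a corrector kick x' -> x' + c followed by a drift of length L
# to a BPM that reads x. The "machine" has an unknown incoming angle.
def bpm_reading(xp0, c, L):
    return L * (xp0 + c)

L_true, xp0_true = 7.3, 1e-4        # hidden machine parameters

# Step 1, model inference: fit the free model weight L from observations
# taken at known corrector settings.
obs_c = np.array([0.0, 1e-4, 2e-4])
obs_x = np.array([bpm_reading(xp0_true, c, L_true) for c in obs_c])
L_fit, offset = np.polyfit(obs_c, obs_x, 1)   # slope dx/dc recovers L

# Step 2, control: freeze the re-calibrated model and solve for the
# corrector setting that produces the target (zero) BPM reading.
c_corr = -obs_x[0] / L_fit
assert abs(bpm_reading(xp0_true, c_corr, L_true)) < 1e-9
```

In the TM-PNN the same separation is achieved by fitting all layer weights in step 1 and then optimizing only the corrector-layer weights against the target orbit in step 2.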

A. Simulated data for PETRA IV
First, we used a simplified version of the Multi-Bend Achromat lattice of PETRA IV (see Fig. 5) [15] to model single-pass trajectory correction with a TM-PNN. Single-pass trajectory correction is an important step during machine startup, when beam accumulation has not yet been achieved. Random magnet misalignments were generated, and single-pass beam trajectories were simulated. In the following we give an example of trajectory correction for a single cell. The cell has 36 magnets and a total of 166 logical elements (free spaces, markers, etc.). For simplicity, we consider only beam motion in the horizontal plane (x, x'). The cell has 11 beam position monitors (BPMs) and 10 orbit correctors.
The orbit correction is formulated as a nonlinear optimization problem,

min_{c_1, ..., c_10} Σ_i x_i^2,

where x_i is the beam coordinate measured at the i-th monitor and c_j is the strength of the j-th corrector magnet. The simulated trajectory including misalignments is fed into the TM-PNN representing the ideal lattice to re-calibrate the model. The network was then used to find the corrector strengths resulting in a zero trajectory. The correction results for one achromat cell are shown in Fig. 6. Only one trajectory is required to train the network, which can then perform the orbit correction.
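For a nearly linear lattice this objective reduces to a least-squares problem in the response of the BPM readings to the corrector settings. A sketch with a random, purely hypothetical 11 x 10 response matrix standing in for the TM-PNN gradients:

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.normal(size=(11, 10))            # hypothetical response dx_i/dc_j
x_err = rng.normal(scale=1e-3, size=11)  # trajectory driven by misalignments

# Minimize sum_i x_i(c)^2 with x(c) = x_err + R c. For this linear model the
# optimum is the least-squares solution; for the full TM-PNN the same
# objective is minimized by gradient descent on the corrector-layer weights.
c_opt, *_ = np.linalg.lstsq(R, -x_err, rcond=None)
x_corrected = x_err + R @ c_opt
assert np.linalg.norm(x_corrected) < np.linalg.norm(x_err)
```

With 11 monitors and 10 correctors the orbit cannot be zeroed exactly, but the residual is the best achievable in the least-squares sense.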

B. Experiments at PETRA III
Experiments were performed at PETRA III to demonstrate the applicability of the TM-PNN for single-pass trajectory correction in real operation. As mentioned before, single-pass trajectory correction is an important task in the commissioning and startup of accelerators, where the optics model can also be subject to uncertainties. Moreover, the diagnostics resolution in single-pass mode is worse than for a stored beam, and the demonstration of model re-calibration thus becomes more challenging. For the measurement, the beam position monitors were set up to read single-turn data and were triggered at beam injection into the ring. The first orbit corrector in the storage ring upstream from the injection point was set up such that the beam made only one turn. The TM-PNN model was initialized with the nominal PETRA III lattice, and the one-turn trajectories consisting of (x, y) positions at the 246 BPM locations were used for training.

Beam threading
The first experiment demonstrated that a TM-PNN can be used for beam threading through the machine and for orbit correction. We switched off all corrector magnets, as a result of which the beam was able to travel through only part of the ring. To propagate the beam further, we perform orbit correction with the available correctors (upstream from the beam loss location) using the TM-PNN. For example, in the first iteration only 30 correctors are available before the beam is lost, out of a total of 210 horizontal and 194 vertical correctors in the ring. Given the measured data at each step, the TM-PNN predicts the corrector magnet settings that reduce the displacement of the measured orbit. These corrector values are applied to the machine and a new measurement is acquired. After several steps, the beam is propagated along the ring. Fig. 7 shows several iterations of beam threading and trajectory correction with a TM-PNN. Solid lines represent beam coordinates, dashed lines are noise downstream from the beam loss. Note that in this example model re-calibration is not essential due to the relatively weak nonlinearities in the machine, and the primary goal was the demonstration of orbit control with the help of the TM-PNN.
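The threading loop can be mimicked on a toy chain of cells with hypothetical kick errors (entirely our own illustration; the real procedure uses the TM-PNN predictions, whereas here a simple second-difference estimate of the local kick plays that role):

```python
import numpy as np

# Toy model: unit-length cells, each with a hypothetical dipole kick error,
# a corrector, and a BPM; the beam is "lost" once |x| exceeds the aperture,
# and BPMs downstream of the loss point read nothing.
rng = np.random.default_rng(2)
n = 20
errs = rng.normal(scale=1e-4, size=n)
aperture = 1e-3

def thread(c):
    """Return (cell index, x) BPM readings up to the loss point."""
    x, xp, readings = 0.0, 0.0, []
    for i in range(n):
        xp += errs[i] + c[i]        # kick error plus corrector in cell i
        x += xp                     # unit-length drift to the BPM
        if abs(x) > aperture:
            break                   # beam lost, no further readings
        readings.append((i, x))
    return readings

c = np.zeros(n)
for _ in range(n):                  # threading iterations
    seen = thread(c)
    if len(seen) == n:
        break                       # beam traverses the whole chain
    x = np.zeros(n)
    for i, xi in seen:
        x[i] = xi
    # The net kick in cell i follows from the orbit's second difference,
    # err_i + c_i = x_i - 2 x_{i-1} + x_{i-2}; cancel it with the corrector.
    for i, _ in seen:
        xm1 = x[i - 1] if i >= 1 else 0.0
        xm2 = x[i - 2] if i >= 2 else 0.0
        c[i] -= x[i] - 2 * xm1 + xm2

assert len(thread(c)) == n          # the corrected beam makes it through
```

Each iteration corrects all cells observed so far, so the surviving region of the chain grows until the beam passes every BPM, mirroring the measure-correct-remeasure cycle described above.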

Optics imperfection
In the second experiment we used the same beam set-up to demonstrate model re-calibration on partial, noisy, and limited observations. Optics imperfections were introduced in a controlled way by detuning the strength of one quadrupole magnet by 20%. The network was trained on the single-pass trajectories from this set-up and then used in multi-turn tracking. Since the imperfection is known, we independently reproduced this optics in a traditional tracking code (OCELOT). Multi-turn particle tracking was done in both codes and the tunes were extracted by Fourier analysis (see Fig. 8). Note that only a single one-turn trajectory was used for the model re-calibration. Tune measurement based on such data with traditional methods is generally impossible.


VI. CONCLUSION
We introduced a new polynomial neural network architecture that is ideally suited for representing beam dynamics in charged particle accelerators. The networks are pre-initialized to represent linear and nonlinear beam dynamics and can be effectively trained using the symplectic regularization technique we proposed. They provide both a backbone for fast parallel computations and a universal ML-based approach for experimental measurement and correction of charged particle beam optics. Symplectic regularization is limited to Hamiltonian systems, but the TM-PNN approach is more general: in [12] it is shown that an arbitrary system of nonlinear ODEs can be translated into Taylor maps. Thus, radiation and/or space charge effects can be introduced into the TM-PNN by, for example, mapping onto it the systems of equations from [24]. Further research on the estimation of the accuracy, performance, and limitations of the proposed method, as well as work on introducing it into daily accelerator operation, should be performed.