Advances on the modelling of the time evolution of dynamic aperture of hadron circular accelerators

Determining a model for the time scaling of the dynamic aperture of a circular accelerator is a topic of strong interest and intense research efforts in accelerator physics. The motivation arises in the possibility of finding a method to reliably extrapolate the results of numerical simulations well beyond what is currently possible in terms of CPU time. In earlier work, a proposal for a model based on Nekhoroshev theorem and Kolmogorov--Arnold--Moser theory was made. This model has been studied in detail and proved successful in describing the evolution of the dynamic aperture in numerical simulations, however, a number of shortcomings had been identified and new models are proposed in this paper, which solve the observed issues. The new models have been benchmarked against numerical simulations for a simple system, the 4D H\'enon map, as well as a realistic, non-linear representation of the beam dynamics in the LHC at 6.5 TeV providing in both cases excellent results.

DA involves a number of challenging aspects, notably to understand which mechanism are determining its features as well as to address the numerous computational issues. In this paper we focus on a very specific aspect, namely the possibility of modelling the scaling law of DA as a function of the number of turns. This problem has been considered since the end of the 90s [17,18], as finding how to describe and efficiently predict the value of the DA would solve some fundamental problems in accelerator physics and performance optimisation of circular accelerators. A reliable model for the time evolution of the DA would allow the severe limitations in terms of CPU-time to be overcome. In fact, to perform numerical simulations required to predict the performance of a circular accelerator over a realistic time interval is beyond the reach of current computers. For the LHC case, simulations up to 10 6 turns are at the limit of the CPU-time capabilities, but this represents only ≈ 89 s of storage time, whereas a typical fill time is of the order of several hours. Ultimately, such a model would also open the possibility to study observables that are more directly linked with machine's performance, such as beam losses and lifetime [19].
To obtain a satisfactory solution to this problem, the attempt made in earlier work addressed the possibility of building models for the DA scaling with time based on fundamental results of dynamical system theory, such as Kolmogorov-Arnold-Moser (KAM) [20,21,22,23] theory and Nekhoroshev [24,25,26,27] theorem. In fact, to ensure applicability across machines and for different physical conditions, we need a scaling law based on the solid ground of fundamental theorems. Although very successful, this approach revealed two issues: the possibility of obtaining nonphysical model's parameters [18,19] and the presence of internal dependencies among them [28,29]. The first generates a contradiction with the key assumptions of the scaling law, as, according to KAM and Nekhoroshev theorems, the parameters should always be positive. The latter affects the numerical stability of the model parameters whenever their dependence on physical quantities, such as linear coupling, is investigated, practically preventing this type of investigations.
An in-depth review has been carried out, the outcome of which is that it is indeed possible to overcome the two limitations observed. The first is solved by proposing a scaling law based on Nekhoroshev theorem, only. This is justified by the fact that the condition for the applicability of the stability-time estimate provided by Nekhoroshev theorem is more general than the existence conditions of KAM tori. Moreover, the phenomenon of Arnold diffusion that occurs in generic Hamiltonian systems with more than two degrees of freedom is extremely slow and affects a set of initial condition having a very small measure. The second one is solved by using the parameters' dependencies obtained from a closer inspection of the form of the estimate of the stability time of Nekhoroshev theorem. All these important advances will be presented and discussed in detail in this paper.
The applicability of the Nekhoroshev theorem to circular accelerators requires two assumptions, namely that the system under consideration is time independent and that it is quasi-integrable with analytic dependence on the phase space variables. The first assumption can be always satisfied by extending the phase space of time-dependent systems. The latter assumption is not satisfied closed to the DA. Nonetheless, the functional form of the stability-time estimate provided by the Nekhoroshev theorem is a very robust result. In fact, it is based on the optimal estimate of the remainders of the perturbative series rather than on their convergence properties. Therefore, it is applicable even when a large fraction of the KAM tori are destroyed and replaced by a weakly-chaotic region.
The plan of the paper is as follows: in Section 2 the estimate of the stability time derived from the Nekhoroshev theorem is briefly reviewed and the connection with the Lambert-W function is discussed. The models proposed for describing the time variation of the dynamic aperture are presented and reviewed in detail in Section 3, while their behaviour is analysed in Section 4. Here, applications to the results of numerical simulations for the 4D Hénon map and a detailed model of the beam dynamics in the LHC at 6.5 TeV are presented. In Section 5 the predictive power of the proposed DA models is discussed in some detail, while an interesting observation of the properties of the DA models is discussed in Section 6. Finally, the conclusions are drawn in Section 7 and some detailed considerations on the Lambert function can be found in the Appendices.

Nekhoroshev theorem and the Lambert-W function
The Nekhoroshev theorem provides an estimate for the number of turns N (r) for which the orbit of an initial condition of amplitude r remains bounded [25,26,27], namely where r * and κ are positive quantities each capturing some key features of the system under consideration. In [26] the estimate of the stability time has been obtained by firstly re-scaling all co-ordinates by appropriate factors so that they do not have a physical dimension. r * is a dimensionless constant whose value represents an apparent convergence radius of the asymptotic perturbative series arising in the Normal Form problem of a symplectic map near an elliptic fixed point. According to this interpretation of the physical meaning of r * , its value should decrease when increasing the strength of the non-linearities present in the system.
In the original version of Nekhoroshev theorem [25], κ is a function of the number d of degrees of freedom of the system under consideration. However, there is no guarantee that the expression found for κ represents an optimal estimate. In the case of a symplectic map near an elliptic fixed point [26,27] the simpler expression κ ≈ (d + 1)/2 was given in a generic framework, but once again without guarantee of being the optimal estimate when one considers a specific case.
Equation (1) is valid for the region in r constrained by For the sake of generalisation, we recast Eq. (1) into where λ ≥ 0. We can recover the original formulation (Eq. (2.16) in [26]) by making the identifications: from which it is clear that r * , ρ * , κ and N 0 are not independent parameters. In particular, it is worth stressing that N 0 is not and independent parameter, but if a function of r * , λ. For this reason, it will not be used as a fit parameter in the models presented in the following sections. Equation (4c) is consistent with the theoretical result only for the case λ = 1 2 . For other values of λ, the scaling r λ * still holds but the factor 7 √ 6 48 might be normalised differently. When fitting models where λ = 1 2 , it is therefore advised to either use N 0 as a fitting parameter, or to ignore the normalisation factor and only use the scaling, i.e.

Inverting the Nekhoroshev stability-time estimate
To be able to invert Eq. (3), we need to solve equations of the form First we make the substitution τ = ζ2 ζ1 w ζ2 : and next we take the root ζ 1 /ζ 2 ζ 2 ζ 1 z ζ 2 which we can now trivially solve: where W is the so-called Lambert-W function, a multi-valued function whose properties are briefly reviewed in Appendix A. Comparing (6) with (3) and identifying ζ 1 = λ and ζ 2 = − 1 κ , we finally get: The choice between the two real branches W 0 and W −1 (see Appendix A) will be determined by the requirement that W remains real and it will hence depend on the sign of the parameters as shown in Appendix B. The summary is reported in the following table, where conditions on the parameters and the validity region are reported.

Application of the Lambert function and of its series expansion
Equation (10) represents the formula linking the size of the stability region to the stability time. Therefore, the closedform model for the scaling law of dynamic aperture is represented by While Eq. (11) is the exact solution of the dynamic aperture scaling law according to the Nekhoroshev estimate, it might not be so useful in practice. To this aim, one might use the series expansion of W −1 as given in [30], namely where c lm = (−1) l m! l + m l + 1 (13) and the symbol in square brackets represents a Stirling cycle number [32,33].
Note that the inverse-logarithm law [17,18] (see also Section 3.1) r = r * ln κ N N0 (14) can be recovered by taking the limit The expansion (12) can be used to prove the limit (15), as it can be recast in the following form We note that the series expansion depends on two terms of the same form that can be transformed into ln (|Λ| exp{Λ ln z}) = ln |Λ| + Λ ln z (17) ln (− ln (|Λ| exp{Λ ln z})) = ln (− ln |Λ| + |Λ| ln z) .
These terms are divided by Λ, which overcompensates the logarithmic divergence of the various terms except for that of the form Λ ln z. Hence, the result in Eq. (15) is easily proven.
3 Models of dynamic aperture time evolution

Original model and its improvement
Based on the outcome of detailed numerical simulations for several accelerator models, in [18] a description of the time evolution of the dynamic aperture was proposed in the form of where D ∞ represents the asymptotic value of D(N ) and can be justified in the framework of KAM theory, while the N -dependent term is derived from Eq. (3) for λ = 0, N 0 = 1 and the fit parameters are D ∞ , b, κ. Note that b = r * from Eq. (1). In Ref. [18] it was mentioned that in some conditions the fit parameters might become negative. This implies that, strictly speaking, for those cases the scaling law cannot be justified in terms of Nekhoroshev theorem, which is not satisfactory as we would like to propose a general scaling law for dynamic aperture supported by fundamental theorems of dynamical systems theory. Moreover, in several subsequent studies a dependence between the fit parameters was observed [28,29]. For these two reasons, an alternative form of the fit has been considered. To overcome the first limitation, in the new model the term D ∞ is dropped, so that the scaling law is based only on the stability-time estimate provided by the Nekhoroshev theorem. To address the second point, the inter-dependence between the parameters (4a)-(4c) has been taken into account.
where the free parameters are ρ * , κ, N 0 . By comparing Eqs. (19) and (20) which is exactly the relation from Eq. (4b) as r * = b and it represents a first hint to explain the observed dependence between the fit parameters in Model 1.
Parenthetically, the model (20) can be written also in the following form ln D(N ) = ln ρ * + κ ln κ − ln(2 e) − ln ln which can be more convenient for a numerical application. In this case the natural choice for the fit parameters is ln ρ * , κ, N 0 .

New models based on the Lambert function
By considering the expressions for the parameters as given in Eqs. (4a)-(4c) it is possible to recast the scaling law for the dynamic aperture (11) in the following form where the free parameters are ρ * , κ and, possibly, λ, unless it is fixed to the value of 1/2 according to the analytic Nekhoroshev estimate. In the rest of the paper Eq. (23) will be indicated as Model 4. The notable limit discussed in the previous section implies that Model 4 reduces to Model 2 when λ → 0 + The series expansion of W −1 can be used to obtain an approximate model for the time evolution of the dynamic aperture. In fact, one can retain the lowest order terms and reduce Model 4 to the form with where the relations (4a)-(4c) have been used and, also in this case, the model parameters are ρ * and κ and possibly λ.
It is straightforward to verify that Model 3 tends to Model 2 for λ → 0. While for Model 3 N 0 is a function of the other fit parameters, it is a constant for Model 2. It is worth noting that Model 3 features the same logarithmic behaviour as in Model 2, but it also includes N -independent terms and double-logarithmic ones, which represent an improvement in spite of the same number of parameters (if λ is set to 1/2). In this respect, Model 3 can be seen as an intermediate one, between Model 2 and Model 4. Therefore, we expect a better performance than Model 2 and a simpler numerical implementation than Model 4 that requires the special function W. As a last remark, Model 3 can also be written in logarithmic form, which might be convenient from a numerical point of view.

Analysis of the models of dynamic aperture evolution
In the rest of the paper, the new models of the DA evolution with time will be scrutinised by analysing their behaviour when applied to a simple dynamical system like the 4D Hénon map and to realistic realisations of the LHC ring at top energy. The most stringent conditions are used, which means that the number of model's parameters has been reduced to the minimum, i.e. to 2 as λ is fixed to 1/2 and N 0 is set to 1 for Model 2 or its functional relationship on ρ * and κ is applied.

Generalities
The 4D Hénon map [34] is a well-known model that combines simplicity in its form with a rich dynamical behaviour. Moreover, it represents the betatronic motion of a FODO lattice with a single sextupole in the single-kick representation. Such a system can be made more complicated by introducing a time modulation of the linear frequencies. The reason to consider the modulated version of the 4D Hénon map in this context is twofold: on one hand the tune modulation takes into account the coupling with the longitudinal dynamics, on the other hand in [18] it was observed for the first time that for large values of the modulation amplitude , the proposed model for the DA scaling was providing negative values of the fit parameters.
The modulated 4D Hénon map reads where (x, p x , y, p y ) are the phase-space coordinates after transformation to linear normalised co-ordinates, i.e. Courant-Snyder co-ordinates, and after rescaling by the strength of the sextupole to make the map independent from the sextupole strength [34]. L is a matrix given by the direct product of two 2D rotations R, namely where the linear frequencies ω y are varying with the discrete time n according to As for the values for the parameters, we have considered ω x0 = 0.168 and ω y0 = 0.201 and for the Ω k frequencies and k amplitudes the values are listed in Table 2 using the same values as in [18]. k Ω k 10 4 A plot of the map stability is reported in Fig. 1 for three values of . The tracking has been performed up to 10 7 turns for several values of the modulation amplitude in the interval 0 ≤ ≤ 64. A polar grid of initial conditions have been built, with the angular interval [0, π/2] divided into 100 parts and a step in amplitude of 10 −2 . The DA of the map is computed by using the amplitude of the last stable initial condition for each angle and by taking the angular average, as discussed in [35].

Results of numerical investigations
Detailed numerical simulations have been performed, aimed at computing the DA as a function of using the approach reported in [35] for the the computation of the DA and the associated numerical error. A summary plot is shown in  All models provide a good agreement with the numerical data. The essential differences between them can be better appreciated by inspecting their dependence on , which is shown in Fig. 3. In the upper row, the three parameters for Model 1 are plotted, while in the second row the two parameters of Model 2, 3, and 4 are shown, together with the R 2 adj , the adjusted coefficient of determination, for all four models. For Model 1, it is clearly visible that κ and b are varying in sign, with rather larger changes of their absolute values. Such large variations are also visible in the behaviour of D ∞ , which also features some outliers. On the other hand, the remaining three models feature a rather smooth dependence on and a strong similarity between them. It is worth noting that Model 2 and 3 feature parameters values that resemble more to each other than those of Model 4.
These results indicate that negative parameters are only a feature of Model 1. All this indicates that the observations made in [18] about the presence of negative values of the model parameters are an artefact of the model form, in which a constant term D ∞ is added to that derived from the Nekhoroshev estimate of the stability time. In general, this form provides positive fit parameters, but in some cases negative values might appear, which stem from the compensation between the two terms of the model. Of course, this improves the fit quality as shown in the behaviour of the R 2 adj , which is in general higher for Model 1 as compared to the other models. The special care needed to analyse a model made of the sum of different terms, with respect to a model made of product of various terms, will be discussed again in Section 6. Moreover, it is important to stress that the new models depend on two free parameters, only. Hence, they really seem to outperform Model 1.

LHC dynamic aperture experiment at top energy
In this paper, we do not aim at discussing the agreement between experimental measurements and numerical simulations, rather we use realistic models, based on experimental configurations used in the LHC, to study the performance and behaviour of the proposed models of DA variation with the number of turns. DA measurements at the LHC (see Fig. 4, upper, for a layout of the LHC ring) have been carried out at injection energy [36,37,38] using different approaches, i.e. the standard kick method [36] or the novel approach [37,38].
Recently, DA measurements have been successfully performed also at at 6.5 TeV in the LHC [39]. The goals of these measurements were many-fold: the use of squeezed optics allows investigations of the impact on beam dynamics of the non-linear field errors stemming from the quadrupoles in the high-luminosity insertions. Thus, one could examine and quantify the influence on beam loss and lifetime from changes in the strength of the normal dodecapole correctors (see Fig. 4, bottom, for a sketch of the high-luminosity insertions, whose magnets were used during the experiment) in the ATLAS and CMS interaction regions (IR) 1 and 5, respectively. This aspect is particularly relevant in view of the future High Luminosity LHC project [14], for which the operational strategy to set the non-linear correctors in the high-luminosity IRs is being actively studied.
The detail regarding the experimental session and the LHC set-up can be found in [39]. Here, it is important to mention that large dodecapole sources were introduced by powering the IR-b 6 correctors left and right of the interaction point (IP) 1 and 5, uniformly to their maximum current. Finally, the IR non-linear corrections for normal and skew sextupole and normal and skew octupole errors, which had been commissioned at the 2017 start-up, were collectively removed.
The ring model used for the numerical simulations of the DA is the most accurate description of the LHC lattice, including the measured field errors (see [40] for more detail) together with the operational configuration of the various correction circuits. The numerical protocol used envisages the generation of sixty realisations of the magnetic errors to take into account their measurement uncertainties. Moreover, a polar grid of initial conditions in x − y space is defined and their evolution is computed for up to 10 6 turns. The polar grid of initial conditions is obtained by dividing the first quadrant of the x − y space in 59 angles and along each direction 30 initial conditions are uniformly distributed over intervals of 2σ. The DA has been computed using the approaches described in [35].
The evolution of the initial conditions through the LHC magnetic lattice is computed using the SixTrack code [41], which implements a second-order symplectic integration method. All configurations used in the DA experiment at 6.5 TeV have been simulated through 6D numerical simulations and they are used in the following for assessing the performance of the models describing the time variation of the DA. Note that in the rest of the paper the configuration in which all IR correctors are powered will be indicated as Configuration A, while that with the dodecapolar corrector only as Configuration B.

Results of numerical investigations
The comparison of the performance of the various DA models is largely independent on the LHC configuration used. Therefore, a selection has been applied and in the following the outcome of the numerical simulations for Configuration B for Beam 1 (i.e. the clockwise beam, whereas Beam 2 is the counter-clockwise beam) will be presented and discussed in detail. Figure 5 shows both the DA data as computed using SixTrack for the sixty realisations (using different colours) of the LHC magnetic field errors up to 10 6 turns. These data have been modelled using both the old model and the new ones and the corresponding curves are also shown (using the same colour palette as for the numerical data). Moreover, extrapolation of the considered models up to 10 8 turns have been computed and the results reported, to provide quantitative information about the predictive power of the various models.
The bars reported in the four plots are centred around the weighted average of the DA for a given number of turns, i.e. N 1 = 10 6 (corresponding the the maximum number of turns simulated with SixTrack), N 2 = 10 7 and N 3 = 10 8 (corresponding to two extrapolation times). For N 1 , a comparison between the DA obtained from numerical simulations and that derived from the various models is carried out. The DA data from SixTrack are averaged over the realisations and the rms is used as associated error. For the DA values obtained from the models, for each LHC lattice realisation a model is fitted, the errors on the corresponding model parameters are used to evaluate the error associated with the DA estimate. Finally, all sixty DA values are averaged using the corresponding errors as weights and the rms is used as associated error. The numerical values, including the relative errors obtained by taking the minimum and maximum DA value for the ensemble of sixty values are listed in Table 3. It is clear (both from Fig. 5 and Table 3) that the error bars for Model 1 are larger than those of the other models, thus confirming a better precision of the extrapolated DA values for Model 2, 3, and 4. Another essential feature is that Model 1 provides parameters that vary significantly between different realisations of the magnetic errors (called seeds), also changing their signs. This is clearly shown in Fig. 6, where the three parameters of Model 1 are shown as a function of the seed. The average value of each parameter over the seeds is also given, weighted by the corresponding error from the fitting procedure, together with an estimate of the associated error. As additional information, the relative spread around the weighted average of each model parameter is shown using the secondary vertical axis of each plot, and the wide range covered can be clearly appreciated.
For the sake of comparison, the distributions of parameters for Model 2, 3, and 4 are reported in Fig. 7. The situation is completely different, with only positive values and a rather small spread between the different cases, hence overcoming the limitations observed for the original Model 1.
For some cases, even Model 1 provides positive values of the parameters, as observed already in [18]. However, whenever Model 1 provides negative nonphysical model parameters, the new models provide positive values, without meaningful impact on the accuracy in reproducing the numerical data. Indeed, the main conclusion of our investigations is that the very form of the original model, with the constant term D ∞ added to the logarithmic one, is responsible for the appearance of negative nonphysical model parameters. Furthermore, it induces a larger variability of the model parameters, possibly indicating overfitting. All these aspects are absent in the newly proposed models for the DA dependence on time, which is an important step forward for reliable modelling of the time dependence of DA.

Probing the predictive power of the DA models
As it was presented and discussed in [18], the predicting power of the proposed DA models has been verified by varying the amount of data used to build the models and then extrapolating the DA value to a fixed number of turns to verify the agreement between the DA value obtained by means of the models and the numerical simulations. Indeed, in the previous section the extrapolation properties up to 10 8 turns have been studied, but without comparing against tracking data, which is nowadays still impossible for such a large number of turns in the case of the LHC ring. Therefore, in this section a different aspect is considered, which consists of benchmarking the performance of the extrapolation against numerical data by choosing a suitable number of turns that can be simulated without too many issues.
In Fig. 8, the results of the DA extrapolation obtained from the proposed models built using different number of turns for the Hénon map are shown. Model 2, 3, and 4 are reported as a function of the parameter and the error bar are obtained from the fit procedure. The numerical simulations carried out at 10 7 turns are considered the reference and are reported without any error bar as no fitting and extrapolation are involved.
The first observation is that the overall behaviour does not depend on the DA model used. For ≤ 20, the extrapolated DA values approximate the reference ones from below and the agreement is very good and weakly depends on the number of turns used to build the DA models. In several cases the reference DA values are compatible with the extrapolated ones within the error bars. Note that the relative agreement between reference and extrapolated DA is better than ≈ 5%. The conclusions change whenever the range ≥ 20 is considered. In fact, a stronger dependence on is observed and the extrapolated DA values approach the reference ones from above. A larger discrepancy between extrapolated values and references ones is observed, the maximum reaching ≈ 30% for the largest values of . In any case, the extrapolation based on 10 6 turns never differs by more than ≈ 15% from the reference. This can be considered as a very encouraging result for a time extrapolation by two orders of magnitude.
Similar analyses have been carried out for the LHC cases, applying a similar approach as that used for the Hénon map. The results are summarised in Fig. 9 where the proposed models (2, 3, and 4) are presented (top to bottom). In the two columns the results obtained for different amount of numerical data (10 4 -left -10 5 -right) are shown. The outcome of the analyses for all the sixty realisations of the LHC ring are reported and the behaviour of the numerical data used to build the DA models is shown together with the extrapolated curves. The reference case is provided by the results of numerical simulations up to 10 6 turns.
The results presented in both columns approximate the reference data from above, showing that the extrapolated DA is an optimistic estimate of the numerical one. As expected, the larger is the number of turns used to build the DA models the smaller is the discrepancy between extrapolated and reference values. The quantitative data are reported in Table 4. Table 4: DA values for Configuration B for Beam 1 as obtained from the numerical data or by using the three new models discussed in this paper, which provide DA values extrapolated to 10 6 turns, but starting from a different number of turns used to build the models. The approach used to derive the errors reported in the The disagreement between extrapolated DA values and the reference ones ranges from 2% to 4%, which is an excellent result. No strong dependence of the extrapolated DA value on the model used has been observed as for the case of the Hénon map. As a consequence, extrapolation for two orders of magnitude in turn number might be at hand also for the LHC. Of course, it is not possible to ensure that such a situation is typical for all possible configurations of the LHC (similarly to what has been observed for the Hénon map, where the degree of accuracy of the extrapolation features a non-negligible dependence on the value of the parameter .
6 Digression: some intriguing properties of the DA models The DA models provide values of the parameters ρ * or b (see Eqs. (20) and (21)) and κ. It is of interest to verify whether the assumed relationship (4b) is confirmed by the data. To this aim, the dependence of b on κ has been studied. The results for Model 2 are shown in Fig. 10, where the Hénon map and LHC models are reported in the upper and lower part of the plot, respectively.
The striking observation is that, in spite of the essential differences in the four LHC rings' configurations, all data seem to lie on a single smooth curve. The same holds also for the Hénon map case, where the differences in the value of do not prevent the data points to lie on a single smooth curve. Furthermore, the functional form of the curve is the same for the Hénon and LHC cases, showing a sort of universal behaviour. Note also that these features are model independent, as they hold also for Model 3 and 4.
It is worthwhile making two additional remarks. Firstly, the difference between Beam 1 and 2 in terms of the parameters b and κ is not unexpected. In fact, it was already observed in earlier work [40] that the DA for 10 5 turns was not the same for the two rings. This is linked to the fact that the magnetic field errors are rather correlated between the two ring, but they are not exactly the same. Secondly, the parameter κ varies although the phase space dimension is not changing: this is yet another indication that further theoretical work is needed to understand the dependence of κ on the features of the system.
Several functions have been considered to describe the observed dependence, namely with R 2 adj as figure of merit. All listed functions provide reasonably good fit results, with model (30c) generally giving the best fit and model (30a) being a close second. However, for model (30c) we always obtain γ ≈ 1, hinting that model (30a) might be the correct underlying relation. The most important observation, however, is that the function (30d) always provided the worst fit, despite it being expected to be the correct underlying relation (see Eq. (21)). This led us to conjecture that the function (30a) represents the correct underlying relation between b and κ.
It is worth noting that tests adding constant terms to the listed fit functions were carried out. Once more, it was found that although the constant term helps in improving the fit quality, it makes the values of the other fit parameters entering in the κ-dependent term more variable when applied to the several systems under consideration. For this reason, it has been considered that such a constant term introduces nonphysical features and it has been dropped. The resulting fit parameters for model (30a) for the various configurations studied are listed in Table 5. The error bars on the fit parameters for the four LHC cases are in general larger than those for the Hénon map, which is a consequence of the presence of the sixty realisations for each LHC configuration. It is also clear that the parameters for each beam are relatively close together, in spite of the differences in magnetic configurations. In general, the fit parameters for Model 3 and 4 are very similar between them, while those for Model 2 are more different. This is a possible indication that the observed scaling of b with κ is better detected by means of the more accurate models. Another observation is that the fit parameters of the combined data for all four LHC configurations do not depend strongly on the model used to describe the DA data.
We can now use Eq. (30a) to redefine the proposed DA models such that the different fit parameters are supposed to be independent of each other. For convenience, we rename the fit parameters from model (30a) as follows: such that we can rewrite (30a) asr * = r * B κ , Note that, due to the dependency of r * on κ, bothr * and B are independent on it. Furthermore, we definẽ which can be computed, using Eq. (4c), to beÑ Finally, this gives the following reformulation for our new DA models: whereÑ 0 can either be left as a free model parameter, or can be fixed to the value given in Eq. (34) for Models 3 and 4, or to an arbitrary constant for Model 2.
Note that these new formulations of the DA models introduce an extra fitting parameter, namely B, but are supposed to be even more stable from a physical viewpoint. In other words, the parametersb, B, and κ are expected to be totally independent of each other, hence true constants of the scaling law. Of course, this is so far merely an empirical observation. Some efforts should be devoted to the analysis of the form of the estimate of the stability time provided by Nekhoroshev theorem to determine whether the numerically-obtained scaling law can be justified with theoretical arguments. In other words, an interesting open question is whether a theoretical motivation can be found to rewrite the Nekhoroshev estimate in Eq. (3) as wherer * , B, and κ are independent constants, andÑ 0 is given by Eq. (34). Another interesting question is whether an analytical estimate can be found for B, as our data suggests it to be constant for all different systems we investigated. Indeed, Table 5 seems to hint that 0.05 ≤ B ≤ 0.14 .
(37) As a last remark, we note that it is straightforward to retrieve the original DA model formulations from Eqs. (35) by simply making the following substitutions:

Conclusions
In this paper, recent progress in defining reliable models for the time dependence of the DA has been presented and discussed in detail. The essence of the novel approaches relies on the analytic estimates used in the proof of the Nekhoroshev theorem [26,27] and on the use of the Lambert function. Such a function is applied to invert the estimate of the stability time with an exact and closed-form expression. Three new models for the DA evolution with time have been proposed: Model 2 resembles the original Model 1 [17,18] for the inverse logarithmic term, but does not include the constant term representing the dynamic aperture for infinite time, which was introduced to take into account the region filled by KAM tori. Model 4 represents the exact model for DA evolution with time based on Nekhoroshev theorem, only, and using the Lambert function. Model 3 is derived from Model 4 by means of a series development of the Lambert function, which can provide an easier computational tool as it avoids the use of the exotic Lambert function. Moreover, it shows explicitly the resemblance and the new features of Model 3 with respect to Model 2. Of course, the validity of Model 3 relies on the validity of the expansion of the Lambert function, which remains a topic deserving further theoretical investigations.
In earlier work it had been observed that in some cases the parameters of the original Model 1, based on KAM theory and Nekhoroshev theorem, become negative [17,18]. This represents a violation of the conditions of validity of the Nekhoroshev theorem, which then makes Model 1 only a phenomenological description of the special cases with negative parameters. The three new models proposed in this paper overcome this difficulty as they provide positive physical parameters values in situations in which Model 1 is failing to do so. Further, only a small reduction of fit quality is obtained, despite the reduction of the number of model parameters from three to two, which suggests a more fundamental scaling law has been found.
The observed dependencies between the parameters of Model 1 have been reduced in the new models. Furthermore, the behaviour of the models parameters has been studied as a function of the modulation amplitude , for the case of the 4D Hénon map, and of the realisations of the magnetic field errors, for the case of the LHC, and a very smooth and regular behaviour has been observed. This feature opens the possibility to study the underlying mechanisms and how b and κ depend on the physical parameters of the system, rather than just considering the dynamic aperture. This could provide a more fundamental insight into the beam dynamics and aid effective collider design based on b and κ rather than on DA computation at a fixed number of turns.
A by-product of the detailed analysis carried out has been the observation that the b parameter features a clear exponential dependence on the κ parameter. This implies that the functional form of the stability-time estimate as given in [26] could be reviewed. Additional theoretical efforts shall be devoted to this intriguing result. Furthermore, the relationship between κ and the phase space dimension or other dynamical features of the system should be studied in more detail to clarify the results presented and discussed in this paper.
The main issue of the original model has been identified in the functional form proposed, namely with a constant term added to a logarithmic one that depends on two model parameters: this combination provides a flexible functional form that can match a large variety of DA data, however, compensations between the model parameters is possible and affects the stability and predictivity of the DA model.
As far as the possibility to use the models to interpolate numerical data and then to make predictions beyond the maximum number of simulated turns, Models 2, 3, and 4 proved to be very reliable and more precise than Model 1 for the cases presented in this paper. More quantitatively, the predictive power has been probed for the dynamical systems considered in this study and it has been found out that extrapolations by two orders of magnitude in number of turns can be done with inaccuracy in the DA estimate not larger than 4% for the LHC case. This is certainly a very positive result that suggests that the use of these models for extrapolating DA beyond what is currently possible to compute by means of numerical simulations is indeed a viable option.
Finally, it is worth stressing that these recent and very encouraging results will be used to refine the approaches proposed in earlier work to estimate beam losses [19] and more recently to model the evolution of the collider's luminosity in the presence of burn off and losses due to dynamic aperture [28,29].

A General properties of the W function
If we want to invert Eq. (3) in order to interpret r as the dynamic aperture and express its evolution as a function of the number of turns, we are undoubtedly left with a result involving the Lambert W function, see, e.g. [30] and references therein for an overview on this function and its application in physics as well as [31] for a recent application to accelerator physics, so we devote this section to the investigation of this rather exotic, but nevertheless ubiquitous, function.
The Lambert W function, also called the Ω function or product logarithm, is in fact a set of functions defined as the branches of the inverse of the product exponential function, namely By analytic continuation, W(z) is well-defined, but multi-valued on the full complex plane, with a branch cut along the negative axis at −∞, − 1 e . Its defining equation for any z ∈ C is z = W(z) exp{W(z)} .
By using the very definition (40) it is possible to show that . (41b) Its derivative and primitive are the same for all branches and are given by Restricting ourselves to the real domain, there are only two branches of W, namely dom (44b) These two branches are shown in Fig. 11. A few other interesting identities, valid on the real axis, are:

B Parameters' constraints
We investigate what constraints can be imposed on the parameters of our model. We have two general constraints: first we demand that the dynamic aperture is real and positive, and second we have to remain in the region where the Nekhoroshev estimate is valid, as in Eq. (2).

B.1 Reality Condition
For the first demand we have to investigate the argument of the Lambert W function, and the base of the power −κ. Indeed, if κ is non-integer the result will be real only if the base of the power is positive. First of all, r * is a positive quantity, which ensures that r > 0 as well as N 0 . Furthermore if λκ < 0, the argument of W is positive and hence we use the upper branch W 0 , which is positive itself, thus ensuring the reality of the power, namely On the other hand, if λκ > 0, the argument of W is negative and hence we can use both branches, as both return a negative value for a negative argument, again ensuring the reality of the power. However, we have to make sure that the argument lies in the region − 1 e , 0 to avoid entering a complex branch. In other words:

B.2 Region Constraint
Next we have to satisfy Eq. (2). In other words, we have: To get rid of the power, we raise each side to 1 κ . The result will hence depend on the sign of κ. Note that we already deduced that the base of the l.h.s. is positive. If κ > 0 the base of the r.h.s. is positive as well and we can cancel the two powers without changing the ordering In order to continue our investigation of the restrictions on the parameters, we would like to apply the inverse of the Lambert W function, namely the product exponential. It is however not trivial if and how this function preserves a given ordering. If we have a look at Fig. 11, we see that x exp{x} rises monotonously towards +∞ for x ≥ −1, and falls monotonously from 0 for x ≤ −1. From this information we can derive the following properties: If λ > 0 and κ > 0 then both Eqs. (47) and (49) have to be satisfied. From the former we deduce that the argument of the Lambert W function is negative and hence the function itself is negative. To see which one of Eqs. (50) applies, we investigate two positive regions of λ, namely Note that the combination of (51a) and (49) implies automatically that we have to choose the branch W −1 , as this is the only (real) branch that has values smaller than −1. Let us again investigate the two cases separately.
This implies that now both the l.h.s. and the r.h.s. of (49) are smaller than −1, and hence ordering (50b) applies: Multiplying both sides with a factor −λκ flips the ordering, while raising both sides to the power λκ does not and finally one finds Additionally we should satisfy the reality constraint in (47), namely: We notice that for λ = 3 2 , Eq. (54) reduces to Eq. (53), while it is easy to verify that ∀κ > 0: e λκ λκ ≤ 2 3κ hence we can safely assume that the reality constraint is automatically satisfied if we impose Eq. (53).

λ > 3/2
This implies that the r.h.s. of (49) is larger than −1, which does not set any constraint on the l.h.s. If we choose the real branch W −1 , the l.h.s. will be smaller than −1 and hence ordering (50d) applies, and we do not need to simplify (49) any further as it is automatically satisfied. In this case the reality constraint in Eq. (54) is the only one that remains.
On the other hand, if we choose the real branch W 0 , the l.h.s. will be larger than −1 and hence ordering (50a) applies, and we can simplify (49) into Multiplying both sides with −λκ changes the ordering, while raising both sides to the power λκ does not and finally one finds However, we also have to satisfy the reality condition, which combined with the previous constraint, provides bounds from below and from above to N , namely e λκ It is clear that we do not want an upper bound for N/N 0 as this would limit the validity of the Nekhoroshev stabilitytime estimate. Therefore we conclude that only the branch W −1 should be used in our application and the summary of all possible parameters' values is listed in Table 1.   Table 3.      Figure 11: Plot of the product exponential function (full line) and the two real branches of its inverse, the Lambert W 0 and W 1 functions (dashed and dotted lines, respectively).