Numerical calculation of high-order QED contributions to the electron anomalous magnetic moment

This paper describes a method of numerically evaluating high-order QED contributions to the electron anomalous magnetic moment. The method is based on a subtraction of infrared and ultraviolet divergences in Feynman-parametric space before integration and on nonadaptive Monte Carlo integration founded on Hepp sectors. A realization of the method on the NVidia Tesla K80 graphics accelerator is described, together with a way of removing the round-off errors that emerge from the numerical subtraction of divergences without losing calculation speed. The results of applying the method to all 2-loop, 3-loop, and 4-loop QED Feynman graphs without lepton loops are presented. A detailed comparison of the 2-loop and 3-loop results with the known analytical ones is given, as well as a comparison of the contributions of 6 gauge-invariant classes of 4-loop graphs with the known analytical values. Moreover, the contributions of 78 sets of 4-loop graphs are presented for comparison with direct subtraction on the mass shell. The contributions of the 5-loop and 6-loop ladder graphs are also given and compared with the known analytical results. The behavior of the generated Monte Carlo samples is described in detail, and a method of error estimation is presented, along with detailed information about the graphics-processor performance on these computations and about the Monte Carlo convergence.


I. INTRODUCTION
The electron anomalous magnetic moment (AMM) is known with very high accuracy. The experiment [1] obtained the value a_e = 0.00115965218073(28). Thus, an extremely high precision is also required from theoretical predictions.
The most precise prediction of the electron AMM at the present time uses the representation

a_e = A_1 + A_2(m_e/m_mu) + A_2(m_e/m_tau) + A_3(m_e/m_mu, m_e/m_tau) + a_e(hadronic) + a_e(electroweak),

where m_e, m_mu, m_tau are the masses of the electron, muon, and tau lepton, respectively, and each A_i is expanded in powers of alpha/pi. Different terms of this expression were calculated by different groups of researchers; some of them have independent calculations, but some were calculated by only one scientific group. The best theoretical value [2]

a_e = 0.001159652182032(13)(12)(720)   (1)

was obtained using the fine-structure constant alpha^{-1} = 137.035998995(85), which had in turn been obtained by methods independent of a_e (see [2]). Here, the first, second, and third uncertainties come from A_1^{(10)}, from a_e(hadronic) + a_e(electroweak), and from the fine-structure constant, respectively [footnote 1]. The values A_1^{(2)} = 0.5, A_1^{(4)} = -0.328478965579193..., A_1^{(6)} = 1.181241456..., A_1^{(8)} = -1.9122457649... are known from the analytical and semianalytical results in [3, 4], [5, 6], [7], [8], respectively [footnote 2]. The value

A_1^{(10)} = 6.675(192)   (2)

was presented in [2]. At the present time there are no independent calculations of A_1^{(10)}; the value A_1^{(8)} was evaluated independently [footnote 3] in [20-22] (and, for the graphs without lepton loops, in [23]). We must take into account the fact that the contributions of some individual graphs turn out to be several times greater, in absolute value, than the total contribution [footnote 4]. Therefore, an error in the evaluation of one graph can make the final result entirely wrong. So, the problem of evaluating A_1^{(2n)} is still relevant. The most uncertain and difficult-to-evaluate QED contributions to a_e correspond to Feynman graphs without lepton loops. We consider the evaluation of these contributions in this paper and denote their n-loop part by A_1^{(2n)}[no lepton loops]. This paper is a continuation of the series of papers [24, 23], with increased precision, an increased number of independent loops in the graphs, and a refined consideration.

Footnote 1: Thus, the calculated coefficients are used for improving the accuracy of alpha.
Footnote 2: The value of A_1^{(6)} was a product of the efforts of many scientists; see, for example, [9-19].
Footnote 3: However, by 2016 the greater part of A_1^{(8)} had been calculated by only one scientific group [20].
We use the subtraction procedure for removing both infrared and ultraviolet divergences that was introduced in [24]; it is briefly described in Section II. This procedure eliminates IR and UV divergences in each AMM Feynman graph point by point, before integration, in the spirit of papers [2, 25-35], etc. This property is substantial for many-loop calculations, when reducing the computer time is of critical importance. Let us note that A_1^{(2n)} is free from infrared divergences, since they are removed by the on-shell renormalization, as are the ultraviolet ones (see a more detailed explanation in [24]). However, the subtractive on-shell renormalization cannot eliminate IR divergences in Feynman-parametric space before integration in the way it does for UV divergences [footnote 5]. The structure of IR and UV divergences in individual Feynman graphs is quite complicated [footnote 6]. Therefore, a special procedure is required for removing both UV and IR divergences. Let us recapitulate the advantages of the developed subtraction procedure.
1. It is fully automated for any n.
2. It is comparatively easy to realize on computers.
3. It can be represented as a forestlike formula. This formula differs from the classical forest formula [32, 33, 36] only in the choice of linear operators and in the way of combining them.
4. The contribution of each Feynman graph to A_1^{(2n)} can be represented as a single Feynman-parametric integral; the value of A_1^{(2n)} is the sum of these contributions.
5. Feynman parameters can be used directly, without any additional tricks.
See a detailed description in [24]. The subtraction procedure was checked independently by F. Rappl using Monte Carlo integration based on Markov chains [21]. An additional advantage of the procedure is described below and in Section IV.H. After the subtraction is applied, the problem is reduced to the numerical integration of functions of many variables. The number of variables can be quite big [footnote 7]; this fact compels us to use Monte Carlo methods. In most cases the precision of Monte Carlo integration behaves asymptotically as C/sqrt(N), where N is the number of samples. Thus, to reach a high precision in practical time it is very important to decrease the constant C as much as possible. Unfortunately, the behavior of the Feynman-parametric integrands that appear in the computation of A_1^{(2n)} often leads to slow Monte Carlo convergence. An integration method with a relatively good constant C was introduced in [23]. The method is based on importance sampling with probability density functions that are constructed for each Feynman graph individually. The construction is based on Hepp sectors [31] and uses functions of the form first used by E. Speer [37], with some modifications based on the concept of I-closure that was introduced in [23]. The method from [23] demonstrated better convergence than universal Monte Carlo routines. A refined version of the construction is described in Section III. This refinement reduces the uncertainty of A_1^{(8)}[no lepton loops] by about 15% when the number of samples is fixed.
When we deal with unbounded functions or with functions having sharp peaks, the standard Monte Carlo error estimation approach has a tendency to underestimate the inaccuracy. A method of preventing this underestimation was described in [23]. However, some tests show that in many cases a more accurate treatment of the peaks is required. An improved method of error estimation that uses specific properties of the integrands under consideration is presented in Section IV.F. Detailed information about the sample behavior for the 5-loop and 6-loop ladder graphs is provided. Also, information about the dependence of the results on the number of samples is given for A_1^{(8)}[no lepton loops] and for the 5-loop and 6-loop ladder graph contributions.
Numerical subtraction of divergences leads to a situation in which small numbers (in absolute value) are obtained as differences of astronomically big numbers. This generates round-off errors that significantly affect the result [footnote 8]. To control these errors we need additional techniques that substantially slow down the computation. In [24] all integrand evaluations were first performed with two different precisions [footnote 9], and when the difference of the results was noticeable, the calculation was repeated with increased precision. This approach requires twice as much computer time as the direct calculation; also, an emergence of bias is possible in this case. All calculations described in [23] use interval arithmetic [footnote 10]. Interval arithmetic is reliable but slows down the computation many times: for example, a multiplication of two intervals requires 8 number multiplications with correct rounding, 3 minimums, and 3 maximums. To eliminate this slowdown, a special modification of interval arithmetic was developed. This technique gave a significant improvement in computation speed without loss of reliability; in many cases it works faster than the approach with two precisions [footnote 11]. Specific properties of the construction of the integrands are used for reaching such a performance. This technique is described in Section IV.C.
The rapid development of specialized computing devices that solve some tasks many times faster than ordinary computers makes it possible to use them for scientific calculations. All Monte Carlo integrations described in this paper were performed on one [footnote 12] graphics processor of an NVidia Tesla K80. Graphics processors (GPUs) are very useful for Monte Carlo integration; however, specific programming is required to use these devices effectively. Sections IV.A, IV.D, and IV.I contain some information about the realization of the described integration method on a GPU.
The developed method and its realization were applied to compute A_1^{(2n)}[no lepton loops], n = 2, 3, 4. Also, the contributions of the 5-loop and 6-loop ladders were evaluated for testing purposes. The results are presented in Section IV.A; the comparison with known analytical results is provided in Table XVII.
High-order calculations in quantum field theory require operations with enormous amounts of information. For example, the total integrand code size [footnote 13] for A_1^{(8)}[no lepton loops] is 2.5 GB. There are too many places where a mistake can emerge, while a total independent check requires a lot of resources. So, it is very important to be able to check the results in parts by other methods. Section IV.H demonstrates that the developed method provides such a possibility. The total set of 269 Feynman graphs for A_1^{(8)}[no lepton loops] is divided into 78 subsets; the contribution of each subset must coincide with the contribution obtained by direct subtraction on the mass shell in Feynman gauge. The contribution of each subset is provided in Section IV.H. Analogous information is also given for the 2-loop and 3-loop cases, and the comparison with known analytical results is provided, as was done in [24]. The contributions of 6 gauge-invariant classes of 4-loop graphs without lepton loops are presented in Section IV.H and compared with the semianalytical ones from [8]. Knowing the values of the contributions of gauge-invariant classes makes it possible to check some hypotheses from quantum field theory [footnote 14]. Section IV.G contains detailed information about the contributions of individual Feynman graphs, including the influence of round-off errors and information about the Monte Carlo error estimation. The summary of the results and technical information about GPU performance and code sizes are presented in Section IV.I.

II. SUBTRACTION OF DIVERGENCES
We will work in the system of units in which hbar = c = 1, and the factors of 4*pi appear in the fine-structure constant: alpha = e^2/(4*pi). The tensor g^{mu nu} is defined by g^{mu nu} = diag(1, -1, -1, -1), and the Dirac gamma matrices satisfy the condition gamma^mu gamma^nu + gamma^nu gamma^mu = 2 g^{mu nu}. We will use Feynman graphs with the propagators i(p_slash + m)/(p^2 - m^2 + i0) for electron lines and -i g_{mu nu}/(k^2 + i0) for photon lines (the Feynman gauge). We restrict our attention to graphs without lepton loops; however, the developed subtraction procedure works for graphs with lepton loops as well [24].
The number omega(G) = 4 - N_gamma - (3/2) N_e is called the ultraviolet degree of divergence of the graph G. Here, N_gamma is the number of external photon lines of G, and N_e is the number of external electron lines of G.
Two subgraphs are said to overlap if neither is contained inside the other and their sets of lines have a non-empty intersection.
A set of subgraphs of a graph is called a forest if any two elements of this set don't overlap.
For a vertexlike graph G, by F[G] we denote the set of all forests F consisting of UV-divergent subgraphs of G and satisfying the condition G ∈ F. By I[G] we denote the set of all vertexlike subgraphs G' of G such that G' contains the vertex that is incident [footnote 16] to the external photon line of G [footnote 17]. We will use the following linear operators, applied to the Feynman amplitudes of UV-divergent subgraphs.
1. A is the projector of the AMM. This operator is applied to the Feynman amplitudes of vertexlike subgraphs; see the definition in [24, 23].
2. The definition of the operator U depends on the type of UV-divergent subgraph to which the operator is applied: if Sigma(p) is the Feynman amplitude that corresponds to an electron self-energy subgraph, then U is defined by one formula [footnote 18]; if the subgraph is of the other UV-divergent type, then U is defined by another.
3. L is the operator that is used in the standard subtractive on-shell renormalization of vertexlike subgraphs: if Gamma^mu(p, q) is the Feynman amplitude that corresponds to a vertexlike subgraph and condition (5) is satisfied, then L is defined by the corresponding formula.

Let f_G be the unrenormalized Feynman amplitude that corresponds to a vertexlike graph G, and let us write the symbolic definition of the subtracted amplitude f~_G as a sum over forests. In this notation, the subscript of an operator symbol denotes the subgraph to which this operator is applied.
The coefficient of gamma^mu in f~_G is the contribution of G to a_e. See examples of applying the procedure in [24, 23].

III. PROBABILITY DENSITY FUNCTIONS FOR MONTE CARLO INTEGRATION
We use Feynman parameters for the calculations. Thus, to obtain the contribution of a graph G we need to calculate the integral of I(z_1, ..., z_n) over the simplex z_1, ..., z_n > 0, z_1 + ... + z_n = 1, where the function I is constructed using the known rules [24]. We use the Monte Carlo approach based on importance sampling: we randomly generate N samples z_1, ..., z_N, where z_j = (z_{j,1}, ..., z_{j,n}), using some probability density function g(z), and approximate the integral value by

(1/N) * sum_{j=1}^{N} I(z_j)/g(z_j).   (11)

The density g is fixed for a fixed graph G. The speed of Monte Carlo convergence depends on the selection of g. A construction of g that gives good convergence is described below.
We will use Hepp sectors [31] and functions of the form first used by E. Speer [37], with some modifications. The whole space R^n is split [footnote 19] into sectors. Each sector corresponds to a permutation (j_1, ..., j_n) of {1, 2, ..., n} and is defined by

S_{j_1,...,j_n} = { z : z_{j_1} > z_{j_2} > ... > z_{j_n} }.

We define the function g_0(z_1, ..., z_n) on S_{j_1,...,j_n} by relation (12), in which a value Deg(s) > 0 is defined for each set s of internal lines [footnote 20] of G except the empty set and the set of all internal lines of G. The probability density function is then defined by (13). A fast algorithm of random sample generation for a given Deg(s) is described in [23].
Let us describe the procedure of obtaining Deg(s). The following auxiliary definitions repeat the ones from [23]. By definition, put

omega(s) = 4 N_L(s) + |e(s)| - 2|s|,

where |x| is the cardinality of a set x, e(s) is the set of all electron lines in s, and N_L(s) is the number of independent loops in s. If s is the set of all internal lines of a subgraph of G, then omega(s) coincides with the ultraviolet degree of divergence of this subgraph defined above. By IClos(s) we denote the set s ∪ s', where s' is the set of all internal photon lines l in G such that s contains the electron path in G connecting the ends of l. The set IClos(s) is called the I-closure of the set s.
If a graph G' belongs to a forest F, G' ∈ F, then by G'/F we denote the graph that is obtained from G' by shrinking all children of G' in F to points.
We will also use the symbols omega, omega' for graphs G' that are constructed from G by operations like those described above [footnote 21] and for sets s that are subsets of the set of internal lines of the whole graph G; we denote this by omega_{G'}(s) and omega'_{G'}(s), respectively. This means that we apply the operations omega and omega' in the graph G' to the set that is the intersection of s and the set of all internal lines of G'.
Electron self-energy subgraphs and the lines joining them form chains l_1 G_1 l_2 G_2 ... l_r G_r l_{r+1}, where l_j are electron lines of G and G_j are electron self-energy subgraphs of G. Maximal (with respect to inclusion) subsets {l_1, l_2, ..., l_{r+1}} corresponding to such chains are called SE-chains. The set of all SE-chains of G is denoted by SE[G].
Suppose a graph G' is constructed from G by operations like those described above; by definition, put the corresponding quantity (it is important that here we consider the SE-chains of the whole graph G).
By F_max[G] we denote the set of all maximal (with respect to inclusion) forests belonging to F[G]. The value Deg(s) is then defined piecewise: one expression is used if s contains all electron lines of G, and another one otherwise. This formula for Deg(s) differs from the one that was defined in [23] and gives better Monte Carlo convergence if appropriate values of the constants are taken. For good Monte Carlo convergence we can use the values C_bigZ = 0.256, C_bigF = 0.839, C_add = 0.786. These values were obtained by a series of numerical experiments on 4-loop Feynman graphs. See examples of the considered combinatorial constructions in [23].

Footnote 21: See the corresponding examples in [23].

IV. REALIZATION AND NUMERICAL RESULTS
A. Overview

The computation on one GPU of an NVidia Tesla K80 that was leased from Google Cloud [footnote 22] showed the following results (1-sigma limits [footnote 23]):

A_1^{(6)}[no lepton loops] = 0.90485(10),
A_1^{(8)}[no lepton loops] = -2.181(10);

the corresponding computation times [footnote 24] are given in Section IV.I. We reduce the number of integration variables by one using the fact that each integrand I(z_1, ..., z_n) depends linearly on z_a when z_a + z_b is fixed, where a and b are the electron lines incident to the vertex that is incident to the external photon line; see [23] [footnote 25]. In contrast to [24, 23], we use a nonadaptive Monte Carlo algorithm. The absence of adaptivity simplifies the realization on a GPU and allows us to undertake an analysis of the Monte Carlo sample behavior; see Section IV.F. The D programming language [39] was used for the generator of the integrand code. The integrands and the Monte Carlo integrator were written in C++ with CUDA [40]. The integrand code sizes are presented in Table XVII. The pseudorandom generator MRG32k3a from the CURAND library [41] was used for the Monte Carlo integration.

Footnote 22: Using the free trial.
Footnote 23: See Section IV.F.
Footnote 24: d = days, h = hours, min = minutes.
Footnote 25: And also [38].
The integrand values are evaluated first using double-precision [footnote 26] floating-point operations, which are fully supported on the GPU. If the double-precision operations do not give enough accuracy, the calculations are repeated using arbitrary-precision floating-point operations with increasing precision; see the details in Section IV.D.
All the integrand code is divided into shared libraries that are linked dynamically with the integrator. Each Feynman graph and type of arithmetic corresponds to one or several shared libraries. Each of these shared libraries contains CUDA kernels [footnote 27] and functions for calling them. To reduce the compilation time [footnote 28] without losing computation performance, the size of the integrand CUDA kernels is set to approximately 5000 operations. Also, to reduce the compilation time, each arbitrary-precision shared library contains no more than 10 CUDA kernels.
Memory speed is a weak spot of GPU computing. So, the integrand GPU code is organized in such a way that most of the operations are performed with the GPU register memory: we try to minimize the number of used variables, often to the detriment of arithmetic optimization.
To use GPU parallel computing effectively, we divide the Monte Carlo samples for one Feynman graph into portions. Each portion contains from 10^6 to 10^8 samples. First, we generate the samples of a given portion and calculate the corresponding integrand values in the fastest precision. After that, the samples requiring an increased precision are collected and calculated. Each CUDA kernel is launched on the GPU in 19968 parallel threads [footnote 29]. To reduce the impact of the latency of CUDA kernel calls, each thread processes approximately 15 samples sequentially in a loop.

Footnote 26: Double precision = 64 bit.
Footnote 27: A CUDA kernel is a function in a program that is executed many times in parallel on the GPU and is called from the CPU part; see [40].
Footnote 28: GPU device code is compiled very slowly, and the compilation time increases rapidly with the size of functions.

B. Interval arithmetic
Interval arithmetic is an easy and reliable way of controlling round-off errors: all calculations are performed with intervals, not with numbers. Arithmetic operations on intervals are defined in such a way that each exact intermediate value x is guaranteed to be in the corresponding interval [x-; x+]. One can use the following definitions:

[x-; x+] + [y-; y+] = [(x- + y-)_down; (x+ + y+)_up],
[x-; x+] - [y-; y+] = [(x- - y+)_down; (x+ - y-)_up],
[x-; x+] * [y-; y+] = [min((x- y-)_down, (x- y+)_down, (x+ y-)_down, (x+ y+)_down); max((x- y-)_up, (x- y+)_up, (x+ y-)_up, (x+ y+)_up)],

where (*)_up and (*)_down mean the operation (*) with rounding up (to +infinity) or down (to -infinity). Most modern GPUs [footnote 30] support specifying the rounding mode for arithmetic operations and working with infinities for handling overflows. Addition, subtraction, and multiplication can be realized directly using the formulas proposed above [footnote 31]. However, for division additional operations are required for handling divisions by zero and overflows. This does not slow down the computation, because the number of divisions in the integrand constructions is very small.

Footnote 30: As well as CPUs.
Footnote 31: Also, these formulas will work correctly with Not-a-Numbers (NaNs), despite the fact that the NVidia realization of min and max ignores NaNs in the lists of arguments.

C. Elimination of an interval arithmetic
Direct interval arithmetic is very slow. However, there are many ways of increasing the speed by relaxing the tightness of the intervals.
We will use the following specific property of the integrand construction. It is known [24] [footnote 32] how to construct the integrand for a given graph G from building blocks of the form V_{G'}, Q_{G'}^{a,j}, B_{G'}^{ab}, S_{G'}, where G' is a graph that can be obtained from a subgraph of G by shrinking some subgraphs to points, a, b are internal electron lines of G', and j = 1, 2; V_{G'} is defined through a sum over 1-trees of G', Q_{G'}^{a,j} through a sum over 1-trees [footnote 33] passing through a, B_{G'}^{ab} through a sum over trees with a cycle passing through a and b, and S_{G'} through a sum over 2-trees. See the full definitions in [24]. The construction rules described in [24] allow us to observe that for a high number of independent loops in G the greater part of the integrand computation consists of evaluating a polynomial of these building blocks, constructed as a sequence of additions, subtractions, and multiplications. The main ideas of the interval arithmetic elimination are:
• we can calculate the center of the resulting interval in direct double-precision arithmetic by applying the same polynomial to the centers of the intervals [x_j^-; x_j^+];
• the radius of the resulting interval can be estimated by a function of x_j^-, x_j^+ that is much simpler than the source polynomial.
We will use the following inequality about machine double-precision arithmetic [footnote 34]: |x_rnd - x| ≤ eps * |x| + eps_0 for suitable constants eps, eps_0, where x_rnd corresponds to the machine representation of x rounded in any direction.

Let x_j be the exact values corresponding to the intervals [x_j^-; x_j^+], j = 1, ..., n. By x_{n+1}, ..., x_l we denote the exact intermediate values that are obtained sequentially when we calculate the value of the needed polynomial. To each j = 1, ..., n we assign a type t_j [footnote 35]. Let us define numbers x_j^appr, M_j, eps_j, j = 1, ..., l, satisfying for all j the conditions |x_j| ≤ M_j and |x_j - x_j^appr| ≤ eps_j. We define them using the following rules:
• x_j^appr = ((x_j^- + x_j^+)/2)_rnd, j = 1, ..., n (thus, x_j^appr are the centers of the corresponding intervals; machine double-precision arithmetic guarantees that we always have x_j^- ≤ x_j^appr ≤ x_j^+ if an overflow does not occur);
• M_j and eps_j for j = 1, ..., n are defined from the interval endpoints x_j^-, x_j^+;
• if x_j is obtained as x_k * x_r, where * is addition, subtraction, or multiplication, j = n + 1, ..., l, then x_j^appr = (x_k^appr * x_r^appr)_rnd (thus, x_j^appr are obtained by direct double-precision arithmetic without specifying the rounding mode [footnote 36]);
• analogously, (M_j, eps_j) is defined through (M_k, eps_k) and (M_r, eps_r), by one rule for addition and subtraction and by another for multiplication.

It is easy to see that for the final l the value eps_l can be expressed as a polynomial P(M_{t=0}, M_{t=1}, eps) with positive coefficients in only three variables, where M_{t=a} is the maximum of M_j over the inputs of type a and eps is the maximum of eps_j over the inputs. Thus, the value of eps_l can be obtained directly using the coefficients of this polynomial, without calculating the intermediate values M_k, eps_k. However, the polynomial can still have many coefficients and can therefore require a lot of arithmetic operations. We estimate P by another expression in the following way. Let us split P into four parts P_0, P_1, P_2, P_3 by the following rules applied to the coefficients C_{u,v,w} of the monomials:
P_0: C_{u,v,w} < 2^{-100};
P_1: 2^{-100} ≤ C_{u,v,w} < 0.5;
P_2, P_3: the remaining coefficients, distributed according to the power of eps in the monomial.
Thus, P = P_0 + P_1 + P_2 + P_3. By definition, put bounds u_j^-, u_j^+ on the powers occurring in P_j, and define v_j^-, v_j^+, w_j^-, w_j^+ in an analogous way. Using these bounds, put P~_j to be the corresponding upper estimate of P_j. It is obvious that P~_j ≥ P_j. So, we can use P~ = P~_0 + P~_1 + P~_2 + P~_3 as a radius of the final interval, if it is calculated by machine arithmetic operations with rounding up [footnote 37]. P~ is much simpler to calculate than P; thus, an interval for the final value is obtained from x_l^appr and P~ [footnote 38]. We split P into four polynomials in this way, guided by the following considerations:
• P_3 contains most of the coefficient sum; however, its contribution to P~_3 will be compensated by the multiplier eps^2 (when eps is near zero);
• P_2 has a big sum of coefficients too, but much less than P_3; this sum will be compensated by the multiplier eps in P~_2;
• P_1 has a small sum of coefficients; however, in some cases P_1 can be noticeable; thus, we separate P_1 from P_0 to minimize the contribution of the max * max * max part in the definition of P~_1;
• the contribution of the coefficients of P_0 is always small.

Footnote 32: See also [34, 35, 38].
Footnote 33: More precisely, paper [24] has a definition of Q_{G'}^a; the values Q_{G'}^{a,j} can be defined via Q_{G'}^a in terms of paper [24].
Footnote 34: The last term corresponds to the case when a very small number is converted into zero after rounding.

D. Algorithm of obtaining accurate integrand values
We obtain the value [footnote 39] I(z)/g(z) from (11) first by the eliminated interval arithmetic from Section IV.C. If the obtained interval [y-; y+] does not satisfy the condition y+ - y- ≤ sigma/4, where sigma is the current error estimate [footnote 40] for the obtained integral value, we repeat the calculation in direct double-precision interval arithmetic. If that is not enough, we reiterate the calculation in interval arithmetic based on floating-point numbers with a 128-bit mantissa and then with a 256-bit mantissa (if needed). If the 256-bit mantissa precision is not enough, we suppose that the value equals 0.
The arithmetic with a 128-bit mantissa is realized on the GPU in such a way that all operations are performed with the GPU register memory. The arithmetic with a 256-bit mantissa works with the global GPU memory. The usage of the register memory improves the performance by about 10 times [footnote 41].
We also use a routine preventing the emergence of occasional very large values, analogous to the one described in [23] but adapted for GPU parallel computing.

E. Modified probability density functions
A situation is theoretically possible in which g(z) from (11) is very small while |I(z)| is not correspondingly small. An emergence of such situations can make the Monte Carlo convergence worse. To remedy this, instead of (12), (13) we use probability density functions of the form

g(z) = C_1 g_1(z) + C_2 g_2(z) + C_3 g_3(z) + C_4 g_4(z),

where g_1 is defined by (12), (13) with the definitions from Section III; g_3 is defined by (12), (13) but with the same Deg(s) = D for all s; g_4(z) = (n - 1)! (the uniform distribution). To generate a random sample with the distribution g(z) we perform the following two steps:
• randomly generate j = 1, 2, 3, 4, where the probability of selecting j is C_j;
• generate a sample with the distribution g_j(z).

The generation with the distribution g_2(z) is the same as for the distributions defined by (12), (13), but at the stage of sector generation we must take all sectors with the same probability; see [23]. All computations are performed with the following values of the constants:

F. Monte Carlo error estimation
Let z_1, ..., z_N be random samples, and let formula (11) be used for the Monte Carlo integration. By definition, put y_j = I(z_j)/g(z_j). The conventional error estimation approach is based on the standard formula for the standard deviation of the sample mean. However, this formula has a tendency to underestimate the real standard deviation. Let us consider the 5-loop and 6-loop ladder examples. By definition, put maxlog = max_j [log_2 |y_j| + 0.5], and let n_k be the quantity of samples j such that |y_j| falls into the interval (15). The values of maxlog and n_j for the 5-loop and 6-loop ladders are presented in Table I. Here n_k is an approximation for N p_k, where p_k is the probability that a sample is in the interval (15). We can see that the real standard deviation depends strongly on the behavior of p_j for j < 0. For example, if p_{j+1}/p_j < 4 for all j < j_0, then the standard deviation is infinite [footnote 42]. We will use the improved estimation sigma_up [footnote 43], which combines the conventional estimate sigma_down [footnote 44] with two additional terms: uncert, the contribution of the uncertainty of n_k, and peak, the contribution of the predicted behavior of p_j for j < 0 that is described below [footnote 45].

Footnote 42: Table I demonstrates that for the 6-loop ladder such a situation is quite possible.
Footnote 43: When we calculate deviation probabilities based on the standard deviation, we use a presupposition, based on the Central Limit Theorem, that the distribution of (1/N) sum_{j=1}^N y_j is close to the Gaussian normal distribution. However, it is difficult to estimate the difference between the real distribution and the normal one. For example, the Berry-Esseen inequality uses the third central moment of the random variables, which is infinite if p_{j+1}/p_j < 8 for all j < j_0 (Table I shows that this situation is quite possible for both the 5-loop and 6-loop ladders).
Footnote 44: The definitions of sigma_down and uncert repeat the ones from [23].
Footnote 45: This procedure is a result of tests on different graph contributions to a_e. It is developed for future calculations of contributions to a_e of higher orders. It should not be treated as a universal procedure that works for all Monte Carlo integrations. However, a big value of sigma_up/sigma_down indicates that the obtained error estimation is unreliable.
The idea is to approximate n_j by a geometric progression, taking into account that each n_j is known with an uncertainty of about C sqrt(n_j) and that p_{j+1}/p_j changes with j. Here h_j is an approximated value of log_2(N p_j), and [h_j^-; h_j^+] is an interval for this value, obtained taking into account that n_j is known with uncertainty [footnote 46].

We estimate the absolute value of the difference between neighboring values of log_2(p_{j+1}/p_j) by the value d, defined through the distances d_{jk} from 0 to the corresponding intervals. For the approximation of the sequence by a progression we use other values for the log_2(N p_j) uncertainty, obtained taking into account that errors for smaller j are more critical. For approximating the sequence of logarithms by a linear function kj + b, we introduce coefficients a_j^l, f_j^l, 2 ≤ l ≤ 20, 0 ≤ j < l, for the least squares method [footnote 47]; the defining identities hold for all l and x_0, ..., x_{l-1}. The resulting formula takes into account both the uncertainty of n_j and the shift of p_{j+1}/p_j with j; we take the maximum to prevent excessive overestimation [footnote 48]. We also take the l for which the minimum is achieved. Finally, peak is defined by the formula for the sum of a geometric progression with w taken instead of k - 2, where w is defined in such a way that w ~ k - 2 as k -> +infinity and w -> 1/8 as k -> -infinity. We use sigma_up for all numerical results presented in this paper.

Footnote 47: The explicit formulas are a_j^l = (12j - 6(l - 1))/(l(l^2 - 1)) and the analogous expressions for f_j^l.
Tables II, III, and IV contain the dependence of the error estimations and of the real errors on the number of samples N_total for A^(8)_1 (graphs without lepton loops) and for the 5-loop and 6-loop ladders.

G. Contributions of individual Feynman graphs
The contributions of 2-loop and 3-loop Feynman graphs to A^(4)_1 and A^(6)_1 are presented in Tables V and VI; the corresponding pictures are FIG. 3 and FIG. 4. The 4-loop graphs are split into gauge invariant classes (k, m, m′), where m and m′ are the numbers of internal photon lines to the left and to the right of the external photon line (or vice versa), and k is the number of photons with ends on the opposite sides of it. We do not give pictures for the 4-loop graphs, but they are encoded in the tables as expressions of the form p; s_1−f_1, s_2−f_2, s_3−f_3, s_4−f_4, where p is the number of the vertex that is incident to the external photon line, s_j and f_j are the ends of the j-th internal photon line, the vertexes are enumerated from 1 to 9 along the electron path, s_j < f_j, and s_1 < . . . < s_4.
The graphs are ordered lexicographically, and we guarantee that the code of a graph is the lexicographically minimal one. For example, the code of the graph from FIG. 2 is 3; 1−8, 2−7, 4−5, 6−9.
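For illustration, such a code can be parsed and its stated constraints checked mechanically (a sketch; the function name is illustrative, and only the constraints stated above are verified):

```python
def parse_graph_code(code):
    """Parse a 4-loop graph code of the form 'p; s1-f1, s2-f2, s3-f3, s4-f4'.

    p is the vertex incident to the external photon line; each pair
    (s_j, f_j) gives the ends of the j-th internal photon line.  The
    vertices are enumerated from 1 to 9 along the electron path.
    """
    head, tail = code.split(";")
    p = int(head)
    lines = [tuple(int(v) for v in pair.split("-")) for pair in tail.split(",")]
    # constraints from the text: s_j < f_j and s_1 < ... < s_4
    assert all(s < f for s, f in lines)
    assert all(lines[i][0] < lines[i + 1][0] for i in range(len(lines) - 1))
    # each of the 9 vertices carries exactly one photon end:
    # 8 internal photon-line ends plus the external vertex p
    used = sorted([p] + [v for pair in lines for v in pair])
    assert used == list(range(1, 10))
    return p, lines
```

Running it on the code of the graph from FIG. 2 returns p = 3 and the four photon lines (1, 8), (2, 7), (4, 5), (6, 9).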

FIG 2. 4-loop Feynman graph: example
The contributions of the 4-loop graphs are presented in Tables VII, VIII, IX, X, XI, and XII. The numbers of the graphs for which the contribution must coincide with the contribution obtained by the direct subtraction on the mass shell in the Feynman gauge are marked by a star *; see Section IV.H.
The fields of the tables have the following meaning:
• Value is the obtained value for the contribution with the uncertainty σ↑, see Section IV.F;
• σ↑/σ↓ is the ratio of the improved standard deviation to the conventional one, see Section IV.F;
• N_total is the total quantity of Monte Carlo samples;
• N_fail_EIA is the quantity of samples for which the Eliminated Interval Arithmetic from Section IV.C failed;
• fail_EIA is the contribution of those samples 49;
• N_fail_IA is the quantity of samples for which the direct double-precision interval arithmetic from Section IV.B failed;
• fail_IA is the contribution of those samples;
• N_fail_128 is the quantity of samples for which the interval arithmetic based on numbers with 128-bit mantissa failed;
• fail_128 is the contribution of those samples 50.
Tables XIV and XV present the contributions of the sets of graphs that must coincide with the values obtained by the direct subtraction on the mass shell, together with a comparison with the known analytical results 51 and with the old results from [24] 52. Table XV does not include one-element sets; these sets (individual graphs) are marked by a star in the tables containing individual contributions. The contributions of the gauge invariant classes (k, m, m′) (see the definition in Section IV.G) and their comparison with the semianalytical results from [8] are presented in Table XVI.
49 Sometimes this contribution can be many times larger than the total 4-loop contribution! See, for example, graph 157 from Table IX. However, the Eliminated Interval Arithmetic significantly improves the computation performance; see Table XVII.
50 Even these contributions can be noticeable. See, for example, graph 134 from Table VIII.
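The three-level fallback behind the N_fail counters can be sketched as follows (illustrative only; the PrecisionFailure exception and the evaluator interface are assumptions for the sketch, not the actual GPU implementation):

```python
class PrecisionFailure(Exception):
    """Raised when an interval-arithmetic level cannot certify its result."""

def evaluate_sample(z, evaluators):
    """Try the evaluators in order (e.g. Eliminated Interval Arithmetic,
    direct double-precision interval arithmetic, 128-bit-mantissa interval
    arithmetic).

    Returns the first certified value together with the list of levels
    that failed, so that per-level failure counts (N_fail_*) and the
    contributions of the failed samples (fail_*) can be accumulated.
    """
    failed = []
    for name, evaluate in evaluators:
        try:
            return evaluate(z), failed
        except PrecisionFailure:
            failed.append(name)
    raise PrecisionFailure("all precision levels failed")
```

The design point is that the cheap level handles the bulk of the samples, while the rare samples it cannot certify fall through to progressively slower but more precise arithmetic.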
The equivalence of the subtraction procedure from Section II and the direct subtraction on the mass shell for all presented sets can be proved in a combinatorial way 53. Let us consider an example: the set 26, 27 from Table XIV. The contribution of this set can be written schematically in terms of operators A, L, U that are applied to Feynman amplitudes and return numbers: AΓ_µ = eγ_µ (A Γ_µ), LΓ_µ = eγ_µ (L Γ_µ), UΓ_µ = eγ_µ (U Γ_µ); the definitions from Section II are used, and a constant multiplier is omitted. Analogously, the corresponding contribution obtained by the direct subtraction on the mass shell can be written down, and it is easy to see that these expressions are equivalent. Let us consider another example: the set 11, 17 from Table XIV.
51 The big discrepancy for the set 14, 17 in Table XIV is probably caused by an unstable behavior of the pseudorandom generator MRG32k3a. The generator Philox 4x32 10 [41] seems to work better on this set.
52 The uncertainties in [24] correspond to 90% confidence limits (under the assumption that the probability distribution is Gaussian normal).
53 If we do not consider the matter of divergence regularizations.

38 min and 8 h 24 min. All obtained results are in good agreement with the known analytical and semianalytical ones, see Table XVII. See also the detailed results in
Sections IV.G, IV.H, IV.I.

Table I .
Probability distributions for 5-loop ladder and 6-loop ladder

Table II .
Dependence of the estimated error and the difference between the obtained value and the known semianalytical one [8] on the number of Monte Carlo samples N_total: A^(8)_1, graphs without lepton loops. (Last tabulated row: N_total = 32 × 10^11, value −2.1807, σ↑ = 0.0104, σ↓ = 0.0086, difference −0.0038, σ↑/σ↓ = 1.21.)

Table III .
Dependence of the estimated error and the difference between the obtained value and the known analytical one [42] on the number of Monte Carlo samples N_total: 5-loop ladder. (Last tabulated row: value 911.6530, σ↑ = 0.0058, σ↓ = 0.0050, difference −0.0062, σ↑/σ↓ = 1.16.)

Table IV .
Dependence of the estimated error and the difference between the obtained value and the known analytical one [42] on the number of Monte Carlo samples N_total: 6-loop ladder

Table V .
Contributions of individual 2-loop Feynman graphs to A^(4)_1

Table VI .
Contributions of individual 3-loop Feynman graphs to A^(6)_1

Table VII .
Contributions of graphs from the class (1, 3, 0) to A^(8)_1

Table VIII .
Contributions of graphs from the class (2, 2, 0) to A^(8)_1

Table X .
Contributions of graphs from the class (3, 1, 0) to A^(8)_1

Table XIV .
Contributions to A^(8)_1 that must coincide with the values that are obtained by direct subtraction on the mass shell in the Feynman gauge; a comparison of these results with the known analytical values and with the old values from [24]

Table XV .
Contributions to A^(8)_1 that must coincide with the values that are obtained by direct subtraction on the mass shell in the Feynman gauge

Table XV .
Contributions to A^(8)_1 that must coincide with the values that are obtained by direct subtraction on the mass shell in the Feynman gauge (continued)

Table XVI .
Contributions of the gauge invariant classes (k, m, m′) to A^(8)_1