Calculating the five-loop QED contribution to the electron anomalous magnetic moment: Graphs without lepton loops

This paper describes a computation of a part of the QED contribution to the electron anomalous magnetic moment that was performed by the author with the help of a supercomputer. The computed part includes all 5-loop QED Feynman graphs without lepton loops. The calculation has led to the result $A_1^{(10)}[\text{no lepton loops}]=6.793(90)$, which differs slightly from the value $7.668(159)$ presented by T. Aoyama, T. Kinoshita, and M. Nio in 2018. The discrepancy is about $4.8\sigma$. The computation gives the first independent check of that value. A shift in the fine-structure constant prediction is revealed in the paper. The developed calculation method is based on (a) a subtraction procedure for removing all ultraviolet and infrared divergences in Feynman parametric space before integration; (b) a nonadaptive Monte Carlo integration that uses probability density functions constructed for each Feynman graph individually from its combinatorial structure. The method is described briefly in the paper (with the corresponding references to the previous papers). The values for the contributions of nine gauge-invariant classes splitting the whole set are presented in the paper. Moreover, the whole set of all 5-loop graphs without lepton loops is split into 807 subsets for comparison (in the future) of the calculated values with the values obtained by other methods. These detailed results are presented in the supplemental materials. Also, the supplemental materials contain the contribution values for each of the 3213 individual Feynman graphs. An "oscillating" nature of these values is discussed. A realization of the numerical integration on the graphics accelerator NVidia Tesla V100 (as a part of the supercomputer "Govorun" from JINR, Dubna) is described with technical details such as pseudorandom generators, calculation speed, code sizes and structure, prevention of round-off errors and overflows, etc.

A special place in the full calculation of A_1^{(10)} is occupied by the contribution of Feynman graphs without lepton loops. This set contains 3213 Feynman graphs 1 and forms a gauge-invariant class. This contribution is the most complicated one for both Monte Carlo integration and analytical calculations. For example, the uncertainty in (2) is entirely determined by that contribution. Also, it is the contribution that suffered the most from found mistakes and corrections; see Ref. [34]. The value can be obtained by using (2) and the value of the remaining part that can be extracted from Ref. [34]. By 2019, there were no independent calculations of this contribution. Our result, where the uncertainty corresponds to 1σ limits, is in good agreement with the preliminary value 6.782(113) published in Ref. [35].

Using the coefficients A_1^{(2n)}, n ≤ 4, the contributions a_e(hadronic) + a_e(electroweak) (see a review in Ref. [33]), and the measured value of α from Ref. [36] based on a measurement of the cesium atom mass relative to the Planck constant, we obtain a_e[theory, α(Cs), Volkov] = 0.001159652181547(6)(12)(229), where the first uncertainty comes from (4), the second one from the hadronic and electroweak corrections, and the last one from the uncertainty of α. The usage of (2) gives a_e[theory, α(Cs), AKN] = 0.001159652181606(11)(12)(229) instead. If we use the a_e prediction with (5) and the measured value (1) for improving α, we obtain α^{-1}[a_e, Volkov] = 137.0359991427(7)(14)(331), where the uncertainties come from (4), the hadronic and electroweak corrections, and (1), correspondingly. The values (7) and (8) have discrepancies of 1.61σ and 1.69σ relative to (9). This means that the discrepancy between (4) and (3) affects α and a_e only slightly. However, this discrepancy can become significant in the future, when the precision of the measurements is increased. Also, if both calculations have mistakes, then the effect can be noticeable even at the current level of precision.
Thus, an additional independent calculation is required.
There is no universal method that makes it possible to calculate 5-loop QED contributions in a realistic time frame. First, the existing universal IR divergence control methods, such as those based on dimensional regularization, lead to enormous amounts of symbolic manipulation. Second, universal integration routines converge very slowly on the resulting integrals.
To make the 5-loop calculations practically feasible, it is required to remove all ultraviolet (UV) and infrared (IR) divergences before integration and to avoid any ε-like regularizations. All UV divergences in Feynman integrals can be removed by direct subtraction on the mass shell using a forestlike formula like Zimmermann's forest formula 2. However, an analogous method for removing IR divergences has not been invented yet. The anomalous magnetic moment is free from IR divergences: the IR divergences corresponding to soft virtual photons are compensated by the IR divergences connected with the on-shell renormalization; see the notes in Ref. [42]. But unfortunately, direct methods lead to the emergence of IR divergences in individual Feynman graphs. Different authors use different homemade divergence subtraction procedures that work in some cases; see Refs. [6,8,43,33]. A relatively simple subtraction procedure giving finite Feynman parametric integrals was developed for our calculations. It was first presented in Ref. [44] and is briefly described in Sec. II.
The 5-loop calculations lead to Feynman parametric integrals with 13 variables. At this time, the only way to evaluate such integrals numerically is Monte Carlo integration. Unfortunately, Feynman parametric integrands after divergence subtraction are unbounded and have a very complicated asymptotic behavior near the boundaries. Universal adaptive Monte Carlo integration routines like VEGAS can, in principle, work with unbounded functions and functions having a steep landscape. However, these routines are suited for functions of a certain shape, and this becomes critical for large numbers of variables. For example, VEGAS uses probability density functions of the separable form g(z_1, ..., z_M) = f_1(z_1) ⋯ f_M(z_M) and tries to fit the functions f_j to make the convergence as fast as possible 3. Unfortunately, this approximation does not work well for Feynman parametric integrals with large numbers of variables. A nonadaptive 4 method that uses some a priori knowledge about the behavior of the Feynman parametric integrands was developed for our calculations. The method, which is briefly described in Sec. III, works only for graphs without lepton loops. The first version of this method was presented in Ref. [42].
The developed Monte Carlo integration method allows us to reduce the needed number of samples substantially. However, in the 5-loop case a supercomputer is still required for evaluating 3213 Feynman graphs. Modern graphics processors (GPUs) are more suitable than ordinary CPUs for performing many uniform sequences of arithmetic operations in parallel. The Monte Carlo integration was performed on NVidia Tesla V100 GPUs as a part of the supercomputer 5 "Govorun" from JINR (Dubna, Russia). The realization is described in Sec. IV with some programming details. Sec. V contains the results of the calculations, a discussion of these results, a description of the supplemental materials, and some technical information about the computation, including the GPU performance, arithmetic precision statistics, and so on.

II. DIVERGENCE ELIMINATION
The developed subtraction procedure is based on a forest formula with linear operators that are applied to the Feynman amplitudes of UV-divergent subgraphs. This is similar to the Zimmermann forest formula; the difference lies only in the choice of the linear operators and in the way they are combined. Let us recapitulate the advantages of the developed procedure:
• The procedure is fully automated for any order of the perturbation series 6.
3 The Monte Carlo integration error usually behaves as σ ∼ C/√N, where N is the number of samples. However, it is very important to make C as small as possible.
4 Except the inter-graph adaptivity described in Sec. IV.C and the adjustment of the six constants (15) that was performed once for the 4-loop graphs.
5 The GPU part of the supercomputer "Govorun" has 40 NVidia Tesla V100 GPUs. The peak performance of the GPU part is 300 TFlops for double precision. The peak performance of the whole supercomputer (including the CPU part) is 500 TFlops.
6 The method must work for all Feynman graphs contributing to A_1^{(2n)}, including the ones containing lepton loops; see Ref. [44]. However, a rigorous mathematical proof of this fact has not been developed even for graphs without lepton loops.
• The method is elegant and relatively simple to implement on computers.
• The subtraction is equivalent to the on-shell renormalization: for obtaining the final result we should only sum up the contributions of all Feynman graphs after subtraction. Thus, no residual renormalizations are required.
• Feynman parameters can be used directly, without any additional tricks.
There are the following types of UV-divergent subgraphs 7 in QED Feynman graphs without lepton loops: electron self-energy subgraphs ( N e = 2, N γ = 0 ) and vertexlike subgraphs ( N e = 2, N γ = 1 ), where by N e and N γ we denote the number of external electron and photon lines in the subgraph.
Two subgraphs are said to overlap if they are not contained one inside the other, and the intersection of their sets of lines is not empty.
A set of subgraphs of a graph is called a forest if any two elements of this set do not overlap.
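These two combinatorial definitions translate directly into code. Here is a minimal sketch, assuming a subgraph is represented simply by the set of its internal lines (a simplification: the actual subgraphs of this section also carry their vertexes and types):

```python
def overlap(a, b):
    """Two subgraphs overlap if neither is contained in the other
    and their sets of lines intersect."""
    a, b = set(a), set(b)
    return not (a <= b or b <= a) and bool(a & b)

def is_forest(subgraphs):
    """A set of subgraphs is a forest if no two of its elements overlap."""
    subs = [set(s) for s in subgraphs]
    return all(not overlap(subs[i], subs[j])
               for i in range(len(subs)) for j in range(i + 1, len(subs)))
```

For example, nested subgraphs such as {1, 2} ⊂ {1, 2, 3} do not overlap, while {1, 2} and {2, 3} do, so the former pair can appear together in a forest and the latter cannot.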
For a vertexlike graph G, by F[G] we denote the set of all forests F that consist of UV-divergent subgraphs of G and satisfy the condition G ∈ F. By I[G] we denote the set of all vertexlike subgraphs G′ of G such that G′ contains the vertex that is incident 8 to the external photon line of G. 9
We work in the system of units in which ℏ = c = 1; the factors of 4π appear in the fine-structure constant: α = e²/(4π); the tensor g^µν is defined by g^µν = diag(1, −1, −1, −1); the Dirac gamma matrices satisfy the condition γ^µ γ^ν + γ^ν γ^µ = 2g^µν.
The following linear operators are used for the subtraction:

1. A is the projector of the AMM. This operator is applied to the Feynman amplitudes of vertexlike subgraphs. See the definition in Refs. [44,42].
7 We consider only subgraphs that are strongly connected and contain all lines that join the vertexes of the given subgraph.
8 We say that a line l and a vertex v are incident if v is one of the endpoints of l.
9 In particular, G ∈ I[G].
2. The definition of the operator U depends on the type of UV-divergent subgraph to which the operator is applied:
• If Σ(p) is the Feynman amplitude that corresponds to an electron self-energy subgraph, then, by definition 10, U Σ is given by the first terms of the expansion of Σ(p) around the mass shell, where m is the mass of the electron and p̂ = p_µ γ^µ.
• If Γ^µ(p, q) is the Feynman amplitude corresponding to a vertexlike subgraph, then U Γ^µ is defined analogously through the corresponding on-shell value.
3. L is the operator that is used in the standard subtractive on-shell renormalization of vertexlike subgraphs: if Γ^µ(p, q) is the Feynman amplitude that corresponds to a vertexlike subgraph for which (10) is satisfied, then L Γ^µ is defined as the corresponding on-shell subtraction term.
Let f_G be the unrenormalized Feynman amplitude that corresponds to a vertexlike graph G. Let us write the symbolic definition of the subtracted amplitude f̃_G. In this notation, the subscript of an operator symbol denotes the subgraph to which this operator is applied.
The coefficient before γ^µ in f̃_G is the contribution of G to a_e. For example, for the graph G from FIG. 3 we have the operator expression (11). Here the subscripts denote the subgraphs to which the operators are applied (identified by the enumeration of the vertexes). The expression means that we should remove the brackets, and for each term we should transform the Feynman amplitudes of the subgraphs using the corresponding operators, from the inner subgraphs to the outer ones. The transformation is applied in Feynman parametric space before integration. This can be explained easily using the approach to Feynman parameters based on the transition from Schwinger parameters; see Ref. [44].
The operators U are designed to remove UV divergences in a way similar to the Zimmermann forest formula and Bogoliubov's R-operation. In contrast to the operator L that is standard for QED, the operators U do not generate additional IR divergences. The multiplier in the square brackets in (11) corresponds to the elimination of the IR divergences associated with soft virtual photons on the external electron lines and of the UV divergences connected with the subgraphs to which the operators are applied. Also, the "overall" UV and IR divergences are removed by the magnetic moment projector A, exactly as in the 1-loop case; see [43] and [44]. It is important that the operator U applied to self-energy subgraphs extracts the self-mass part completely. This allows us to avoid IR divergences of power type; see the Discussion in Ref. [44]. The cancellation of divergences is described in detail 11 in terms of Feynman parameters in Ref. [44]; see also additional comments in Ref. [32].
The equivalence of the subtraction procedure and the direct subtraction on the mass shell is proved in a combinatorial way in Ref. [44], Appendix B. For proving this equivalence we use the fact that the operator U preserves the Ward identity; see Ref. [44]. It is easy to see this equivalence in the 2-loop case; see Section 3 of Ref. [44]. Let us note that we do not use the operator of QED on-shell renormalization of electron self-energy subgraphs; the Ward identity helps us in this case too. For a detailed explanation of the developed method, see Ref. [44] and some additional explanations in Refs. [42,32]. 11 although not completely rigorously

III. MONTE CARLO INTEGRATION

A. Probability density functions
After removing divergences, the contribution of each Feynman graph to A_1^{(2n)} is represented as an integral of the form

∫_{z_1,...,z_M ≥ 0} I(z_1, ..., z_M) δ(1 − z_1 − ... − z_M) dz_1 ⋯ dz_M,

where M = 3n − 1 (see 12) and z_j are the Feynman parameters. For each graph we calculate the (3n − 2)-dimensional integral directly; we do not use any additional reductions.
We propose to split the whole integration area into the Hepp sectors (see Ref. [45]), which are simply orderings of the Feynman parameters:

z_{j_1} ≥ z_{j_2} ≥ ... ≥ z_{j_M}.

We use probability density functions of the form

g(z) = (C_1 g_1(z) + C_2 g_2(z) + C_3 g_3(z) + C_4 g_4(z))/C, (13)

where z = (z_1, ..., z_M); C_1, C_2, C_3, C_4 are some constants (see Sec. IV); the main term g_1 is built inside each Hepp sector from powers of the Feynman parameters with exponents determined by the quantities Deg(s) (14), which are positive real numbers defined for each set s of internal lines 13 of the graph (except the empty and full sets); and C is the normalization constant defined by the condition that g integrates to 1 over the simplex. The stabilization functions g_2, g_3, g_4 are defined in Ref. [32]; an additional constant D is used for defining g_3. Functions of the form (14) were first used for approximating the behavior of parametric integrals by E. Speer; see Ref. [46].
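To illustrate the basic mechanism, here is a minimal sketch that samples Feynman parameters inside one Hepp sector from a pure power-law density by inversion. The exponents `degs` are hypothetical stand-ins for the Deg values; the actual densities (13)-(14) with the stabilization terms g_2, g_3, g_4 are considerably more elaborate.

```python
import random

def sample_sector(degs):
    """Draw Feynman parameters inside the Hepp sector
    z_1 >= z_2 >= ... >= z_M of the simplex.
    Each ratio t_k = z_{k+1}/z_k is drawn from the power-law density
    a * t**(a - 1) on (0, 1) by inverse-CDF sampling, t = u**(1/a),
    where a = degs[k] is a positive exponent (stand-in for a Deg value)."""
    t = [random.random() ** (1.0 / a) for a in degs]
    z = [1.0]
    for tk in t:
        z.append(z[-1] * tk)          # decreasing by construction
    s = sum(z)
    return [zk / s for zk in z]       # normalize onto the simplex
```

Small exponents concentrate the samples near the boundary z_{k+1} ≪ z_k, which is where the subtracted integrands vary most strongly.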
The main problem in this approach is that for good Monte Carlo convergence the values Deg must be adjusted very accurately. Speer's lemma (Ref. [46]) states that in some simple cases, when there are no UV-divergent subgraphs and the infrared behavior is not considered, we may take the ultraviolet degree of divergence of s (with the minus sign) as Deg(s) and use (14) as an upper bound for |I(z)|. A good upper bound can play the role of a good probability density function for Monte Carlo integration; see Ref. [42]. However, in the real case we must use more complicated formulas for obtaining Deg(s). These formulas were developed for our calculations 14. The first version of the method was presented in Ref. [42]. We use an improved version from Ref. [32]. The algorithm for obtaining Deg(s) uses six constants C_bigF > 0, C_bigZ > 0, C_add, C_subI, C_subSE, C_subO that should be chosen by hand. For the 5-loop case we use the same values (15) as we used for the 4-loop, 3-loop, and 2-loop cases in Ref. [32]. These values were obtained by numerical experiments with 4-loop graphs. Note that some of the values Deg(s) obtained by the method are less than 1 and sometimes even less than 1/3, in contrast to the integer numbers in Speer's lemma (Ref. [46]). The terms C_j g_j(z), j = 2, 3, 4 in (13) are added as a safeguard: they cannot significantly slow down the Monte Carlo convergence, but they can (in principle) prevent the occasional emergence of gigantic contributions from individual samples; see Ref. [32].
The algorithm of fast random sample generation is described in Ref. [42].

B. Obtaining the value and uncertainty
If the random samples z_1, ..., z_N are generated with the probability density function g(z), then the integral value is approximated as

∫ I(z)dz ≈ (1/N) Σ_{j=1}^{N} I(z_j)/g(z_j). (16)

For approximating the standard deviation σ we can use the formula

σ² ≈ (1/N) [ (1/N) Σ_j y_j² − ( (1/N) Σ_j y_j )² ], (17)

where y_j = I(z_j)/g(z_j). However, in practice this formula often leads to an underestimation of the standard deviation. The reason is that the real σ² is the mean value of the right part of (17), but using (17) we rather obtain something near the median of that value, which is often less than the mean value. Taking this difference into account is especially important when we integrate unbounded functions. Because of this, we use an improved value σ↑ instead of (17) as σ. The algorithm of obtaining σ↑ based on heuristic predictions is described in Ref. [32]. For the 5-loop case we use exactly the same method. The value defined by (17) we denote by σ↓. A large value of σ↑/σ↓ indicates that the obtained integral value is suspicious, but no guarantees are possible for Monte Carlo integration. We use σ↑ for all intervals in the paper.
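The estimator and the plain standard-deviation formula can be sketched as follows; this reproduces only the conventional σ↓, not the heuristic correction σ↑ of Ref. [32].

```python
import math
import random

def mc_estimate(integrand, density, sampler, n):
    """Importance-sampling estimate of an integral with samples from g:
        value ~ (1/N) * sum(y_j),
        sigma_down^2 ~ (1/N) * (mean(y^2) - mean(y)^2),
    where y_j = I(z_j)/g(z_j).  Returns (value, sigma_down)."""
    s = s2 = 0.0
    for _ in range(n):
        z = sampler()                     # z drawn with density g
        y = integrand(z) / density(z)
        s += y
        s2 += y * y
    mean = s / n
    var_of_mean = max(s2 / n - mean * mean, 0.0) / n
    return mean, math.sqrt(var_of_mean)

# Toy example: integral of 3z^2 over [0,1] equals 1; uniform sampling, g = 1.
random.seed(0)
value, sigma = mc_estimate(lambda z: 3.0 * z * z, lambda z: 1.0,
                           random.random, 200_000)
```

For unbounded integrands such as the subtracted Feynman parametric ones, the sample variance itself fluctuates heavily, which is exactly the underestimation problem that σ↑ addresses.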

IV. REALIZATION

A. Evaluation of the integrands with GPUs
The code for all 3213 integrands was generated automatically. The D programming language was used for the code generator; see Ref. [47]. The generated code was written in C++ 15 with CUDA; see Ref. [48]. The code generation took about one month on two CPU cores of a personal computer. Numerical subtraction of divergences under the integral sign can cause round-off errors. We use interval arithmetic (IA) for controlling them. In interval arithmetic we work not with numbers but with intervals of numbers. NVidia GPUs support all operations necessary for the realization of interval arithmetic. However, arithmetic operations with intervals are slow, and we developed a fast modification of interval arithmetic that we call "eliminated interval arithmetic" (EIA). The main idea of EIA is that in some cases we can replace a long sequence of interval arithmetic operations by the analogous sequence of operations on the centers of the intervals and estimate the radius of the final interval by a relatively simple formula. The intervals obtained by EIA are wider than the ones obtained by IA, but both are reliable. EIA is described in detail in Ref. [32].
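The idea behind EIA can be illustrated with a toy model: operations act on the interval centers in ordinary floating point, while the radius is enlarged by a simple first-order bound. The real EIA formulas from Ref. [32] are more refined; the unit roundoff here is taken as 2^-52 (twice the true one) to stay conservative.

```python
class EIA:
    """Toy "eliminated" interval: a center c plus a conservative radius r.
    Each operation computes the new center in ordinary floating point and
    enlarges the radius by the input radii plus one rounding error of the
    result.  Inputs created with r = 0 are treated as exact doubles."""

    EPS = 2.0 ** -52   # deliberately conservative unit roundoff

    def __init__(self, c, r=0.0):
        self.c, self.r = c, r

    def __add__(self, o):
        c = self.c + o.c
        return EIA(c, self.r + o.r + abs(c) * EIA.EPS)

    def __sub__(self, o):
        c = self.c - o.c
        return EIA(c, self.r + o.r + abs(c) * EIA.EPS)

    def __mul__(self, o):
        c = self.c * o.c
        return EIA(c, abs(self.c) * o.r + abs(o.c) * self.r
                      + self.r * o.r + abs(c) * EIA.EPS)
```

Because only one extra floating-point operation (the radius update) accompanies each center operation, long sequences run at nearly the speed of plain arithmetic, which is the performance gain the paper reports over true directed-rounding IA.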
The integrals for all Feynman graphs are calculated simultaneously; see Sec. IV.C. At the stage of initialization, we evaluate approximately 10^8 random points for each Feynman graph with the machine double-precision IA, taking the point of each interval that is nearest to zero. After initialization, when we evaluate the value of I(z)/g(z) from (16) at some point z, we first calculate it using EIA. The obtained interval [y−; y+] is accepted if the criterion (18) 16 is satisfied; in that criterion the summations go over all contributing Feynman graphs, j is the number of the current graph, and σ↓,l is the value of σ↓ calculated for the integral corresponding to the graph with the number l. This criterion guarantees that the total round-off error (summed over all graphs) does not exceed Cσ↓ for some constant C. Also, it satisfies the natural demand that larger round-off errors are allowed for graphs with larger σ↓,l. If the interval is not accepted, it is recalculated using IA with increased precisions until it is accepted: machine double precision, 128-bit-mantissa precision, 192-bit-mantissa precision, 256-bit-mantissa precision. If all precisions fail, then the contribution is taken to be zero. EIA fails approximately on one in five samples. However, the integrand evaluation in EIA is approximately 6.5 times faster than in the double-precision IA; see Sec. V and Table III. Thus, the usage of EIA significantly improves the performance.
The Monte Carlo samples are generated and processed in blocks. Each block contains approximately 10^9 samples pertaining to a single Feynman graph. The block scheduling algorithm is described in Sec. IV.C. The samples are processed on a GPU in 20480 parallel threads 17. Each thread processes some set of the block samples sequentially. Divergent branching is very inefficient in GPU code, so the samples requiring increased precision are collected and then processed in subsequent GPU calls.
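The precision ladder can be sketched as follows. Here the threshold `tol` stands in for the acceptance criterion described above, and Python's `Fraction` plays the role of the 128/192/256-bit-mantissa interval arithmetic; both names are illustrative, not the paper's actual interface.

```python
from fractions import Fraction

EPS = 2.0 ** -52   # conservative unit roundoff for IEEE doubles

def evaluate_with_fallback(terms, tol):
    """Evaluate sum(terms) on the fast path (plain doubles) with a crude
    running round-off bound; if the bound exceeds the acceptance threshold
    tol, fall back to exact arithmetic (a stand-in for the higher-precision
    interval arithmetic).  Returns (value, path_taken)."""
    s = bound = 0.0
    for x in terms:
        s += x
        bound += abs(s) * EPS     # one rounding error per addition
    if bound <= tol:
        return s, "fast"
    exact = sum(Fraction(x) for x in terms)
    return float(exact), "exact"
```

Large cancellations, which are exactly what the numerical divergence subtraction produces, inflate the bound and trigger the slow path, while typical samples stay on the fast one.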
We use a handmade library for arbitrary-precision arithmetic. The 128-bit-mantissa arithmetic is realized using the GPU register memory 18. The greater precisions are realized with the global GPU memory. The usage of the register memory improves the performance by approximately 10 times 19. Nevertheless, the increased-precision calculations occupy a considerable part of the calculation time; see Sec. V and Table III. For each integrand we generate program code for three precisions separately: EIA, double-precision IA, and arbitrary-precision IA. This leads to a relatively large code. The total size of the integrand code is 400 GB in uncompiled form and 500 GB in compiled form.
16 This criterion differs from the previous one from Ref. [32]. The previous criterion was erroneous: it did not take into account that the mean value of the round-off error is not zero. However, that error did not significantly affect the result.
17 80 blocks of 256 threads; see [48].
18 The register memory is the fastest kind of memory in NVidia GPUs.
19 However, Table III shows a more significant gap. That is because there are very few points that require 192-bit-mantissa or higher precision, and the GPU parallelism cannot be exploited to its full extent on these points.
The calculation of some integrand values requires millions of arithmetic operations. However, compilers and optimizers do not handle big functions well. We split the calculation of each integrand into several CUDA kernels 20. Each CUDA kernel contains approximately 3000 arithmetic operations for the EIA code, 2000 operations for the double-precision IA code, and 1000 operations for the arbitrary-precision IA code. The arbitrary-precision integrand code is also split into several files: approximately 50 CUDA kernels per file. The choice of the function sizes is a compromise: the performance of small functions suffers from memory transfer delays, but a big function size leads to badly optimized 21 and slowly compiled code.
We use the techniques for prevention of occasional emergence of very large values that are described in [42] (with little modifications and adaptation for GPU parallelism).
When we calculate I(z)/g(z), it is often the case that machine double precision is not enough for storing g(z): the machine double precision allows values only up to about 2^1024 (and nonzero values only down to about 2^−1074). This situation is due to the large number of variables and the closeness to zero of some values of Deg(s) from (14). It is not obvious from the beginning that these points can be ignored; see Sec. V and Table III. To solve this problem, we store g(z) as x · 2^j, where 0.5 ≤ x < 1 is stored with machine double precision and j is stored as a 32-bit integer.
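A minimal sketch of this representation, using `math.frexp` to renormalize after every multiplication; the positive factors here are hypothetical stand-ins for the per-parameter factors of the density.

```python
import math

def product_as_pair(factors):
    """Accumulate a product of positive doubles as x * 2**j with
    0.5 <= x < 1 and a separate integer exponent j, renormalizing with
    math.frexp after every multiplication so the running value never
    overflows or underflows the double-precision range."""
    x, j = 1.0, 0
    for f in factors:
        m, e = math.frexp(f)       # f = m * 2**e exactly, 0.5 <= m < 1
        x *= m                     # the running mantissa stays in [0.25, 1)
        j += e
        x, e2 = math.frexp(x)      # renormalize back to [0.5, 1)
        j += e2
    return x, j
```

The pair (x, j) represents x · 2^j even when that value lies thousands of binary orders of magnitude outside the range of an ordinary double.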

B. Compilation of the integrands code
The integrand code was compiled with the NVidia compiler nvcc into shared libraries that are linked dynamically with the integrator. The compiler is relatively slow, and 400 GB of code requires a lot of time for compilation. Like the integration, this compilation was performed on the supercomputer "Govorun" from JINR (Dubna, Russia). Intel Xeon Gold 6154 processors with 18 cores were mostly used for this work. The compilation was organized using the MPI protocol with parallel processes that run nvcc: two processes per CPU core. The total compilation time amounted to about 120 CPU-hours.

C. Monte Carlo integration: details
The Monte Carlo integrator was written in C++ with CUDA. The integration was performed on several NVidia Tesla V100 GPUs of the supercomputer "Govorun" from JINR (Dubna, Russia). Most of the time, from 2 to 16 GPUs were occupied by the integration. The inter-device parallelism was organized using the MPI protocol.
The controlling part of the integrator generates the numbers of Feynman graphs to obtain the next block of samples. The number j of a Feynman graph is generated randomly. The probabilities p_j of taking the graph j are chosen to make the convergence as fast as possible. Let us describe the method of obtaining p_j. Put v_j = σ_j² N_j, where σ_j is the current standard deviation estimate and N_j is the number of samples that have already been processed for the graph j; v_j estimates the variance of a single sample. By t_j we denote the average time required for the evaluation of one integrand value for the graph j. The total time that is needed for the evaluation of N samples is approximately N Σ_j q_j, and the total standard deviation can be estimated as σ² ≈ (1/N) Σ_j v_j/p_j, where q_j = p_j t_j. The minimum point satisfies the equation v_i/(t_i p_i²) = v_l/(t_l p_l²) for any i, l. Using this, we obtain p_j = C √(v_j/t_j), where C is some constant, or p_j = √(v_j/t_j) / Σ_l √(v_l/t_l). We use these probabilities for the random generation of the graph numbers with a little modification for stabilization: a little more attention is given to the graphs j with big σ↑,j/σ↓,j. After integration, the total standard deviations (upper and lower) are obtained by

σ↑² = Σ_j σ↑,j², σ↓² = Σ_j σ↓,j². (19)
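Assuming the standard result that minimizing the total variance at a fixed time budget gives p_j proportional to the square root of (per-sample variance)/(time per sample), the allocation rule can be sketched as follows (without the extra stabilization weighting for graphs with large σ↑/σ↓):

```python
import math

def block_probabilities(per_sample_var, time_per_sample):
    """Probabilities p_j proportional to sqrt(v_j / t_j): the allocation
    that minimizes the total variance for a fixed time budget, where v_j
    estimates the variance of one sample of graph j and t_j is the
    average time needed to evaluate one sample of graph j."""
    w = [math.sqrt(v / t) for v, t in zip(per_sample_var, time_per_sample)]
    s = sum(w)
    return [x / s for x in w]
```

A graph with four times the per-sample variance of another (at equal cost) receives twice the sampling probability; a graph four times more expensive (at equal variance) receives half.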

V. RESULTS AND THE TECHNICAL INFORMATION
For reliability, two calculations were performed with different pseudorandom generators, with different choices of the constants C 2 , C 3 , C 4 from (13) and the constant D that is used for defining g 3 from (13); see Ref. [32].
• Calc 1: the generator MRG32k3a from the NVidia CURAND library. We use the value for all calculations. The calculations have led to the results A_1^{(10)}[no lepton loops, Calc 2] = 6.84(12). The results were first statistically combined graph by graph and then summed using (19). These operations do not commute; thus, some of the results may look strange 22.
The supplemental materials contain the results for all 3213 Feynman graphs for both calculations. Table I contains the results for the nine gauge-invariant classes (k, m, m′) splitting the set of all 5-loop Feynman graphs without lepton loops. By definition, (k, m, m′) is the set of all Feynman graphs such that m and m′ are the numbers of internal photon lines lying to the left and to the right of the external photon line (or vice versa), and k is the number of photon lines whose ends lie on opposite sides of it. In this table, N_diag and N_total are the number of Feynman graphs and the total number of Monte Carlo samples generated for the class. Table I. Contributions of the gauge-invariant classes (k, m, m′) to A_1^{(10)}; here, a_i = ∫ I_i(z)dz is the contribution of the i-th graph to the value, and I_i is the corresponding Feynman parametric integrand.

It was observed by different researchers that the contributions of gauge-invariant classes are relatively small in absolute value, but the contributions of individual Feynman graphs are relatively large and often significantly greater than the class contributions. This occurs regardless of the divergence elimination method used; see Table I.
It is very important to check the obtained values independently. However, the amount of computation is huge in this case. Thus, an ability to check the values by parts using different methods would be very useful. We have a splitting of the whole set of graphs into 807 subsets for which the developed subtraction procedure is equivalent to the direct subtraction on the mass shell in Feynman gauge. For each set the equivalence can be proved combinatorially using the Ward identity for individual graphs; see Ref. [32]. The splitting is presented in the supplemental materials. It was generated automatically. Each set in this splitting is contained in some gauge-invariant class (k, m, m′). There are many sets containing only one graph; the largest lies in the class (1, 4, …).
The graph sets from the splitting smooth the peaks of the individual graph contributions, as the gauge-invariant sets do 23. However, this "smoothing" is not so prominent: some of the set contributions are many times greater than the total contribution (in absolute value). The set with the maximum contribution (in absolute value) is depicted in FIG. 2. This contribution equals 42.0700(50).
Table III contains the following technical information:
• σ↑/σ↓ is the ratio between the improved standard deviation and the conventional one; see Sec. III.B and Ref. [32];
• N_total is the total quantity of Monte Carlo samples;
• N_fail_EIA is the quantity of samples for which eliminated interval arithmetic failed; see Sec. IV.A and Ref.
[32];
• △_fail_EIA is the contribution of those samples;
23 It should be noted that this smoothing is not a general principle: for example, the sum of n independent random numbers with mean value 0 and quadratic mean a has the quadratic mean a·√n.
• N_fail_IA is the quantity of samples for which direct double-precision interval arithmetic failed;
• △_fail_IA is the contribution of those samples;
• N_fail_128, N_fail_192, N_fail_256 are the quantities of samples for which the interval arithmetic based on numbers with 128-bit, 192-bit, and 256-bit mantissas failed;
• △_fail_128, △_fail_192 are the contributions of those samples;
• N_dens_out_of_double is the quantity of samples for which machine double precision was not enough for storing the probability density; see Sec. IV.A;
• △_dens_out_of_double is the contribution of those samples;
• GFlops = billions of floating-point operations per second (during the evaluation of the integrands); GIntervals = billions of interval operations per second (in the sense of interval arithmetic); M = millions.
It is easy to see that in EIA one arithmetic operation on intervals takes approximately one operation on numbers. This is due to the fact that most of the EIA calculation is occupied by the operations on the centers of the intervals. However, in IA one interval operation takes approximately five operations on numbers. Also, the speed of the number operations for IA is 1.6 times lower than for EIA. This is because most of the operations in IA require specifying a rounding mode 24, but the operations on the centers of intervals in EIA do not.
Calc 1 suffered from some errors that caused the emergence of anomalous points contributing to N_fail_192, N_fail_256, and N_dens_out_of_double; see Table III. We could not perform a full recalculation because it requires a lot of time. However, those points do not have a significant impact on the results; the table confirms this fact. Those errors were corrected in Calc 2.
Table III demonstrates that the points requiring increased precision make a significant contribution to the result. For example, Δ^fail_EIA and Δ^fail_IA are at the level of the total contribution, and Δ^fail_128 is at the level of the uncertainty. The table also shows that these contributions are unstable, due to the "oscillating" character of the individual graph contributions, the floating character of the interval acceptance criterion (18), and differences in the probability density functions. In addition, the table shows that the contribution Δ^dens_out-of-double is insignificant. However, this contribution is not far (on a logarithmic scale) from the boundaries of machine double precision, which are around 2^{-1025}. Thus, situations are possible in which such contributions become significant. This fact demonstrates that universal Monte Carlo integration routines can work poorly for many-loop Feynman parametric integrals.
The Monte Carlo integration convergence quality for a given graph j can be estimated in terms of N_j, the number of Monte Carlo samples for the j-th graph, and I_j, the corresponding Feynman parametric integrand. Smaller values correspond to better quality. The graphs with the best and worst quality are shown in FIG. 4 (a,b); the corresponding values (for Calc 2) are 16.2 and 525.9.
These values demonstrate that even in the best case the Monte Carlo integration does not work ideally, due to the large dimensionality. However, the quality is acceptable and requires a relatively small amount of supercomputer time for the integration.

VI. CONCLUSION
A numerical calculation of the total contribution of the 5-loop QED Feynman graphs without lepton loops to the corresponding coefficient of the expansion of the electron anomalous magnetic moment in α was performed. The calculation is based on a specific method of reducing the problem to Feynman parametric integrals and on Monte Carlo integration using a supercomputer. Some mathematical considerations about the behavior of the integrands allowed us to reduce the required supercomputer power and time significantly. This calculation provides the first independent check of the value obtained by T. Kinoshita's team, presented in Ref. [33]. However, a discrepancy of about 4.8σ between the results was discovered. On the one hand, this discrepancy does not significantly affect the known values of a_e and α. On the other hand, it calls for an additional independent calculation and can affect the physics in the future.
The results of the calculation are presented in detail. This detailed presentation makes it possible to check the results piece by piece using other methods. The contribution values of the nine gauge-invariant classes splitting the whole set are presented for the first time (except for the preliminary values in Ref. [35]).
For reliability, two different Monte Carlo integrations with different pseudorandom generators were performed. The results of these calculations agree with each other, and they were statistically combined into the final result.
The cancellation of the "oscillating" individual Feynman graph contributions within the gauge-invariant classes confirms that the results are correct. This "oscillating" nature is described in detail. However, there is no mathematical explanation for this cancellation at present. It is also surprising that the oscillation occurs only between graphs, not within the Feynman parametric space of a single graph.
The technical information presented in the paper will be useful for scientists who plan to perform many-loop calculations in quantum field theory or other computations using supercomputers and graphics accelerators. The provided information about the Monte Carlo integration will also be useful for developers of Monte Carlo integrators.
In closing, let us recapitulate some problems that still remain open:

1. To perform an independent calculation of the 5-loop contribution of the graphs with lepton loops; to check the value from Ref. [34].