Precision Global Determination of the $B\to X_s\gamma$ Decay Rate

We perform the first global fit to inclusive $B\to X_s\gamma$ measurements using a model-independent treatment of the nonperturbative $b$-quark distribution function, with next-to-next-to-leading logarithmic resummation and $\mathcal{O}(\alpha_s^2)$ fixed-order contributions. The normalization of the $B\to X_s\gamma$ decay rate, given by $\lvert C_7^{\rm incl} V_{tb} V_{ts}^*\rvert^2$, is sensitive to physics beyond the Standard Model (SM). We determine $\lvert C_7^{\rm incl} V_{tb} V_{ts}^* \rvert = (14.77 \pm 0.51_{\rm fit} \pm 0.59_{\rm theory} \pm 0.08_{\rm param})\times 10^{-3}$, in good agreement with the SM prediction, and the $b$-quark mass $m_b^{1S} = (4.750 \pm 0.027_{\rm fit} \pm 0.033_{\rm theory} \pm 0.003_{\rm param})\,\mathrm{GeV}$. Our results suggest that the uncertainties in the extracted $B\to X_s\gamma$ rate have been underestimated by up to a factor of two, leaving more room for beyond-SM contributions.


Introduction
The flavor-changing neutral-current b → sγ transition is well known for its high sensitivity to contributions beyond the Standard Model (SM). The main goal of our global analysis of the B → X s γ decay rate is to obtain a precise constraint on the shortdistance physics it probes, which can then be compared to predictions in the SM [1][2][3][4] or beyond [5][6][7]. In our approach, this amounts to extracting a precise value of the Wilson coefficient |C incl 7 | from the measurements. Since b → sγ is a two-body decay at tree level, the photon energy spectrum, dΓ/dE γ , peaks only a few hundred MeV below the kinematic limit E γ < ∼ m B /2. In this peak region, the measurements are most precise, but the theory predictions depend on a nonperturbative function, F(k), often called the shape function, which encodes the distribution of the residual momentum k of the b-quark in a B meson [8,9]. A key aspect of our analysis is a modelindependent treatment of F(k) based on expanding it in a suitable basis [10]. This approach can incorporate any given shape function model, by using it as the generating function for the basis expansion, and thus goes beyond existing approaches that use specific models [11][12][13][14][15].
While F(k) primarily affects the shape of the decay spectrum, its normalization is determined by |C incl 7 | 2 , up to small corrections. Thus, with our treatment of F(k), we can perform a global fit to the measurements of dΓ/dE γ , including the precisely measured peak region, to simultaneously determine F(k) and a precise value of |C incl 7 |. Our global fit is the first to exploit the full available experimental information on the spectrum [16][17][18][19], together with the most precise theoretical knowledge of its perturbative contributions. This provides a more robust approach than the current method of using theoretical predictions for the B → X s γ rate with a fixed cut at E γ > 1.6 GeV [4] and corresponding extrapolated measurements [20].
The B → X s γ Spectrum Using SCET [21][22][23][24], we can write the photon energy spectrum in a factorized form, where and m b denotes a short-distance b-quark mass, for which we use the 1S scheme [25][26][27]. The first term in Eq. (1) is the dominant contribution, where F(k) contains the leading nonperturbative shape function plus a combination of subleading shape functions specific for B → X s γ. The function P (k) encodes the perturbatively calculable b → sγ spectrum, with k ∼ m b − 2E γ . It receives contributions from different operators in the effective electroweak Hamiltonian, Here, W s 77 (k) contains the universal "singular" contributions proportional to α i s ln j (k/m b )/k and α i s δ(k), which dominate in the peak region where k is small [28]. It is included following Ref. [10] to NNLL order, which includes next-to-next-to-leading-logarithmic (NNLL) resummation and all singular terms at O(α 2 s ) [22,[29][30][31][32][33][34][35][36][37]] The coefficient C incl 7 is dominated by the Wilson coefficient C 7 (µ) in the electroweak Hamiltonian, The s i terms are defined to cancel the µ dependence of C 7 (µ) and to satisfy s i ( m b , m b ) = 0. The C i r i terms contain all virtual corrections proportional to C i =7 that give rise to singular contributions. In particular, they contain the sizable corrections from virtual cc loops, and the resulting sensitivity to the charm quark mass, m c , which are a dominant theory uncertainty in the decay rate. Since in our approach these contributions are included in C incl 7 , they only affect its SM prediction, but not its determination from the experimental data. The results of Refs. [3,4,38,39] yield the NNLO SM prediction [28], C incl 7 SM = 0.3624 ± 0.0128 cc ± 0.0080 scale .
The remaining W ns ij (k) terms in Eq. (3) are "nonsingular" contributions with C i = C i ( m b ) [28]. They start at O(α s ) and are suppressed by at least k/m b relative to W s 77 (k), and are therefore subleading in the peak region. They are included to full O(α 2 s ) for ij = 77, 78 [40][41][42], while the remaining ones are known and included to O(α 2 s β 0 ) [43][44][45]. Since W s 77 (k) dominates in the peak region, the normalization of the spectrum is determined by |C incl 7 |, enabling its precise extraction. The second term in Eq. (1) is subdominant, and describes so-called resolved and unresolved contributions, where P a are perturbative coefficients starting at O(α s ), and the g a are additional subleading shape functions [46]. The uncertainties from resolved contributions are much smaller than suggested by earlier estimates [47], and are not relevant at the current level of accuracy [28] (see also Ref. [48]). The only marginally relevant contribution is related to the known O(1/ m 2 c ) correction to the total rate [49][50][51], and is included in our analysis via a subleading O(Λ 2 QCD ) shape function g 27 (k). The nonperturbative shape function F(k) is dominated by the leading-order shape function, so we assume it is positive. We introduce a dimension-1 parameter λ, and expand F(k) as [10], where f n (x) are a suitably chosen complete set of orthonormal functions on [0, ∞). The normalization condition In practice, the expansion for F(k) must be truncated at a finite order N . Therefore, the form of F(k) used for the fit is given by the following approximation where The effect of the truncation in Eq. (8) is approximated by the modified coefficients c n , which differ from thec n in Eq. (8). In particular, we always keep the normalization of F(k) exact by enforcing N n=0 c 2 n = 1 .
Using the expansion for F(k) in Eq. (8) we get Here, N s = |C incl 7 V tb V * ts | 2 m 2 b , and Eq. (11) defines dΓ 77,mn /dE γ , which we precompute from Eq. (3). The ellipses denote subleading terms not proportional to |C incl 7 | 2 , which are also written in terms of N s and c n as explained in [28]. Then, N s and the c n are fitted from the measured spectra, with the uncertainties and correlations in the measurements captured in the uncertainties and correlations of the fit parameters. Using the moment relations for F(k) [28], we obtain C incl 7 and m b , as well as the heavy-quark parameters λ 1 and ρ 1 from the fitted N s and c n . The other coefficients C i =7 are fixed to their SM values [28]. Of these, only C 1 and C 2 are numerically relevant, which are known to be SM dominated, while C 8 , which is sensitive to new physics, gives only a small contribution. We use input values for λ 2 and ρ 2 , which are obtained from the B and D meson mass splittings [28].
Fit procedure We implement a binned χ 2 fit, with Here Γ meas i is the measured B → X s γ rate in bin i, Γ i is the integral of Eq. (11) over bin i, V is the full experimental covariance matrix, and the sum runs over all bins of all measurements included in the fit.
The orthonormal basis {f n } is constructed [10] such that the first F 00 (k) term in the expansion of F(k) can have any (nonnegative) functional form, while the higher F mn (k) terms provide a complete expansion generated from it. If F 00 (k) provides a good approximation to F(k), the expansion converges very quickly due to the constraint in Eq. (7), and consequently a good fit can be obtained with small N , making the best use of the data to constrain F(k). Hence, F 00 (k) should already provide a reasonable description of the data. To find such F 00 (k), we perform a pre-fit to the data using three different functional forms for F 00 (k), given in [28], over a wide range of λ. We choose the form that provides the best pre-fits. Its χ 2 probability is shown in Fig. 1 for sufficiently different values of λ such that each can be considered as a different basis. We choose the best λ = 0.55 GeV (orange) as our default basis, and use λ = 0.525, 0.575, 0.6 GeV (green, blue, yellow), which also have good pre-fits, as alternative bases to test the basis independence.
The truncation in Eq. (8) induces a residual dependence on the functional form of the basis. To ensure that the corresponding uncertainty is small compared to others, the truncation order N is chosen based on the available data, by increasing N until there is no significant improvement in fit quality. This is done by constructing nested hypothesis tests using the difference in χ 2 between fits of increasing number of coefficients. If the χ 2 improves by more than 1 from the inclusion of an additional coefficient, the higher number of coefficients is retained. To account for the truncation uncertainty, we include one additional coefficient in the fit. It is in this sense that our analysis is model independent within the quoted uncertainties. The final truncation order is found to be N = 3 for each considered basis. To ensure that the entire fit procedure including the choice of the basis and truncation order is unbiased, it is validated using pseudo-experiments generated around the best fit values, using the full experimental covariance matrices.
Results We include four differential B → X s γ measurements [16][17][18][19] in the fit. The measurements in Ref. [16][17][18] include B → X d γ contributions, which are subtracted assuming identical shapes for B → X s γ and B → X d γ and that the ratio of branching ratios is |V td /V ts | 2 = 0.0470 [52]. For Ref. [19], we combine the highest six E γ bins to stay insensitive to possible quark- hadron duality violation and resonances with masses near m K * . We use the measurements of Refs. [17,18] in the Υ(4S) rest frame and boost the predictions accordingly. We use the uncorrected measurement from Ref. [17] and apply the experimental resolution matrix [53] to the predictions.
The fit results for N s and c 0−3 including their correlations are given in [28]. The resulting shape function is shown in Fig. 2, and the results for |C incl 7 | and m b ≡ m 1S b are shown in Fig. 3. We also determine the kinetic energy parameter λ 1 in the invisible scheme [10], with plots analogous to Fig. 3 given in Fig. S2 in [28]. We find the following results: |C incl 7 V tb V * ts | = (14.77 ± 0.51 fit ± 0.59 theory ± 0.08 param ) × 10 −3 , m 1S b = (4.750 ± 0.027 fit ± 0.033 theory ± 0.003 param ) GeV , The first uncertainty with subscript "fit" is evaluated from the ∆χ 2 = 1 variation around the best fit point. It incorporates the experimental uncertainties as well as the uncertainty due to the unknown shape function, which is simultaneously constrained in the fit. The theory and parametric uncertainties are evaluated by repeating the fit with different theory inputs [28]. The theory uncertainties are due to unknown higher-order perturbative corrections to the shape of the spectrum in the peak region, which are evaluated by a large set of resummation profile scale variations. The results for all variations are shown by the yellow lines in Fig. 2 and scatter points in in Eq. (13) is obtained from the largest absolute deviation for a given quantity (ignoring the apparent asymmetry in the variations). The parametric uncertainty is only relevant for λ 1 , for which it comes entirely from ρ 2 .
Varying the residual cc-loop contributions in the theory inputs for the fit, equivalent to the cc uncertainty in Eq. (5), changes the extracted |C incl 7 | by ±0.2% and m 1S b by ±1 MeV, showing that by far the dominant dependence on and uncertainty from these contributions is factorized into C incl 7 . The uncertainty due to the numerical value of m 2 c / m 2 b contributes most of the parametric uncertainty of |C incl 7 | in Eq. (13). From Eq. (5) and |V tb V * ts | = (41.29 ± 0.74) × 10 −3 [52], we find the SM value |C incl 7 V tb V * ts | = (14.96±0.68)×10 −3 , with the uncertainty dominated by |C incl 7 | in Eq. (5). This is shown by the gray band in Fig. 3, and is in excellent agreement with our extracted value.
Converting our result for m 1S b to the MS scheme at three loops including charm-mass effects [54], we find where the first uncertainty comes from the total uncertainty in m 1S b in Eq. (13), and the second one is the con- coefficient (c 3 ) to account for the truncation uncertainty.
Conclusions We presented the first global analysis of inclusive B → X s γ measurements to determine |C incl 7 | within a framework that allows a model-independent and data-driven treatment of the nonperturbative b-quark distribution function F(k). The value extracted from Eq. (13), |C incl 7 | = 0.3578 ± 0.0199, is consistent with the SM prediction in Eq. (5).
In comparison, in the past, the SM prediction for the rate in the E γ > 1.6 GeV region, B(B → X s γ) = (3.36 ± 0.23) × 10 −4 [4], was compared with its measurement, B(B → X s γ) = (3.32 ± 0.15) × 10 −4 [20], which have 6.8% and 4.5% uncertainties, respectively. The latter relies on an extrapolation to the 1.6 GeV cut and on corresponding uncertainty estimates, which entail insufficient variations of the nonperturbative shape-function models and perturbative uncertainties that affect the spectrum. In addition, correlations in these uncertainties in calculating and measuring the rate for E γ > 1.6 GeV cannot be fully assessed. In contrast, in our approach, C incl 7 is reliably calculable in the SM or in models beyond it, and the relevant hadronic physics and its uncertainties are determined from the data, together with the extraction of |C incl 7 |. Hence, our approach is more reliable, as it makes optimal use of the data, uncertainties from nonperturbative parameters and perturbative inputs are clearly traceable, and no double counting can occur.
The uncertainty in our extracted |C incl 7 V tb V * ts | 2 from Eq. (13) is 10.6%, about twice the uncertainty in HFLAV's result for the E γ > 1.6 GeV rate. If we neglect the theory uncertainties as well as the truncation uncer-tainty (by repeating the fit only including up to c 2 ), we would obtain a smaller uncertainty of 5.5%, close to that of HFLAV's result. This suggests that HFLAV's uncertainty is underestimated by about a factor of two, which leaves more room for new physics. More importantly, the precision of testing the SM is currently limited by the extraction of |C incl 7 | from data, and can be improved significantly with high-precision Belle II measurements.
BaBar sum over exclusive modes Phys. Rev. D86, 052012 (2012) FIG. S1. Fit results to the measured photon energy spectra [16][17][18][19]. The orange lines are the fitted central values, and the yellow bands correspond to the ∆χ 2 = 1 variation. We omit the first 6 bins of the Belle inclusive spectrum and the first 3 bins of the BABAR inclusive spectrum, as these have very large uncertainties and provide no additional information.

A. Additional fit results
The full expression of Eq. (11) used in the fit including the non-77 terms is given by where the normalization N s is defined by   Table S1 for λ = 0.55 GeV and N = 3 by inverting the moment relations for F(k). Only the fit uncertainties are included. The dΓ ij,mn and dΓ g27,n are precomputed from the basis expansion of the shape function. The c n and the normalization N s are determined from the fit. For the normalization prefactors of the remaining non-77 nonsingular terms in Eq. (S1) we use the SM input values collected in Sec. E. The overall minus sign in N 27 and N 78 arises from assuming the SM negative sign for Re(C incl 7 ) = −|C incl 7 |, and assuming the SM imaginary part of C incl 7 , which is negligible. The value for m b in the prefactors is obtained during the fit from the c n as discussed in Sec. D 2. The coefficients d n parametrize the g 27 subleading shape function that cannot be absorbed into the leading shape function, cf. Sec. D 3.
The fitted experimental spectra with the fit results overlayed are shown in Fig. S1. The central value is shown by the orange line, and the yellow band corresponds to the ∆χ 2 = 1 uncertainties of the fit. The fit results for N s and c n and their correlation matrix are given in Table S1. The final results for |C incl 7 |, m b ≡ m 1S b , λ 1 , and ρ 1 together with their correlation matrix are given in Table S2. They are obtained from the fitted N s and c n by using Eq. (S2) and the moment relations discussed in Sec. D 2. In Fig. S2 these results and the corresponding theory uncertainties are shown as well, analogous to Fig. 3 in the main text.
In Fig. S3 basis coefficients is shown. As discussed in the main text, the truncation order is determined using a nested hypothesis test to determine the appropriate number of coefficients given the available data sets. For the nominal fit we use 4 coefficients (c 0,1,2,3 ). Note that all fits with fewer coefficients also have acceptable χ 2 , so the fit quality alone is not a sufficient criterion for choosing the number of coefficients. On the other hand, the results with only c 0 and c 0,1 , which effectively correspond to using a fixed model for The effective Hamiltonian for B → X s γ is The dominant contributions are from where P R,L = (1 ± γ 5 )/2, and we neglected the mass of the strange quark (giving m 2 s /m 2 b suppressed corrections). The The Wilson coefficient C incl 7 arises when we carry out a split matching procedure to separate the scale dependence above and below the scale µ 0 ∼ m b [55,56]. Above µ 0 , we have the matching onto H eff at the weak scale µ weak ∼ m W and its renormalization group evolution down to µ 0 ∼ m b . At µ 0 , we have virtual matrix element corrections from all operators O i =7 that are proportional to the tree-level matrix element of the chromomagnetic operator O 7 . Together, these effects can be combined to define the effective Wilson coefficient which is the main short-distance perturbative coefficient that is constrained by the B → X s γ measurements. Its value is sensitive to beyond Standard-Model physics, while the shape of the photon spectrum is not [57]. The two terms in the last equality in Eq. (S5) are separately µ 0 independent order by order in α s . The barred coefficients C i (µ) are defined as where C eff 7,8 (µ 0 ) are the standard scheme-independent effective Wilson coefficients [58]. The additional factors m b (µ 0 )/ m b are included in C 7,8 to convert to a short-distance b-quark mass scheme, m b , which improves the convergence of perturbation theory.
The r i (µ 0 , m b , m c ) in Eq. (S5) encode the finite virtual corrections from operators other than O 7 that give rise to singular contributions to the photon energy spectrum, and are responsible for the difference between C 7 and C incl 7 . Hence, the split matching procedure essentially amounts to matching H eff at µ 0 onto a single O 7 chromomagnetic operator, which is subsequently matched onto its corresponding operator in SCET. In doing so, we treat the charm quark as a heavy quark and integrate out both charm and bottom quarks at the scale µ 0 . As a result, (most of) the sizable contributions from cc-loops proportional to C 1,2 C 7 and C 2 1,2 are contained within |C incl 7 | 2 , including their full m c dependence, namely in the terms r 1,2 (µ 0 , m b , m c ). As already mentioned in the main body, this organization of the perturbative contributions has the advantage that the associated theory uncertainty due to the m c dependence only enters in the SM prediction for C incl 7 , but does not limit the accuracy with which C incl 7 can be extracted from the experimental data. This treatment is furthermore motivated by the fact that in the experimental measurements of B → X s γ, charmed final states are not included in the signal and are treated as background.

Perturbative results
The coefficient C 7 in Eq. (S5) is defined to be µ 0 independent and to satisfy C 7 = C 7 ( m b ). Thus it is equal to C 7 (µ 0 ) plus the additional terms from the renormalization group that cancel the µ 0 dependence of C 7 (µ 0 ) and vanish when µ 0 = m b . Explicitly, up to O(α 2 s ) with m b in the 1S mass scheme, we have where β 0 = (11C A − 4T F n f )/3 and n f = 5 is the number of active flavors above the scale m b , and C A = 3, C F = 4/3, m are anomalous dimension coefficients defined via For example, γ (0) m = −8 and γ (0) 77 = 32/3. The full set of required anomalous dimension coefficients can be found in Ref. [59].
To fully implement the split matching procedure it is convenient to also define scale-independent coefficients C i =7 , which appear in the nonsingular terms in Eq. (3). Analogous to C 7 above, they are defined to be µ 0 independent and to satisfy C i = C i (µ 0 = m b ). To one-loop order they are given by we have at NNLO while for k = 1, . . . , 6, The results of Refs. [41,42] give A value for Im r (2) 8 is not yet known, but it only contributes to the spectrum at O(α 3 s ). In Eq. (S13), n h = 1 is the number of flavors with mass m b , and n l = 4 is the number of massless flavors, since we neglected for simplicity the m c dependence in r (2) 8 . The full m c dependence of Re r (2) 8 , arising from cc loops inserted into gluon propagators, is known [41,42], but the massless approximation is sufficiently accurate for our purposes.
For r 1−6 (µ 0 ) we have the NLO coefficients [60][61][62] r (1) Here, ρ = m 2 c / m 2 b and a(ρ), b(ρ), and X b are given in Ref. [62]. Since the Wilson coefficients C 3−6 are small, the r 3−6 terms only have very small impacts, and their NNLO contributions r Ref. [63] + i Im r where the terms in the square brackets are given in Eqs. (26) and (27) of Ref. [63], and the ellipses denote other independent color structures. The full NNLO contributions to r 1,2 are required to cancel the m c -scheme dependence and have been computed in the limits m c m b /2 and m c = 0 [38,39,64]. The SM prediction for C incl 7 in Eq. (5) is obtained using the above results together with the input parameters and numerical values for the Wilson coefficients given below in Sec. E. Although the two terms in Eq. (S5) are formally µ 0 independent, there is still residual µ 0 dependence from the truncation of perturbation theory. We vary µ 0 between m b /2 and 2 m b to obtain an estimate of the associated perturbative uncertainty, quoted in Eq. (5) with the subscript "scale". In addition, to estimate the uncertainty from missing O(α 2 s ) charm-loop contributions we use the α 2 s β 0 result in Eq. (S15) with a multiplicative prefactor of 1.0 ± 0.5, yielding the uncertainty quoted in Eq. (5) with a subscript "cc". The parametric uncertainties from input parameters, including the numerical value of m c itself, are much smaller than these two sources of uncertainties and can be safely neglected.

C. Perturbative ingredients for the photon energy spectrum
The perturbative components of the photon energy spectrum are described by Eq. (3), which we repeat here for convenience Definitions for the Wilson coefficients C incl 7 and C i are given above in Sec. B, W s 77 (k) contains the dominant singular contributions and the W ns ij (k) are the various nonsingular terms. In our formula for dΓ/dE γ in Eq. (1) we have kept an overall E 3 γ kinematic prefactor. Here one power of E γ arises from the photon phase-space integration, and for the dominant 77-like contributions two more factors of E γ arise from the derivative that acts on the photon field in each F µν . Since these factors are universal we do not expand them about the singular limit. This improves the behavior of the decomposition into singular and nonsingular terms in the tail region where these components become comparable.

Singular contributions
The all-order factorization theorem for the singular contributions W s 77 (k) is well known [24,65]. For our treatment we follow Ref. [10] and make use of the SCET-based factorization theorem, expressing the perturbative ingredients in a short-distance scheme. (In the notation of Ref. [10], µ i = µ J and µ Λ = µ S .) Here h s , J, and C 0 are the fixed-order hard, jet, and soft functions, which we include up to NNLO. The evolution kernels U H and U S sum large logarithms of k/ m b ∼ 1 − 2E γ / m b to all orders in perturbation theory, and are included at NNLL order. The perturbative expressions for h s , J, C 0 as well as U H and U S together with the required anomalous dimensions can be found in Ref. [10], where they were obtained using results from Refs. [22,[29][30][31][32][33][34][35][36][37].
In the appropriate region the logarithmic summation is achieved by choosing The dependence of W s 77 (k) on µ b , µ J , and µ S cancels between the fixed-order functions and evolution kernels order by order in resummed perturbation theory, and will be used to estimate higher-order perturbative uncertainties. The precise procedure we use to estimate these uncertainty and to transition into and out of this resummation region is described in more detail in Sec. C 3 below.
The hard function h s arises from matching the QCD chromomagnetic operator O 7 onto a corresponding SCET operator at the scale µ b , which is the second step in the split matching procedure described in the previous section. To be consistent with the first step of the split matching, also here we integrate out bottom and charm quarks at the hard matching scale µ b . As a result, the hard function includes all effects of virtual massive charm loops inserted into gluon propagators, which starting at two loops gives rise to its m c dependence given by where h s ( m b ) is the two-loop result for massless quarks given in Ref. [10] and f 2s (ρ) is extracted from the results of Ref. [66]. At the same time, all SCET ingredients are defined for n f = 3 massless flavors.

Nonsingular contributions
The remaining nonsingular terms W ns ij (k) in Eq. (S16) are included using fixed-order perturbation theory. These terms are power-suppressed by k/ m b in the B → X s γ peak region, but loose this suppression in the tail of the spectrum where k ∼ m b . By using C incl 7 and C i in Eq. (S16), the W ns ij are also µ independent order by order in α s . We use the notation µ = µ ns for the residual scale dependence in all nonsingular terms, and will vary this scale as part of our perturbative uncertainty estimate. Up to O(α 2 s ) we write The NLO and NNLO coefficient functions, w ns(1) ij (x) and w ns(2) ij (x), are determined by taking the full fixed-order results for dΓ/dE γ calculated in the literature, reorganizing the Wilson coefficients as in Eq. (S16), and then using Eq. (1) with F(m B − 2E γ − k) = δ( m b − 2E γ − k) and subtracting the fixed-order singular terms predicted by W s 77 (k) at each order. When this construction is carried out with the results consistently expressed in a short-distance mass scheme, there is an additional O(α 2 s ) correction induced, which is denoted as ∆w ns ij (x) in Eq. (S19). Note that the extraction of the nonsingular corrections is somewhat nontrivial. For example, if we take the full theory result in a short-distance mass scheme and extract the coefficient of the terms proportional to Re[C 7 C * 8 ] (setting µ = m b ), we find Here the δ(x) and w s (1) 77 (x) terms are both reproduced by the singular E 3 γ |C incl 7 | 2 W s 77 term. Furthermore, the Re(r Only the remaining terms contribute to W ns 78 , as indicated by their superscripts. By far the numerically dominant nonsingular corrections come from W ns 77 . Using as input the results from Refs. [31,40,67], we find that the one-loop and two-loop nonsingular coefficient functions are w ns(1) 77 To write w ns(2) 77 (x) we defined the following functions of x, which diverge at most logarithmically for x → 0 Above we mentioned that a factor of E 3 γ ∝ (1−x) 3 was universal for the 77 contributions. It follows that (1−x) 3 W ns 77 should also vanish as (1 − x) 3 as x → 1, and hence that the w ns 77 (x) coefficients should vanish like (1 − x) 3 for x → 1 to cancel the overall factor 1/(1 − x) 3 in Eq. (S19). Expanding the above results in the limit x → 1, we find w ns(1) 77 If we would expand the E 3 γ in the singular SCET contribution W s 77 , then this would modify the nonsingular contribution, such that it would not vanish like E 3 γ either. In this situation, as was also noted in Ref. [68], the proper E 3 γ behavior of the spectrum would be obtained only by nontrivial cancellations between the singular and nonsingular contributions. Although formally the difference between these approaches corresponds to a different treatment of nonsingular corrections, this difference can be numerically important even to rather low values of x because of the third power, and the fact that the resummation in the singular terms can potentially spoil the cancellation at small x. For this reason, our approach of keeping the E 3 γ prefactor unexpanded is preferred. For the remaining nonsingular coefficient functions, the fixed-order C 7 C 8 result from Refs. [41,42]  . Although both of these coefficients are used in our analysis, for brevity of the presentation we only quote here the first-order term w ns(1) 78 Finally, for the remaining nonsingular structures, the full theory results at one loop are well known [69][70][71][72]. They enable us to determine the following O(α s ) nonsingular coefficient functions where ρ = m 2 c / m 2 b and the charm-loop functions G 1,2 (u) are given by The corresponding O(α 2 s ) nonsingular coefficient functions w are not yet fully known. However, we stress that analogous to the C 7 C 8 contribution in Eq. (S20), all singular contributions as well as a subset of the nonsingular contributions appearing at two loops that behave O 7 -like are already accounted for via the |C incl 7 | 2 (W s 77 + W ns 77 ) term. For the remaining two-loop contributions w ns(2) ij we use the known results for the α 2 s β 0 terms obtained from the full theory results of Refs. [43][44][45], and thus leave out contributions with the color structure C A . Again for brevity, we do not list here the results for these coefficient functions. The nonsingular corrections for i, j = 3, 4, 5, 6 are known to be very small [72] and are neglected.

Scale choices and estimation of perturbative uncertainties
We now discuss our treatment of the central scales µ i and their variations used to estimate perturbative uncertainties. The soft and jet scales take different forms in the three parametrically distinct regions of the spectrum: 1) SCET shape function region: 2) Shape function OPE: This can be properly accounted for by using profile functions, µ S = µ S (E γ ) and µ J = µ J (E γ ), as discussed in Ref. [10] (see also Ref. [73]). The hard scale µ b ∼ m b is independent of E γ . For the remaining scales we use The constant parameters µ 0 , µ b , E 1 , E 2 , e J , and e ns can be varied to assess perturbative uncertainties. In Eq. (S28) the soft scale µ S (E γ ) takes the value µ S = µ 0 ∼ 1 GeV > ∼ Λ QCD in the SCET region given by E 1 ≤ E γ . In the local OPE region, E γ < E 2 , all the scales become equal, µ S = µ J = µ ns = µ b , which turns off the resummation and is crucial for the singular and nonsingular contributions to properly recombine to reproduce the local OPE prediction for the spectrum. In between these two we have a transition region where we join the soft scales in a smooth manner, as given by the quadratic functions of E γ shown in Eq. (S28). Since the transition scales E 1 and E 2 are not very widely separated, there is no need to separately implement a shape function OPE scaling region for the soft scale, noting that it is anyway well captured by the form of µ S (E γ ) used in the transition. The parameters e J and e ns provide a means to independently vary the jet and nonsingular scales when assessing perturbative uncertainties. By default we have e J = e ns = 0. For µ J this gives the geometric mean of the soft and hard scales as required. For µ ns we choose our default as the geometric mean between the hard and jet scales, and we will vary this choice up to the hard scale and down to the jet scale. This allows us to capture the fact that the nonsingular perturbative series are sensitive to lower scales than the hard scale (as would be made explicit in subleading power factorization theorems for these terms).
Taken together we consider a total of 3 5 = 243 different variations for the profile parameters to assess the perturbative uncertainty, given by the choices For each parameter, the first case in the list is the default central value, and the next two are the variations. We do not vary E 2 since our fit analysis is not sensitive to the uncertainty in the spectrum in the region E γ < ∼ 1.6 GeV. To assess the theoretical uncertainty we separately carry out the fit for each of these 243 cases and then consider the spread of the results as giving the range of possibilities for the central values. The results for these 243 fits are shown by the dark yellow shape function curves in Fig. 2 and as the yellow scatter points in Figs. 3 and S2. The theoretical uncertainty for a given quantity is then obtained by using the largest absolute deviation of these results from the default central value.

Shape-function basis
We briefly summarize the functional basis used for expanding the shape function in Eq. (6). For more details we refer to Ref. [10]. The orthonormal basis functions f n (x) are given by φ n (y) = 2n + 1 2 where φ n (y) is an orthonormal basis on y ∈ [−1, 1], given by the normalized Legendre polynomials. The function y(x) can be any variable transformation that maps x ∈ [0, ∞) to y ∈ [−1, 1], i.e., it has to satisfy y(0) = −1, y(∞) = +1, and y (x) > 0. Given any positive and normalized function Y (x) on x ∈ [0, ∞), we can construct y(x) from its integral With this construction we have Hence, Y (x) or equivalently F 00 (k) acts as the generating function for the basis, for which we can use any suitable model function.
We consider the following functional forms where the parameter p determines the behavior of F 00 (k) ∼ k p for k → 0. As explained in Ref. [10], for integer p we need at least p ≥ 3 to ensure that after short-distance subtractions, which involve taking two derivatives of F (k), the spectrum vanishes at the kinematic endpoint. We have tested the three functional forms Y exp (x, 3), Y exp (x, 4), Y gauss (x, 3) in the pre-fit. Of these, Y exp (x, 3) provides the best pre-fits and is thus used as the default functional form.
those related to the calculable O(λ 2 / m 2 c ) corrections to the total rate [49][50][51], which enter via the subleading shape function g 27 (k) as discussed below.
The As pointed out in Ref. [79], the potentially relevant O 8 O 7 contribution can be constrained using the measured isospin asymmetry in B → X s γ, defined by To see this, we decompose these contributions to Γ 0 and Γ − , denoted as δΓ − and δΓ 0 , according to the quark to which the photon couples (besides the O 7 operator), where for δΓ a the photon couples to the valence quark flavor, and for δΓ b to any non-valence flavors. For the non-valence contribution we used that SU (3) flavor symmetry implies that δΓ b is universal at leading order. Since Q u + Q d + Q s = 0, both contributions are proportional to δΓ a − δΓ b . The isospin asymmetry is given by [79] Hence, the relative impact of these contributions to the isospin-averaged rate is given by where we used the latest Belle measurement ∆ 0− = −(0.48 ± 2.12)% [80] (for m Xs < 2.8 GeV or equivalently E γ > 1.9GeV), which is nearly a factor of three more precise than earlier results. Hence, the O 7 O 8 contribution is experimentally constrained to be much smaller than the current sensitivity, and can be neglected. Concerning the O 2 O 7 contribution, unlike Ref. [47], we treat the charm quark as heavy in our analysis, which amounts to expanding the charm loop in Λ QCD /m c . The resulting contribution to the spectrum is then given in terms of an unknown O(Λ 2 QCD ) subleading shape function g 27 (k) as In a complete factorization analysis, this contribution would involve some evolution between hard, jet, and soft contributions, which is currently not known. To provide some reasonable Sudakov suppression in the peak region, which is important to avoid artificially enhancing this contribution relative to the leading, resummed W s 77 in Eq. (S17), we include in it the NLL evolution factor of the leading contribution given by the product U NLL = [U H (µ b , µ J ) U S (k, µ J , µ S )] NLL with µ b,J,S fixed to their central scales.
The subleading shape function g 27 (k) is not known, but its moments can be calculated in terms of local matrix elements. To parametrize it, we expand it as where we use our default λ = 0.55 GeV and the functional basis f n (x) is generated from Y exp (x, p). For our central results we use p = 2, corresponding to linear scaling F n (k) ∼ k for k → 0, and for the uncertainties we also use p = 4. The dΓ g27,n /dE γ in Eq. (S1) are obtained by inserting the basis expansion in Eq. (S42) into Eq. (S41). At present, we have no sensitivity to determine the basis coefficients d i from the data. Instead, we determine d 0 and d 1 for a given value of d 2 from the norm and first moment of g 27 (k), which are given by For our central results we set d 2 = 0, and to estimate the uncertainties we vary d 2 by an O(1) amount to provide a reasonably large variation in the shape of g 27 (k). The variations for g 27 (k) for fixed norm and first moment are illustrated in Fig. S4, with the solid orange line showing the default central choice.
The main impact of this contribution is due to the norm of g 27 (k) ∼ λ 2 , which reproduces the well-known O(λ 2 /m 2 c ) correction to the total rate [49][50][51]. The central values and uncertainties used for λ 2 and ρ 2 are discussed in Sec. E 3. The uncertainties due to the unknown shape of g 27 (k) beyond its norm and first moment are much smaller than the fit uncertainties. They change the extracted |C incl 7 | by at most 0.25% and m 1S b by 4 MeV, and are thus irrelevant at the present level of accuracy and can be neglected. With more available data in the future, the d n coefficients could also be included in the fit and constrained by the data.

E. Numerical inputs
Here, we collect all numerical input values entering in our analysis. The following values are taken from Ref. [52]:

Wilson coefficients
At and above the split-matching scale µ 0 = 4.7 GeV, we always use the exact 4-loop running of α s (µ) with n f = 5 flavors. To obtain the SM values of the Wilson coefficients at µ 0 , we start from the full NNLO O(α 2 s ) boundary conditions [81,82] at the weak scale µ weak = 160 GeV and evolve them down to µ 0 with the anomalous dimensions up to O(α 3 s ) [59,[83][84][85][86][87][88]. To perform the evolution, we use the exact numerical solution of the coupled RGE system. For the boundary conditions we convert the above top-quark pole mass to m b (160 GeV) = 163.3 GeV. For