Neutral B -meson mixing from full lattice QCD at the physical point

We calculate the bag parameters for neutral B -meson mixing in and beyond the Standard Model, in full four-flavor lattice QCD for the first time. We work on gluon field configurations that include the effect of u , d , s , and c sea quarks with the highly improved staggered quark (HISQ) action at three values of the lattice spacing and with three u=d quark masses going down to the physical value. The valence b quarks use the improved nonrelativistic QCD action and the valence light quarks, the HISQ action. Our analysis was blinded. Our results for the bag parameters for all five operators are the most accurate to date. For the Standard Model operator between B s and B d mesons we find ˆ B B s ¼ 1 . 232 ð 53 Þ and ˆ B B d ¼ 1 . 222 ð 61 Þ . Combining our results with lattice QCD calculations of the decay constants using HISQ quarks from the Fermilab/MILC Collaboration and with experimental values for B s and B d oscillation frequencies allows determination of the Cabibbo-Kobayashi-Maskawa (CKM) elements V ts and V td . We find j V ts j ¼ 0 . 04189 ð 93 Þ , j V td j ¼ 0 . 00867 ð 23 Þ , and j V ts j = j V td j ¼ 0 . 2071 ð 27 Þ . Our results agree well (within 2 σ ) with values determined from CKM unitarity constraints based on tree-level processes (only). Using a ratio to Δ M s;d in which CKM elements cancel in the Standard Model, we determine the branching fractions Br ð B s → μ þ μ − Þ ¼ 3 . 81 ð 18 Þ × 10 − 9 and Br ð B d → μ þ μ − Þ ¼ 1 . 031 ð 54 Þ × 10 − 10 . We also give results for matrix elements of the operators R 0 , R 1 , and ˜ R 1 that contribute to neutral B -meson width differences.


I. INTRODUCTION
The Standard Model (SM) description of neutral B d and B s oscillations requires knowledge of hadronic parameters derived from the matrix elements of 4-quark operators between B q andB q states. These 4-quark operators come from the effective electroweak Lagrangian at energy scales appropriate to B physics, and the matrix elements can only be determined by lattice QCD calculations, which are now able to include the full impact of QCD on such hadronic quantities [1]. The accuracy with which this can be done is the limiting factor in the constraint that can be obtained from the now very precise experimental results on the neutral meson mass difference (seen as an oscillation frequency). In the Standard Model this constraint leads to a determination of the Cabibbo-Kobayashi-Maskawa (CKM) matrix elements that accompany the 4-quark operators of the Standard Model. New physics models with extra heavy particles extend the effective Hamiltonian to include additional 4-quark operators. Constraints on the new physics from experiment then need accurate determination of the matrix elements of the new operators. Again this can come only from lattice QCD calculations.
Here we provide the first "second-generation" lattice QCD calculation of the matrix elements of all five ΔB ¼ 2 operators of dimension six for the B s and B d . We improve on earlier calculations by working on gluon field configurations generated by the MILC Collaboration that include u, d, s, and c quarks in the sea with u=d quark masses going down to their physical values. Although this obviates the need for a chiral extrapolation, we also include heavier u=d quark masses in our set of results so that we can map out the dependence on the light quark mass. The discretization of QCD that we use is fully improved through Oðα s a 2 Þ for both the gluons and the light quarks (including all of those in the sea) for which the highly improved staggered quark (HISQ) action is used. For the b quarks we use improved nonrelativistic QCD, which includes Oðα s Þ corrections to terms at order v 4 (where v is the heavy quark velocity). By linking this calculation directly to our earlier one for the B-meson decay constants [2] (that parametrize the amplitude to create a meson from the vacuum) we are able to give results directly for the "bag parameters" associated with each operator and take advantage of the cancellation of a number of systematic effects. The bag parameters (to be defined in Sec. II A) encode the multiplicative factor by which the operator matrix element differs from that expected in the vacuum saturation approximation, which is related to the decay constant. To the extent to which this approximation works (and we will show here that it does work well) we expect the bag parameters to have very little dependence on light quark sea or valence masses and even on the lattice spacing. This enables improvements in accuracy over earlier work along with a much simpler picture of the extrapolation to the physical point.
The first unquenched lattice QCD calculations of the matrix elements for neutral B meson mixing focused purely on results for the Standard Model operators [3,4] and ratios for B s to B d [5]. Calculations have also been done in the infinite heavy quark mass limit [6]. More recently calculations of matrix elements for the full set of SM and beyond the Standard Model (BSM) operators have been done [7,8]. The calculations in [7] use the twisted mass formalism for all quarks on gluon field configurations including u and d quarks in the sea. An extrapolation of results [renormalized using the regularisation-invariant momentum-subtraction (RI-MOM) scheme] is made from a heavy quark mass in the charm region up to the b quark mass using ratios with a known infinite mass limit. The calculations in [8] use the Fermilab formalism for the b quark and the asqtad formalism for the light quarks on gluon field configurations that include u, d, and s quarks in the sea with the asqtad formalism. Perturbatively renormalized 4-quark operator matrix elements (only) are calculated and so bag parameters must be derived using decay constant results from elsewhere. A very recent result in [9] uses domain-wall quarks on gluon field configurations including u, d, and s in the sea and extrapolates in heavy quark mass to the b quark mass for B s to B d ratios for SM mixing matrix elements and decay constants.
In Sec. II we discuss the 4-quark operators relevant to B mixing and how they are implemented on the lattice. Section III describes our lattice calculation and results.
We compare our results to previous work in Sec. IV, determine CKM elements V ts and V td using experimental results on B-meson mass differences, determine branching fractions for the rare decays of B d and B s to μ þ μ − , and give matrix elements for derived operators that contribute to width differences. Section V gives our conclusions and discusses the prospects for future improvements. Details about our analysis are contained in four appendixes: Appendix A on the lattice QCD correlators that we calculate and how we fit them to obtain matrix elements and bag parameters; Appendix B on the chiral perturbation theory fits we use to combine results at physical and unphysical light quark masses; Appendix C on correlations in the uncertainties for our final results and last, with more general applications beyond this analysis, Appendix D on singular value decomposition (SVD) cuts and fitting correlators.

A. Continuum 4-quark operators
Neutral B-meson mixing occurs at lowest order in the Standard Model through box diagrams involving the exchange of W bosons and top quarks; see Fig. 1. These box diagrams can be well approximated by an effective Lagrangian expressed in terms of 4-quark operators.
Here we will examine all five of the independent local dimension-6 operators that could contribute to ΔB ¼ 2 processes [8,10]: where V ¼ γ μ , A ¼ γ μ γ 5 , S ¼ 1, and P ¼ γ 5 , and sums over μ and color indices i and j are implicit. In the Standard Model, the most important of these for B −B mixing is O 1 . This operator mixes with O 2 under renormalization.
Operators O 4 and O 5 do not appear in the Standard Model, but do arise in various BSM scenarios. It is conventional to parametrize matrix elements of these operators in terms of "bag parameters," FIG. 1. An example from the Standard Model of a mechanism that mixes the neutral B q andB q . The amplitude is well approximated by a contact term for matrix elements between B q -meson states.
where here μ is the renormalization scale, and M B q and f B q are the mass and weak decay constant of the B q meson: The normalization parameter η q i ðμÞ is chosen so that the bag parameters equal 1 in the "vacuum saturation approximation," where gluon (and other QCD) exchanges between the initial and finalB q and B q are ignored (see [8,10] for more details): We use renormalization scale μ ¼ m b ðm b Þ; the corresponding values for the normalization factors are given in Table I.
The bag parameters provide both computational advantages and physical insights. The leading-order logarithms in chiral perturbation theory, coming from the matrix element of the 4-quark operator and f 2 B q , partly cancel in the ratio; see Appendix B. In particular, the coefficient of the chiral logarithm from the tadpole diagrams is reduced by a factor of 4. Therefore bag parameters should be less dependent upon the light-quark mass; we find very little mass dependence. Finite-volume effects will be correspondingly reduced. We also find that most of the dependence on lattice spacing cancels. Finally, as we will show, the bag parameters all turn out to be of order one, suggesting that vacuum saturation is a useful approximation. For these reasons, we focus here on bag parameters; values for the matrix elements are easily obtained from the bag parameters given values for the decay constants [2,13].

B. Lattice QCD 4-quark operators and matching
Matrix elements of the 4-quark operators are regulator dependent, and so we need to convert matrix elements calculated in our simulation (with the lattice regulator) into the corresponding matrix elements for the more conventional MS scheme. The differences between the two schemes are ultraviolet and so can be calculated using QCD perturbation theory. To the lowest and first order in α s the relationship has the form (for μ ¼ m b ) The coefficients z ij relevant to our simulation were calculated in [14] and are summarized in Table II. The scale for α s depends on the lattice spacing; we use the same values for α s used in [15] to calculate renormalizations for the axial-vector current that couples to B q mesons (see Table IV for the values).  (4)]. These are calculated using M B s ¼ 5.3669ð2Þ GeV and M B d ¼ 5.2796ð2Þ GeV [11];m b ðm b Þ ¼ 4.162ð48Þ GeV and m b =m s ¼ 52.55ð55Þ [12]; and m s =m l ¼ 27.18ð10Þ [13].  [14], with z ij ≡ ρ ij − ζ ij where ρ ij and ζ ij are listed in Tables III and IV of that paper a . The perturbative coefficients z A 0 for the temporal axial current [Eq. (A6)] are also listed; these are from [2], which used results from [17].
Our lattice analysis uses nonrelativistic QCD (NRQCD) for the b dynamics. Quarks and antiquarks decouple in NRQCD, and so correspond to separate fields. As a result the lattice version of a 4-quark operator has the form [14] annihilates a b antiquark. These are the lattice operators we use on the right-hand side of Eq. (5). The 1=m b terms are the OðΛ QCD =m b Þ corrections to the operator in NRQCD.
A complication for operators O 2 and O 3 is the treatment of "evanescent operators." Our matching results use the MS NDR scheme of [16] (BBGLN). These matrix elements are readily converted to the alternative scheme of [18] (BJU) using the following equations [through Oðα s Þ, with [19,20]: The matching coefficients z ij from Eq. (5) are plotted against am b in Fig. 2. These coefficients are not large and have a relatively benign dependence on the b-quark mass across the range that we use, although different coefficients behave differently. The diagonal coefficients z ii are much larger than the corresponding coefficients for the NRQCD-HISQ axial current (z A 0 in Table II), which are unusually small. Note that the only nonzero off-diagonal coefficients are for ij equal to 12, 21, 31, 45, and 54, and that these tend to be smaller than the diagonal parameters.
It is worth remarking here on the similarities and differences between the perturbative matching we apply here and that used by the Fermilab/MILC Collaborations [8] in their determination of B mixing matrix elements. They also make use of a perturbative calculation of the matching to Oðα s Þ. They do this after a nonperturbative determination of factors that are needed to remove normalization artifacts from the clover and asqtad actions that they use for heavy and light quarks, respectively, and without which they would have large Oðα s Þ coefficients. We do not need to apply this procedure because the NRQCD and HISQ actions are well behaved in this respect [21]. After applying their nonperturbative procedure, the Fermilab/MILC Collaborations give results for their Oðα s Þ coefficients in Table III of [8]. Their coefficients differ from ours because they are using a different discretization of QCD for both heavy and light quarks.
However, qualitatively the coefficients show similar behavior in terms of magnitude and dependence on the lattice b quark mass (given in their case by the parameter κ 0 b ).

A. Simulations
We use seven ensembles of gluon field configurations recently generated by the MILC Collaboration [22,23]. Details are given in Table III. We use ensembles at three values of the lattice spacing, a, to control discretization effects and at three values of the light quark mass down to the physical point to map out sea quark mass effects. Discretization effects depend on a 2 and sea quark mass effects are approximately linear, so a range in a 2 of a factor of 3 and in sea light quark mass of a factor of 5 allows us substantial leverage to pin down these effects. The lattice spacing values were determined using the mass splitting between ϒ and ϒ 0 , as described in [24] where a discussion of systematic errors can be found. The sea quarks use HPQCD's HISQ action [25] which we have shown to have FIG. 2. Coefficients z ij of Oðα s Þ terms in the matching of lattice NRQCD-HISQ 4-quark operators to the MS scheme plotted as a function of the bare NRQCD b quark mass in lattice units. The top plot shows the diagonal coefficients (i ¼ j) that enter the renormalization of a given operator; the lower plot shows the offdiagonal coefficients (i ≠ j) corresponding to the mixing of different operators. See Eq. (5) for the definition of z ij and Table II for the values. The ij values are indicated in the key. "A 0 " refers to the Oðα s Þ coefficient for the renormalization of the temporal axial current (z A 0 in Table II). small discretization errors even for charm quarks [26][27][28]. This enables four flavors of quarks to be included in the sea, with masses given in Table III. The u and d quark masses are taken to be the same.
The valence b quarks are implemented using lattice NRQCD [29]. The action is described in detail in [24]. It includes a number of improvements over earlier calculations, in particular one-loop radiative corrections (beyond tadpole improvement [30]) to most of the coefficients of the Oðv 4 b Þ relativistic correction terms. The tadpole improvement of the action is done using the Landau gauge link, with u 0L values given in Table IV. This action has been shown to give excellent agreement with experiment in recent calculations of the bottomonium [24,31,32] and B-meson spectra [33]. The b quark mass is tuned, giving the values in Table IV, by fixing the spin-averaged kinetic mass of the ϒ and η b states to experiment [24]. NRQCD breaks down as am b → 0 but all our values of am b are substantially larger than 1, where there is no problem.
The HISQ valence light quark masses are taken to be equal to the sea mass except on set 4 where there is a slight discrepancy. The s quark is tuned using the mass of the η s meson [34], a fictitious pseudoscalar ss state which is not allowed to decay on the lattice. Its properties can be very accurately determined in lattice QCD, and we find M η s ¼ 0.6885ð22Þ GeV [35]. Values for valence s masses are given in Table IV and corresponding values of M η s in lattice units in [35]. We allow for uncertainties from mistuned valence masses in our determination of the physical results.

B. Simulation results and error budget
We describe the 2-point and 3-point correlators used in our analysis in Appendix A. We use the 2-point correlators to extract the decay constants f B q , including the 1=m b corrections [Eq. (A6)]. We also combine them with the 3-point correlators to calculate lattice matrix elements of the O n . Also in that appendix, we discuss the Bayesian fits used to extract physics from these correlators. Our final results for hB q jO n jB q i latt =ðf B q M B q Þ 2 are summarized in Table VIII of Appendix A. As discussed in the appendix, this was a blind analysis.
We convert the lattice expectation values into MS matrix elements using Eq. (5) [divided by ðf B q M B q Þ 2 ]. Our results are listed in Table V. In addition to the statistical errors from the simulation and the (negligible) errors in the z ij , we include uncertainties (for each entry in the table) coming from three additional sources: (i) Oðα 2 s Þ: We estimate this uncertainty to be twice (to be conservative) α s times the magnitude of the Oðα s Þ correction we include for each of our three lattice spacings. (These corrections are correlated between configuration sets with similar lattice spacings.) (ii) Oðα s Λ QCD =m b Þ: OðΛ QCD =m b Þ corrections have been measured for the temporal axial-vector current and found to be 5% of the leading-order contribution [2]. This suggests that OðΛ QCD =m b Þ corrections, which are included in our simulations, are 10% for the 4-quark operators. We account for the Oðα s Þ radiative corrections to these terms by adding the following uncertainty to our results: where each c n;i α s ¼ 0 AE 0.1 and allows for variation in the coefficients between the lattice spacings. (δ a is defined to vary from −1=2 to 1=2 over our mass range; see [24] for more details.)  [24], where the three errors are statistics, NRQCD systematics, and experiment. am l , am s , and am c are the sea quark masses, L × T gives the spatial and temporal extent of the lattices, and n cfg is the number of configurations in each ensemble. We use 16 time sources on each configuration to improve statistics.  (iii) Oðα s ðaΛ QCD Þ 2 ; ðaΛ QCD Þ 4 Þ: The NRQCD and HISQ actions we use in the simulation are highly corrected. In particular, there are no tree-level a 2 errors in either. We account for a 2 α s errors by adding the following uncertainty to our results: α s ðaΛ QCD Þ 2 ðc n;0 a 2 þ c n;1 a 2 δ a þ c n;2 a 2 δ 2 a Þ hO n i MS ðfMÞ 2 þ ðaΛ QCD Þ 4 ðc n;0 a 4 þ c n;1 a 4 δ a þ c n;2 where each c n;i a 2 ¼ 0 AE 1, each c n;i a 4 ¼ 0 AE 1, Λ QCD ¼ 0.5 GeV is the QCD scale, and again the δ a terms allow for variation between different lattice spacings. The final entries ("phys.") in both parts of Table Vare our final results at the physical values of the light-quark masses after our chiral fit. We use chiral perturbation theory to combine the values obtained on the different configuration sets with different light quark masses; see Appendix B for details. Figure 3 compares our final values for B s mesons with the results from individual configuration sets. These plots show that the dependence on lattice spacing and lightquark mass is negligible compared with our uncertainties. The analogous plot for B d mesons is very similar.
Adding uncertainties to the lattice results to allow for operator normalization and lattice spacing effects, as we have done above, is equivalent to including them in our fit function with the coefficients treated as fit parameters; see the appendix of [36]. The uncertainties that are included are correlated between lattice results on different sets through these coefficients. Figure 3 shows that, for example, the lattice spacing effects that we allow for through Eq. (10) are overestimates of what is seen in the results, since the variation with lattice spacing of the central values is much smaller than the uncertainties shown on the coarser lattices. A further test of this is that omitting the results from the smallest lattice spacing   Finally we convert our final results into bag parameters using Eq. (2). The bag parameters are listed in Table VI. Despite the wide variation in values for hO n i=ðfMÞ 2 , the bag parameters are within 30% of 1. This shows that the vacuum saturation approximation can be of some utility. Our final results are shifted by less than half a standard deviation if we omit the data with the largest pion masses, and have errors that are 10%-15% larger.
The error budgets for the B s bag parameters are shown in Table VII. The dominant source of error comes from uncalculated terms in perturbation theory (α 2 s and α s Λ QCD =m b terms). The sensitivity to these terms depends on the operator. For example, it is particularly high for O 3 , because matrix elements for O 3 are a lot smaller than those of O 1 [see Eq. (4)] which are mixed in by Eq. (5). The error budgets for B d mesons are almost identical to those for B s , but have twice as large a contribution from statistical uncertainties in the lattice data. Almost all of the uncertainties, and some of the statistical errors, cancel in ratios of B s to B d meson bag parameters.
Matrix elements of the 4-quark mixing operators can be obtained from the ratios in Table V given values for the decay constants and masses. Note that the corresponding bag parameters for O 2ÁÁÁ5 have larger fractional errors than the ratios, and so should not be used for this purpose. The larger errors result from uncertainties due to the factors η q i in the bag-parameter definition [see Table VII and Eq. (2)].

A. Comparison to previous results
Our results for the bag parameters for all five SM and BSM operators given in Table VI are more accurate than previous lattice QCD results. This is for a number of reasons: (i) We work directly with the bag parameters rather than the 4-quark operator matrix elements. The bag parameters are expected from chiral perturbation theory to have little dependence on valence and sea quark masses (see Appendix B). This expectation is     (Table VI). The total error for each quantity is also shown. The error budgets for the B d meson's bag parameters are very similar. Systematic errors from finite-volume, QED, and strong-isospin breaking effects are estimated to be below 0.1% and hence negligible in Appendix B 5. borne out in our results and means that we are easily able to combine results at both unphysical and physical light quark masses. (ii) We have results for the physical light quark mass at two values of the lattice spacing improving control of the chiral extrapolation. (iii) The gluon field configurations that we use include the effect of u, d, s, and c quarks in the sea and so we do not have an uncertainty associated with missing flavors of sea quarks (the Fermilab/MILC Collaboration includes a 2% uncertainty in their 4-quark operator matrix elements from missing c in the sea [8]). Figure 5 shows a comparison of our bag parameters for the B s meson to those from [7,8] (and also, for O 1 , to [4]).
The results from [7] include only u and d quarks in the sea and the uncertainty does not include an estimate of the impact of missing s sea quarks. It is therefore not clear whether we should expect agreement between these n f ¼ 2 results and our n f ¼ 4 results. The fact that the n f ¼ 2 purple diamonds from the ETM Collaboration are around 20% below our results for O 4 and O 5 is reminiscent of what is seen in kaon mixing. ETM uses the RI-MOM renormalization scheme for the purple diamonds, and it has been shown in kaon mixing [37] that the use of the RI-MOM scheme (rather than RI-SMOM) for the equivalent 4-quark operators has large systematic errors that push down the value of the bag parameter. This may then be the main reason (rather than a difference of n f ) for the discrepancy with our results for O 4 and O 5 , but more work would be needed to be sure of this.
The n f ¼ 3 and n f ¼ 4 results should be comparable because the impact of missing c quarks in the sea on the bag parameters is expected to be very small [8]. Our new results agree within 2σ in each case with [8] but in every case are more accurate. The largest discrepancy is for B ð1Þ B s at 1.9σ. The weighted average of our n f ¼ 4 results and the n f ¼ 3 results from [8] is given by the grey band in the figure and the value of that average is given in each panel. We assume no correlations, here and subsequently, between our results and those of [8] because they use different actions for both the b quark and the light quarks and different gluon field configurations (with a different sea quark action and generated with a different Monte Carlo updating algorithm). Figure 6 shows a comparison of the ratio of bag parameters for B s to B d for each operator for our new results and those of [8]. Our new results are a lot more accurate, with 2%-3% total uncertainty. All of the ratios are very close to 1, but there is a sign of a systematic trend for the ratio for O 2 and O 3 to be above 1 and for O 4 and O 5 below 1. This is not visible in the results of [8] but does start to emerge with the improved accuracy of our results. This is in general agreement with the results from using sum rules in [38,39]. We also include in Fig. 6 results for O 1 from HPQCD [4] and RBC/UKQCD [9]. The RBC/UKQCD result has a 1% uncertainty.

B. Derived quantities
Our results for the bag parameters can be combined with results for the B and B s decay constants to give values for the 4-quark operator matrix elements using Eq. (B1) and our results in Table V. For this we use the most accurate current lattice QCD results obtained on gluon field configurations including u=d, s, and c quarks in the sea. These have been obtained by the Fermilab/MILC Collaboration using the HISQ action for all quarks [13]. This "heavy-HISQ" approach, pioneered by HPQCD [40,41], uses pseudoscalar meson 2-point correlators that combine heavy FIG. 5. A comparison of our results (red filled circles at n f ¼ 4) to previous lattice QCD values for the B s bag parameters B B s ðm b Þ in the MS scheme for all five SM and BSM operators. Previous results come from the Fermilab/MILC Collaboration on n f ¼ 3 gluon field configurations (blue crosses) [8] and the ETM Collaboration on n f ¼ 2 gluon field configurations (purple filled diamonds) [7]. Note that the ETM results for O 4 and O 5 have been converted to the definition of the bag parameter given in Eq. (4). The filled green square at n f ¼ 3 for the O 1 operator comes from an earlier HPQCD calculation using NRQCD b quarks [4]. The n f ¼ 2 results are missing s sea quarks, whose impact cannot be estimated perturbatively (and no uncertainty is included for this in the error bars). It is therefore unclear what level of agreement to expect between these results and those for n f ¼ 3 and 4. Since we do not expect missing c in the sea to have a significant impact on the bag parameters [8] we can meaningfully compare n f ¼ 3 and n f ¼ 4. The grey bands are the weighted average of our new results with those of [8], and the average value of the bag parameter B ðnÞ B s ðm b Þ for each operator O n is indicated in that panel. We include a vertical line at value 1.0 for comparison to the vacuum saturation approximation. and light quark propagators calculated with multiple heavy quark masses, am h , at multiple values of the lattice spacing. m h reaches the b quark mass for am h < 1 for lattice spacing values a < 0.045 fm. Since the HISQ action has very small discretization errors by design, a fit to the m h and a dependence is possible that allows the continuum m h dependence of the decay constant to be reconstructed. It can then be evaluated at the b quark mass to enable the B and B s decay constants to be determined. Note that the correlators can be absolutely normalized in this case and so there is no normalization uncertainty.
Fermilab/MILC obtain the values f B d ¼0.
1905ð13Þ GeV, f B s ¼ 0.2307ð13Þ GeV, and f B s =f B d ¼ 1.2109ð41Þ. 1 Note that we use the decay constant for the neutral B d meson (not the B u ), which is the appropriate choice here. Our bag parameters are calculated for a light quark l corresponding to the average of u and d. Our results show (comparing those for B s with those for B) that any difference between bag parameters for B l and B d will be much smaller than our uncertainties. This is not true for the decay constants, where the differences are significant [13].
For the SM phenomenology to be determined from our results for the matrix elements of O 1 it is convenient to convert our results from the MS NDR scheme to the renormalization-group-invariant quantitiesB ð1Þ B q . The conversion is given byB The matching factor c RGI is calculated to two loops in perturbative QCD, and we take c RGI ¼ 1.5158ð36Þ [8]. This corresponds to the result for n f ¼ 5 active flavors in the sea andᾱ s ðM Z Þ ¼ 0.1185ð6Þ. Our bag parameters are obtained at scale m b for four flavors of quarks in the sea. The impact of missing b quarks in the sea, however, should be negligible both for the bag parameters and the resulting 4-quark operator matrix elements. A power-counting estimate of such effects would give a relative contribution of α s ðΛ QCD =2m b Þ 2 , which is below 0.1%. Our results for the RGI bag parameters for O 1 are then The ratio of RGI bag parameters is of course the same as that of the MS bag parameters. Combined with the decay constant results from [13] we obtain where ξ is the ratio of the two results above it. We form ξ by combining the result for f B s =f B d from [13] with our results B d , taking advantage of the correlations that reduce uncertainties in each of these ratios. Note that in combining the decay constant and bag parameter results we add relative uncertainties in quadrature. We expect no significant correlation between the two sets of results because they use a different heavy quark action and, even though both results use n f ¼ 2 þ 1 þ 1 gluon field configurations, there is little overlap in the ensembles used. to previous lattice QCD values for the ratio of B s to B d bag parameters for all five SM and BSM operators. Previous results come from the Fermilab/MILC Collaboration on n f ¼ 3 gluon field configurations (blue crosses) [8] using their quoted correlations to reconstruct the ratio. Since we do not expect missing c in the sea to have a significant impact on the bag parameters [8] we can meaningfully compare n f ¼ 3 and n f ¼ 4. The grey bands are the weighted average of these two sets of results and the average value for each operator is indicated in that panel. For O 1 at n f ¼ 3 we also show previous results from HPQCD (green filled square) using NRQCD b quarks [4] and RBC/UKQCD (purple filled diamond) using domain-wall quarks with masses of m c and above and extrapolating results to the b quark mass [9]. We include a vertical line at value 1.0 to make clear which ratios are above, and which below, this value.
The error budgets in the two cases show that the key sources of uncertainty are not the same. The uncertainties in the combinations above are dominated by the uncertainties in our bag parameters and their ratio in Eq. (12) because the decay constant results are now so accurate.  13) to previous lattice QCD results on n f ¼ 3 gluon field configurations from Fermilab/MILC [8] and HPQCD [4]. The Fermilab/MILC results include an uncertainty for missing c in the sea in their calculation. The difference between the central value of our new result and that of Fermilab/MILC is 1.8σ. Because the systematic uncertainties are correlated between our results and those of [4] we do not include the previous HPQCD results in the new lattice QCD n f ¼ 3=n f ¼ 4 average, shown by the grey band in Fig. 7. The average value is shown above the grey band. Figure 8 compares lattice QCD results for the ratio ξ defined in Eq. (13) on n f ¼ 3 gluon field configurations with our new result here using n f ¼ 4. There is good agreement between the lattice QCD results with the most recent (including our new result here) having total uncertainties at the level of 1.5%. The result of averaging our new result with that of [8] (both results being obtained at the physical b quark mass) is given by the grey band with the average value quoted above it.
The current experimental average values [11] for the B s and B d systems are combining statistical and systematic errors in quadrature.
In the SM ΔM is given by Here S 0 is the Inami-Lim function [52] which describes electroweak corrections and has argument x t ¼ m 2 t =M 2 W . The top quark mass to be used here is in the MS scheme,m t ðm t Þ [53]. Taking the current average [11] of direct experimental measurements [54][55][56] of the top quark mass [172.9(4) GeV] as the pole mass, gives m t ðm t Þ ¼ 163.07ð38Þ GeV using the 4-loop expressions in [57]. Evaluating the Inami-Lim function then gives S 0 ð4.116ð19ÞÞ ¼ 2.313ð8Þ. The QCD correction factor, η 2B , is given at next-to-leading order in [58]. We take η 2B ¼ 0.55210ð62Þ [13], again calculated with n f ¼ 5.
The CKM elements V tq and V tb can be derived in the SM by assuming that the CKM matrix is unitary and determining other CKM elements in the same rows or columns from the comparison of theory and experiment [59][60][61][62]. For Eq. (15) it is important to use values for V tq that did not include ΔM q itself in their determination. So we use the results from CKMfitter for the case where only tree-level processes were used in the determination. This gives [61]  for ξ, defined in Eq. (13), to previous lattice QCD values for n f ¼ 3 (filled blue squares). Previous results come from the Fermilab/MILC Collaboration [8] and from HPQCD [4] using calculations at the physical b quark mass. Results are also shown from RBC/UKQCD using domain-wall quarks and extrapolating to the b from the c quark region and above [9] and using static (infinitely massive) b quarks [6]. The grey band is the weighted average of our new results and those of [8] with the result for the average quoted above it. jV ts j CKMfitter; tree ¼ ð41.69 þ0.39 −1.45 Þ × 10 −3 ; jV td j CKMfitter; tree ¼ ð9.08 þ0. 23 −0.45 Þ × 10 −3 ; jV td =V ts j CKMfitter; tree ¼ 0.2186 þ0.0049 −0.0059 ; jV tb j CKMfitter; tree ¼ 0.999093 þ0.000064 −0.000018 : ð16Þ The ratio jV td =V ts j CKMfitter; tree is derived from the CKMfitter results for A, λ,ρ, andη using the formulas in [59]. The central value differs slightly from the ratio of the two numbers above. The final terms in Eq. (15) parametrize the hadronic contribution to ΔM through the matrix element of the appropriate 4-quark operator, O 1 . Our results for f 2 B qB ð1Þ B q are given in Eq. (13).
Putting all these pieces together we obtain predictions for the mass differences for neutral B s and B d eigenstates of where the first error in each case is from the CKM matrix elements and the second error is primarily from the lattice analyses. These results agree well with the experimental values from Eq. (14)-the largest discrepancy is 1.7σ for the ratio of ΔM values-but they have much larger uncertainty.

D. V ts and V td
Because the experimental values for ΔM q are so accurate, a better approach to understanding the implications of our improved lattice QCD results for the relevant hadronic matrix elements is to turn the analysis of the previous subsection on its head. That is, to use our results and the experimental values for ΔM q to determine values for jV ts j and jV td j from Eq. (15) [taking a value for V tb from Eq. (16) [61] ]. jV ts j and jV td j obtained this way can then be compared to other determinations that make use of CKM unitarity as a test of that unitarity.
The ratio of jV ts j to jV td j can be obtained more accurately than the separate CKM elements because this makes use of the hadronic parameter ξ [Eq. (13)] in which a lot of the lattice QCD uncertainties cancel (see Sec. IVA).
Our results are jV td j ¼ 0.00867ð23Þ; jV ts j ¼ 0.04189ð93Þ; jV td j=jV ts j ¼ 0.2071ð27Þ: ð18Þ Figure 9 plots the AE1σ constraints on jV td j, jV ts j and their ratio from our results as the dark grey lozenge. Results determined by other lattice QCD calculations [8,9] are also shown along with a recent determination using sum rules [39]. Also shown as light pink and orange lozenges are results from fits to the CKM unitarity triangle using results from many different processes [61,62]. Particularly relevant here is the green lozenge which results from a unitarity triangle fit that includes tree-level processes only [61], and therefore not B s =B d oscillations. Tension between results derived from ΔM q (as here) and the results derived from tree-level processes and unitarity would imply the existence of new physics in loop processes.
The Fermilab/MILC results (red lozenge in Fig. 9) highlighted an approximately 2.0σ tension between their values for V ts and V td and those from unitarity fits. See [63,64] for examples of the possible implications of this.
Our results show no such tension. Our values for V ts and V td separately agree with the {CKMfitter, tree} results in Eq. (16) within 1σ and the difference in the ratio amounts to 1.8σ. This limits the scope for new physics in loop-induced processes. However, our ratio for jV td j=jV ts j joins the systematic trend of the previous results shown in Fig. 9 in being below that of {CKMFitter, tree}.
The rare decays B q → μ þ μ − have very small branching fractions in the SM since they proceed through W box diagrams and Z penguins and are helicity suppressed. New physics might then be seen if the experimental and SM branching fractions can be determined to be different to sufficient accuracy. FIG. 9. A comparison of AE1σ constraints on V ts and V td from experimental results on B s and B d oscillation frequencies compared to SM calculations. This is an update of Fig. 7 in [39] to include the results presented here. The lattice QCD constraints shown come from the following: this paper, dark grey; [8], red; [9], light blue, jV ts j=jV td j ratio only. The light blue lozenge is from sum rules [39]. The lozenges with dashed boundaries include a full unitarity triangle fit: light pink is from CKMfitter [59,61] and orange from UTFit [60,62]. The green lozenge with dotted boundary is the result of a unitarity triangle fit for tree-level processes only from CKMfitter.
The hadronic parameter that enters the SM branching fraction is the B q meson decay constant [65] but it appears along with the CKM elements jV Ã tq V tb j. The uncertainty in the value of the appropriate CKM element is now the largest uncertainty in the value of the SM branching fraction [13].
An alternative method for determining the branching fraction is to take a ratio to ΔM [66]. In the SM (and extensions with minimal flavor violation) the CKM elements cancel out of this ratio. The decay constant also cancels and the hadronic parameter that remains in the ratio is the bag parameter.
The formula for the time-averaged branching fraction [67], as measured in the experiment, is then given in the SM by Here C A ðμ b Þ includes electroweak and QCD corrections and is given for μ b ¼ 5 GeV in [65]. We use C A ðμ b Þ ¼ 0.4694ð36Þ [8]. The lifetime τ B H q that appears in this formula is that of the heavy neutral eigenstate [67]. For the B d this can be taken as the average lifetime, 1.520(4) ps [68,69], but for the B s there is a measured difference of lifetimes and the heavy eigenstate has the longer lifetime, 1.615(9) ps [68,69]. Values for S 0 ðx t Þ and η 2B are given in Sec. IV D.
Using our results for the bag parameters for O 1 given in Eq. (12) and the experimental values for ΔM in Eq. (14) we obtain the following values for the branching fractions: We can also obtain the ratio of branching fractions for B s and B d [66] BrðB Here we have dropped the terms in m 2 μ =M 2 B q since they are negligible. There is a lot of cancellation in this ratio, including of systematic errors in the ratio of bag parameters [see our results in Eq. (12)]. We obtain the result Figure 10 shows our predictions in the SM for the branching fractions from Eqs. (20) and (22) as the grey lozenge. The red lozenge shows lattice QCD predictions [13] for the branching fractions using the direct approach where the hadronic parameter needed is the decay constant and this is combined with input for the CKM elements V tq and V tb , along with other factors. The errors in the results from [13] are dominated by uncertainties in the CKM elements, which are taken from a global unitarity triangle fit that includes both tree and loop-induced processes. 2 Figure 10 shows good agreement between the two lattice QCD predictions. This reflects the fact that, as described in Sec. IV D our results for the bag parameters yield CKM elements jV ts j and jV td j in agreement with CKM unitarity determinations. Our results imply consistency of the CKM matrix (within uncertainties), and hence the two approaches of using the decay constants plus CKM elements or using the bag parameters and ΔM q will agree.
Note that our results in Eq. (20) include uncertainties in the parameters of Eq. (21). They do not include uncertainties from electromagnetic corrections to the decay process. These are estimated to lead to a reduction of 0.3%-1.1% in the muonic branching fractions in [70]. This is not significant given the current uncertainties in our SM predictions, but will need to be addressed to reduce uncertainties in the future.
The blue band in Fig. 10 shows the current experimental situation. The decay B d → μ þ μ − has only been seen with 3σ significance [71]. Recent LHCb [72] and ATLAS [73] results give upper bounds to the branching fraction of 3.4 × 10 −10 and 2.1 × 10 −10 , respectively. These bounds FIG. 10. A comparison of the SM branching fractions for B s and B d to decay to μ þ μ − from lattice QCD results with the current experimental measurements. The grey lozenge shows results from our calculation here of the bag parameters and a ratio to experimental results for ΔM q [Eqs. (20) and (22)]. The red lozenge shows results from a Fermilab/MILC calculation of B d and B s decay constants, combined with input CKM elements [13]. The blue band shows the current experimental average for BrðB s → μ þ μ − Þ [11]; only an upper bound exists for BrðB d → μ þ μ − Þ. 2 Note that we include the constraint on the ratio from the July 2019 update of [13]. are outside the range of Fig. 10. For the branching fraction for B s → μ þ μ − , the Particle Data Group quotes an average value of 3.0ð4Þ × 10 −9 using results from ATLAS [73], CMS [74], and LHCb [72]. The AE1σ variation gives the width of the band in Fig. 10.
Although no significant tension between experiment and the SM predictions is visible in this figure, it does give encouragement that we are reaching a point where further reductions in uncertainties will start to give serious SM constraints, at least for B s → μ þ μ − .

F. Contributions to ΔΓ
Another physical observable from neutral B meson systems is that of the decay width difference of the eigenstates, ΔΓ. This has been measured for the B s at 13% of the average width, but only an upper limit exists for the B d [11]. The prediction for the width differences in the SM is given in [16,75] in terms of the matrix elements of several 4-quark operators. We give results here for the matrix elements of those labeled R 0 , R 1 , andR 1 in [75].
R 0 [75] is a combination of O 1 , O 2 , and O 3 which is a 1=m b -suppressed operator up to corrections of Oðα 2 s Þ times a leading order operator, evaluating the radiative corrections at μ ¼ m b . We can obtain the matrix elements for this operator through Oðα s Þ from our lattice calculation. R 1 andR 1 are proportional to O 4 and O 5 , respectively, and will be discussed further below. The matrix elements for R 0 in Eq. (23) can be rewritten in terms of our bag parameters using the definition in Eq. (4): Writing it in this way makes clear (setting B ðnÞ B q to 1 and α s to zero) the expected cancellation at leading order to leave matrix elements that are Oð1=m b Þ. Our evaluation of the term in square brackets above yields Note that the uncertainties here include both those for missing α 2 s terms in the matching of lattice QCD operators to continuum operators and also the effect of missing α 2 s terms in the definition of R 0 . This latter uncertainty is estimated by calculating the size of the α s corrections in Eq. (23) and multiplying by α s . This gives a 35% uncertainty, which dominates the error quoted in Eq. (25). To assist with numerical evaluation we have replaced the square ratio of masses in Eq. (24) with 3η q 3 ; values for this can be found in Table I. Equation (25) avoids the use of a perhaps somewhat arbitrary definition of a bag parameter for R 0 given in [75]. The numerical factors show clearly that this is a 1=m b -suppressed operator by being of size Λ QCD =m b ≈ 10%.
R 1 andR 1 are defined as [75] The matrix elements for B s and B d can then be determined from our results in Table V. Our bag parameters for O 4 and O 5 are given in Table VI. In [75] bag parameters for R 1 and R 1 are defined in such a way as to set the squared mass ratios in Eq. (4) to 1. This means that the bag parameters for R 1 andR 1 for the definition in [75] can be recovered from our bag parameters by multiplying by 3η q 4 =7 and 3η q 5 =5, respectively. These factors are larger than 1. Note, however, that the impact of both R 1 andR 1 on ΔΓ is tiny because of the m q =m b factors in their definition.
Matrix elements of the numerically more important R 2 andR 2 operators [75], along with those of R 3 andR 3 , cannot be directly obtained from our current results because they contain derivatives on the light quark fields inside the 4-quark operator. Results of the calculations of these matrix elements are discussed separately in [76].

V. CONCLUSIONS
We give results from the first "second-generation" lattice QCD calculation of the matrix elements that contribute to B s and B d mixing in and beyond the Standard Model. We include c quarks in the sea for the first time and have a range of u=d quark masses (taken to be equal) that go down to the physical value. We use radiatively improved NRQCD for the b quark action. By calculating the ratio of the matrix elements of the 4-quark operators to the square of the decay constant times mass (proportional to a quantity known as the bag parameter) we obtain results with very little dependence on the u=d quark mass or the lattice spacing. This gives us more accurate results than previous calculations for these ratios, and the associated bag parameters, for all five ΔB ¼ 2 operators.
Our key results are given in Tables V and VI. Table V gives the ratio of matrix elements to ðfMÞ 2 with our final physical values given in the last row. These are the numbers that should be used to reconstruct the 4-quark operator matrix elements by multiplying by ðf B q M B q Þ 2 . Table VI converts these ratios into bag parameters, defined in Eq. (4). These numbers can be compared to unity, the result expected in the vacuum saturation approximation. Our error budget for the bag parameters is given in Table VII. We have uncertainties of 4%-7% for the individual bag parameters. This uncertainty is dominated by missing higher orders in the perturbative matching to the continuum 4-quark operators. The uncertainty is reduced to around 2% in the ratio of bag parameters for B s to B d , since this renormalization cancels. The correlations between results for different operators are given in Table X In Sec. IV we discuss the phenomenology from our results. We obtain values for ΔM for B s and B d in Eq. (17) to be compared to experiment.
Alternatively, and more usefully, we can combine our results with experiment to obtain the CKM elements jV ts j and jV td j and their ratio. These values are given in Eq. (18), and Fig. 9 shows the constraints they give in the V td − V ts plane. Our results are the most accurate determinations of these CKM elements using lattice QCD and show good consistency with determinations from tree-level processes assuming CKM unitarity. This means that we see no signs of new physics in neutral B-meson oscillations at this improved level of accuracy.
We derive results, by taking a ratio to ΔM, for the branching fractions for B s and B d to decay to μ þ μ − , a key mode for new physics searches at LHC. Our results are given in Eqs. (20) and (22). Figure 10 shows a comparison of our predictions to those from the recent Fermilab/MILC calculation of decay constants [13] along with the current experimental picture. This is encouraging for future tests of new physics contributions to these rare decay processes.
Finally, we give values in Eq. (25) and below Eq. (26) for the matrix elements for the R 0 , R 1 , andR 1 operators that contribute to the SM prediction for the width difference ΔΓ.
To improve accuracy further in the future requires improving the matching of the lattice QCD 4-quark operators to those in the continuum; this is the dominant source of uncertainty in our bag parameters. Since lattice QCD perturbation theory is so hard, a renormalization method that can be implemented within the lattice calculation and then matched perturbatively to MS in the continuum could be preferable. A symmetric momentum-subtraction scheme (RI-SMOM) has been found to work well for kaon mixing calculations [37] but attention must be paid to removing nonperturbative artifacts in these schemes if high accuracy is to be achieved [77]. Such a method would need to be implemented with a relativistic quark action on lattices with fine enough lattice spacing to allow quark masses close to that of the b for am ≲ 1. This has been a very successful strategy for HISQ quarks for B-meson decay constants [13,40,41], but in that case the decay constants calculated with HISQ do not need any renormalization. Calculating 4-quark operator matrix elements is much harder, but still feasible. The ETM work with twisted mass quarks [7] is encouraging for this program (although they used n f ¼ 2 gluon fields and RI-MOM renormalization) as is the work by RBC/UKQCD using domain-wall quarks [9] (although they have only calculated B s to B d ratios so far). It seems clear that in the next few years improvements in this direction will be possible, pushing uncertainties on bag parameters down to the ∼2% level. This will allow jV ts j and jV td j to be determined to 1%. where s is a 3-vector of meson sources, the sums over spatial x and y project onto zero 3-momentum, the times satisfy 0 < t < T, and β ¼ 1; 2; …; 5 labels the mixing operator (Fig. 11). The sources include a local source, corresponding to

GðtÞ
and two smeared sources: see [2] for details. We also examine the vector of correlators where J ð1Þ is the leading NRQCD correction to J ð0Þ A 0 . We use the corrected current 3 to evaluate the decay constant (in lattice units): where coefficients z A 0 depend on the lattice spacing and were calculated in [2] using [17] (see Table II for the values we use here). The values for α s (from [15]) are given in Table IV. The 2-point and 3-point correlators are calculated in the standard way. For 3-point correlators this involves combining propagators from a local source (at t) into "openmeson" propagators [8] which are then closed off at 0 and T (which cover all time slices away from t) with local or smeared meson and antimeson operators. We average correlators over 16 values of t for improved statistics. The smearing functions we use are given in [15].
Fits for the B s proceed in two steps. First the 2-point correlators GðtÞ and G ð1Þ ðtÞ are fit simultaneously. Then the best-fit amplitudes and energies from that fit are used as priors for a simultaneous fit of all the 3-point correlators G n ðtÞ. Having finished the B s fits, we follow the same approach with the B d correlators but constraining (via the priors) the B d 's fit parameters to be within 20% of the corresponding values for the B s . We discuss all of these fits in what follows.

Fitting two-point correlators
We fit the 2-point correlators to a formula of the form (in lattice units) In the exponents, E n and E o n are the energies of the lowestlying states with zero 3-momentum that couple to the sources. The second (oscillating in time) term in each correlator is due to taste doubling caused by the staggeredquark HISQ action for the light quarks (see [33]). M n and M o n are the physical masses corresponding to states jE n i and jE o n i, respectively. We keep N ¼ 6 terms, but fit results are the same for any N ≥ 5.
We use a Bayesian fit procedure [78]. Fits for the B s mesons on our coarsest lattices (0.15 fm) use the following Bayesian priors for the energies, where ΔE n ≡ E n − E n−1 , and the logarithms indicate lognormal priors for energies. Energies are rescaled in proportion to the lattice spacing to make priors for the other lattices. The priors for local and smeared amplitudes are on the coarsest lattices. Amplitudes for the local source are rescaled by a 3=2 for the other lattices. The smeared sources are designed to be lattice-spacing independent, and so we use the same prior for the other lattices. Finally the priors for parameters j n and j o n are on the coarse lattice, and again scale as a 3=2 for other lattices.
These central values for these priors were based upon fit values for the ground state B s . The uncertainties assigned to the priors are large: for example, the priors are typically 2000-5000 times broader than the final fit errors for the ground state parameters that we need for our analysis. Replacing the central values by random values drawn from the prior distributions leaves our results unchanged within errors.
We need to apply SVD cuts to the data's correlation matrix because of the large number of correlators being fit. The procedure for determining the SVD cuts is described in Appendix D; typically the cuts affect less than half of the data modes. Fits for GðtÞ were constrained to t values between the values for t min and t max shown in Table VIII; data from larger t's was too noisy to be useful. To keep the number of fit points down (see Appendix D), we restricted the fits for G ð1Þ ðtÞ to the range of t's between ðt min þ t max Þ=2 and t max .

Fitting three-point correlators
The fit function for the 3-point correlators is substantially more complicated: where the fit parameters p include all of the 2-point correlator parameters plus the V nm , V o nm , and V oo nm . We fit the 3-point amplitudes over the range t min ≤ t ≤ T − t min for the values of T shown in Table VIII  these are scaled in proportion to a 3 for the other lattices. Note that V n;m ðβÞ and V oo n;m ðβÞ are symmetric under the interchange of n and m. We are interested in the groundstate value for We introduce two simplifications to the analysis that make our fits run 20-100 times faster, without affecting fit results or precision. The first simplification is to replace both the data and the fit function in the fits with their sums over t, while keeping the same fit parameters [79]. This reduces the number of data points to be fit for our 0.09 fm lattice, for example, from 1050 to 180. Note that the Monte Carlo data for G n ðt; TÞ do not vary much with t, as expected from Eq. (A14). The second simplification is to marginalize all fit parameters other than those associated with the ground state [80]. We do this by splitting the fit function [Eq. (A14)] into two parts, one that involves only the ground state (i.e., either the B s or B d ) and the other with the remaining terms: We then replace the fit data G lat β by G lat β ðt; TÞ → G lat β ðt; TÞ − ΔG fit β ðt; T; p prior Þ; ðA19Þ where the prior values for the fit parameters are used in ΔG fit β . At the same time, we replace the fit function by just its ground-state term: This reduces the number of fit parameters from 450 to 9 (for N ¼ 6). Marginalization works particularly well here because we have excellent priors for the amplitudes and energies, from the 2-point correlators, and because the mixing parameters enter the fit function linearly. Also the marginalized fit function is t independent, making the first simplification (summing over t) quite natural. Again we need SVD cuts, but the need is greatly reduced by summing over t. We used the method outlined in Appendix D; typically the cuts modified around 70% of the data modes.
We tabulate simulation results [using Eqs. (A6), (A9), (A10), and (A16)] for the dimensionless ratio hB q jO n jB q i with B q ¼ B s ; B d in Table VIII. This table also shows sample values of χ 2 from the various (2-point and 3-point) correlator fits when we include random SVD and prior noise, as discussed in Appendix D 4. Without noise, the χ 2 's per degree of freedom are much smaller than 1.0, as expected. Also following Appendix D 4, we tested the uncertainties from our fits using simulated data. Fits to simulated data reproduced the input parameters to within errors. We verified that marginalization and averaging over t have negligible effect on our results. With our smallest lattice spacing (0.09 fm), for example, undoing both optimizations gives the following values: These agree well with the values in Table VIII (for set 7), but took far longer to compute.
In Table IX, we show sample error budgets for these quantities from simulations on the 0.09 fm lattice; others are similar. The dominant source of uncertainty is from the Monte Carlo statistics.

Blind analysis
This analysis was blinded by multiplying the 3-point correlators by a random normalization factor. The random factor was removed only after the entire analysis was completed and this paper written.

APPENDIX B: CHIRAL FIT
Although we have results at physical pion masses we do not rely on these simply for our final value. We include also results at heavier-than-physical pion masses, which are statistically more precise, by using a fit to the dependence on the pion mass based on chiral perturbation theory. This gives the coefficients of the nonanalytic "chiral logarithms" in m 2 π logðm 2 π Þ; in addition we include analytic terms to allow both for staggered quark discretization effects, the unphysically heavy u=d quark masses in the sea, and for mistuning of valence quark masses. Performing a fit to results at multiple pion masses then tests the dependence expected from chiral perturbation theory.
Our principal results consist of values for the "reduced" matrix elements of the 4-quark operators, on each configuration set (Table V). We fit the R n q to the form Rðm l ; m s Þ ¼ Rðm phys l ; m phys s Þð1 þ p log χ log þ p J 3g 2 J þ p a 4 δX a 4 þ p m 2 π a 2 δX m 2 π a 2 p l δx l þ p l2 ðδx l Þ 2 þ p v δ v Þ; ðB2Þ where we suppress the indices n and q on each term and each parameter, p, for clarity. The functions used in each term are discussed below. χ log is given in Eq. (B6); J in Eq. (B7); δX in Eq. (B9); δx l in Eq. (B11); and δ v in Eq. (B12). Note that this fit is done after applying the additional uncertainties discussed in Sec. III B to allow for matching and discretization effects.
To derive this form, we make use of the results in [81], where Appendix A gives the dependence on light meson masses of the bag parameters at one loop in heavy meson staggered chiral perturbation theory. This builds on the continuum heavy meson chiral perturbation theory results of [82]. There is a lot of cancellation of chiral logarithms between 4-quark operator matrix elements and decay constants so that, as we discuss below, the remaining chiral logarithm terms [χ log and J in Eq. (B2)] in the bag parameters (and equivalently in R) have small coefficients. This expected very benign dependence on the light quark mass is another reason for working with the bag parameters as we do here, rather than the 4-quark operator matrix elements.
The chiral perturbation theory for the bag parameters is given in [81] in the form (using our notation) Here β n is the low-energy constant (value at zero pion mass) for R n , and β 0 n is the equivalent term for 4-quark matrix elements between vector heavy-light mesons. For O 1 , β 0 1 ¼ β 1 . S comes from "tadpole" diagrams (with þ for n ¼ 1, 2, and 3 and − for n ¼ 4, 5) and Q from "sunset" diagrams that connect pseudoscalar and vector mesons.T andQ are "wrong-spin" tadpole and sunset terms, respectively. Below we discuss the content of these functions in terms of the important nonanalytic chiral logarithms and the effect on these of the discretization effects in the staggered quark formalism. This enables us to transcribe Eq. (B3) into the simpler Eq. (B2) that we will use. We now discuss each of these terms in turn.

Tadpole diagrams
The results in [81] are given in terms of meson masses that include those for pions of different tastes that appear in the staggered quark formalism. Thus a term that would be a simple chiral logarithm in the continuum can appear in a number of guises, one of which is as an average over the masses of all tastes of pion. On fine enough lattices this will become a continuum logarithm plus discretization effects. In fact, for the fully unquenched case that we study here (m val ¼ m sea ), staggered chiral perturbation theory typically arranges itself to cancel taste effects inside chiral logarithms so that nonanalyticities in a 2 cancel as m u=d → 0 (see Appendix A of [83]). This also happens here. The chiral logarithm in R d from tadpole diagrams [S in Eq. (B3)] appears in the form Here I denotes the singlet (largest mass) pion taste. We can compare this function to the corresponding continuum chiral logarithm where P denotes the lightest (Goldstone) pion taste. This comparison is shown in Fig. 12 using taste splittings for HISQ pions for the lattice spacing values that we use in this calculation. We give results for our range of m 2 π ≡ m 2 π;P values from the physical point, 0.018 GeV 2 , to 0.09 GeV 2 . Λ χ ¼ 4πf π ¼ 1.64 GeV, and we take μ χ ¼ 1.0 GeV.
The difference between the HISQ and continuum chiral logarithm terms is sufficiently small that we simply allow for that discrepancy in our treatment of discretization effects. This is included through the δX terms in Eq. (B2) discussed below. We therefore take the chiral logarithm terms in Eq. (B2) with continuum form: Here m 2 η ¼ ð2m 2 η s þ m 2 π Þ=3, and we use masses of Goldstone taste π and η s in this expression. "(phys)" denotes the value of the previous expression evaluated for physical masses so that the total right-hand side vanishes at that point. m 2 η changes very little as m π changes, so these terms in the fit do very little. The parameters p q log;n are given priors þ1.0ð3Þ for n ¼ 1, 2, 3 and −1.0ð3Þ for n ¼ 4, 5 as this logarithm appears with the opposite sign for O 4;5 . The prior width allows for modification of the coefficients from missing higher order terms in chiral perturbation theory.

Sunset diagrams
We now turn to the term denoted J in Eq. (B2) that comes from the sunset diagram term Q in Eq. (B3). This would also take the form of a chiral logarithm m 2 π logðm 2 π =μ 2 χ 2 Þ in the infinite heavy meson mass limit in the continuum. For finite heavy meson mass, however, J is modified by terms that depend on heavy meson mass differences, because there is a pseudoscalar to vector heavy meson transition inside the sunset diagram. The form of J as a function of pion mass and heavy meson mass difference, Δ, is given in Eq. (6.17) of [84], which considers chiral perturbation theory terms for the heavy-light meson decay constant. In that case the appropriate value for Δ can include heavy-strange to heavy-light mass differences as well as vector to pseudoscalar mass differences. Here, when we consider R d and R s , that does not happen and we only have to consider the case where Δ ¼ M B Ã ðsÞ − M B ðsÞ . Then Δ takes the value 45 MeV for B Ã − B [11], and we take the same value for B Ã s − B s since any differences are expected [33] and seen to be [11] small. Figure 13 compares the function Jðm π ; ΔÞ=Λ 2 χ with that of the chiral logarithm to which it is equal when Δ ¼ 0. Even though Δ is small, and much smaller than m π through most of the range in which we work, we see Δ does have an impact, so that J has smaller magnitude and gradient in m 2 π than its associated chiral logarithm.
J in Eq. (B2) then takes the form where Jðm; ΔÞ is given in [84]. J is multiplied by 3g 2 , where g is the BB Ã π coupling. We take the value of g as 0.5, based on recent lattice QCD calculations [85][86][87]. The uncertainty on g, both from the lattice calculations and also from the effect of missing higher order terms in chiral perturbation theory, is absorbed into the coefficient p J . p J is the ratio of low-energy constants associated with the bag parameters for vector heavy-light mesons to that for pseudoscalars. For O 1 we know that this ratio is 1 [88]. For the other operators, n ¼ 2-5, we do not. We therefore take a prior value and width on p ðd;sÞ J;n of 0(1), allowing either sign. For p ðd;sÞ J;1 we take 1.0(3) to allow for uncertainty in g 2 . Note that in the case where Δ ¼ 0 the coefficient of the chiral logarithm ðm 2 π =Λ 2 χ logðm 2 π =μ 2 χ ÞÞ in the chiral perturbation theory for O 1 is ð3g 2 − 1Þ=2 [89], which is small (−0.125) when g ¼ 0.5.

Wrong spin and other effects
The heavy meson staggered chiral perturbation theory analysis of [81] showed that O 1;2;3 and O 4;5 can mix through wrong-spin staggered taste effects [T andQ in Eq. (B3)]. The size of these terms depends on the size of the light meson taste splittings for the staggered action. For the asqtad action used for light quarks in [8] they were of some concern, and it was important to include these effects explicitly in a full fit to all five operators. For the HISQ action that we use here these effects are much smaller. The question then becomes whether they are distinct in magnitude or form from discretization effects from other sources that are already included in our analysis.
The wrong-spin contributions from tadpole diagrams for B d involve differences of chiral logarithms for different taste pions (hence canceling in the absence of taste-splitting effects) along with hairpin terms that have coefficients a 2 δ 0 V and a 2 δ 0 A that are themselves the size of the unit of taste splitting [83]. As an illustration of the impact of these terms we examine the terms that are differences of chiral logarithms. These appear in three "signatures": V − A, P − I, and −P þ 2T − I. Here the letter denotes the pion taste, ordered in increasing mass as P, A, T, V, I. Figure 14 illustrates the size and behavior of these terms for the −P þ 2T − I example that mixes O 2 and O 3 , the simplest because there are no additional hairpin corrections. The function plotted is We see thatX falls rapidly with lattice spacing [approximately as ðaΛÞ 4 ] and has a slope with m 2 π that also falls with lattice spacing [approximately as ðaΛÞ 2 ]. This behavior is generic for terms that arise from taste splittings in this way.
The wrong-spin tadpole terms have a variety of coefficients multiplying them that correspond to ratios of 4-quark operator matrix elements (within the groupings 1,2,3 and 4,5). Most of the coefficients for the wrong-spin tadpole terms appearing in the chiral expansion for O y are of the form β x =ð4β y Þ where β x is the low energy constant associated with operator O x . The exception is O 1 , where the coefficient is 2ðβ 2 þ β 3 Þ=β 1 . If all the 4-quark operator matrix elements were of the same size, then the coefficients would be Oð1=4Þ. O 3 and O 5 have smaller matrix elements than the others, however, if we consider the vacuum saturation approximation. Hence β x =β y can be of size 2 for O 5 and 5 for O 3 .
We are already including a 2 and a 4 errors in Eq. (10), but contributions from wrong-sign tadpole terms differ between B d and B s mesons. The largest contributions are for the B d meson; we allow for them and similar terms that arise from sunset diagrams [and so contain differences of Jðm π;t ; ΔÞ] along with residual right-sign taste effects by including terms in the chiral fit, Eq. (B2). We include them with the chiral fit rather than with other discretization effects in Eq. (10) because they arise from the staggered quark action and hence carry no am b dependence. Here allows for m 2 π dependence; it varies between −0.5 and 0.5 over our range of parameters. Similar terms are not needed for B s mesons because the effects are smaller and can be simulated by Eq. (10). We take the priors p a 4 and p m 2 π a 2 to be 0(2) since, although this is an unnecessarily broad prior for some n, it allows a reasonable size for all the possibilities.
FIG. 14. The functionX, defined in Eq. (B8), plotted for HISQ light quarks against the square of the pion mass. We give three curves, for lattice spacing values corresponding to our very coarse, coarse, and fine lattices. This function appears in the "wrong-sign" tadpole terms in heavy meson staggered chiral perturbation theory.

Analytic terms
The three terms given symbol δ on the last line of Eq. (B2) are simple polynomials to account for mistuning of sea and valence quark masses from their physical values. We use where m l and m s are the sea u=d and s quark masses from Table III. The physical value for the m l =m s ratio we take as 27.18(10) from [13]. The factor of 1=10 converts m l =m s to the size of the parameters that appear in chiral perturbation theory as a ratio of meson masses to Λ χ ¼ 4πf π . Equation (B2) includes terms in δx l and ðδx l Þ 2 . We take a prior of size 0(1) for each coefficient p l and p l2 (for each operator and each q). We do not allow for mistuning effects for c quarks in the sea since we expect these effects to be negligible compared to those from light sea quarks. δ v accounts for the mistuning of light and strange valence masses appropriate to B d or B s . We take Again the coefficients for this term, p q v;n , have prior 0(1) for each n and q. We do not include terms for b quark mass mistuning since the tuning is very accurate and we expect any small mistuning to have a negligible effect on bag parameters.

Finite-volume, strong isospin-breaking, and QED effects
Finite-volume effects can be estimated based on chiral perturbation theory. The results in [90] show finite-volume effects of Oð1%Þ for the bag parameters of O 1 in small lattice volumes of size L ¼ 2.5 fm for physical u=d quark masses. On our much larger lattices, with a minimum size of L ¼ 4.6 fm at physical u=d masses for set 3, finitevolume effects will be a lot smaller. We conclude that this is a negligible effect at our current level of uncertainties.
Strong-isospin breaking and electromagnetic effects can also be estimated to be negligible at present. Our bag parameters show very little sensitivity to the u=d quark mass and our ratios for B s to B d differ from 1 by at most 10% (Table VI). This suggests that changing m l to m d should only have a Oð0.1%Þ effect. Effects from the fact that the valence quarks have electromagnetic charge are estimated at below 0.1% for the decay constants in [13]. They come largely from QED effects on the tuning of quark masses. Since bag parameters are less sensitive to both heavy and light quark masses than decay constants, we conclude that QED effects on the bag parameters will be less than 0.1%, and we neglect them. Note that QED effects can still enter ΔM q or BrðB q → μ þ μ − Þ through corrections to these processes from adding photons; these effects need to be considered separately.

APPENDIX C: CORRELATIONS IN FINAL RESULTS
In this appendix we describe the correlations between the uncertainties in different final results from our analysis. Our principal results consist of values for the reduced matrix elements of the 4-quark operators, R n q , defined in Eq. (B1), evaluated for physical quark masses (Table V). The results for a given meson and different operators are only weakly correlated, as shown in Table X for the B s meson. There is more correlation, but still small, between values of the bag parameters [Eq. (2)], because of the normalization factors η s n [Eq. (4)]. The means and standard deviations for these quantities are collected in Table XI.
Values of R n s are highly correlated with values of R n d , for the same n, which is why the ratios R n s =R n d have much smaller uncertainties. Uncertainties in these ratios are almost uncorrelated, however, with those in the R n q (correlations are 0.06 or smaller). Thus correlations for the B d matrix elements R n d can easily be constructed from the results in Table XI and Table X   There are three inputs for our least-square fits to sets of correlators: (1) a collection of N s random (Monte Carlo) samples G ðsÞ , where each sample is packaged as an N Gdimensional vector; (2) a (vector) fitting function GðpÞ of fit parameters p; and (3) a priori estimates (priors) for the fit parameters.
In the fit, the sample averagē is assumed to be a random sample drawn from a Gaussian distribution with mean Gðp Ã Þ for some set p Ã of fit parameters, and a covariance matrix given approximately by Here D is the diagonal matrix of standard deviations, D ij ¼ δ ij σ G i , and M corr is the correlation matrix. The bestfit parameters are obtained by minimizing as a function of the parameters p, where λ n and v n are the eigenvalues and eigenvectors of the correlation matrix: Note that χ 2 prior ðpÞ is the part of χ 2 ðpÞ associated with the Bayesian priors used in the fit.
The approximation for the covariance matrix, Eq. (D2), causes problems if the number of samples N s is insufficiently large compared with the number of data points N G [91,92]. In particular, the smaller eigenvalues of the correlation matrix are underestimated. Indeed, it is obvious from Eq. (D2) that there must be N G − N s modes with zero eigenvalue when N s < N G . Underestimating eigenvalues exaggerates their importance in χ 2 ðpÞ [Eq. (D4)], compromising the fit; and χ 2 ðpÞ is undefined if there are zero eigenvalues.
The underestimation of small eigenvalues is illustrated in Fig. 15, which shows the ratio of λ approx n =λ exact n for approximate eigenvalues estimated from random samples of different sizes drawn from a simulation of a known distribution (so we know the exact eigenvalues). The small eigenvalues are dragged down to zero by the need for zero modes when N s < N G . They then increase slowly as new samples are added, until λ approx n =λ exact n ≈ 1 for all n when N s ≫ N G . Note that good approximations for all eigenvalues require N s to be 10-100 times larger than N G . The figure shows results for N G ¼ 512 pieces of correlated data; curves for N G ¼ 50 (or 5000) would be similar, but with more (or less) noise. The range of values covered by the eigenvalues also has little effect on the overall picture.

Choosing an SVD cut
The problematic eigenvalues are those for which λ approx n λ exact since, on average, individual terms in χ 2 ðpÞ should contribute approximately 1 AE ffiffiffiffiffiffiffiffiffiffiffi ffi 2=N G p to the total (based on the width of the χ 2 distribution). Following [91,92], we deal with these eigenvalues by introducing a cutoff κ such that eigenvalues smaller than κλ max are replaced with κλ max , where λ max is the largest eigenvalue: λ n → maxðλ n ; κλ max Þ: ðD8Þ Tuning κ appropriately, this replacement increases the underestimated eigenvalues to a value that is at least as large as the exact eigenvalue (and probably a lot larger). Unlike in [91,92], we do not renormalize the eigenvalues to preserve the trace of the modified matrix (see below).
We need curves such as those in Fig. 15 to set κ, but we do not know the exact eigenvalues in real applications. An approximate curve can be generated by comparing the eigenvalues of bootstrapped copies of our simulation results fG ðsÞ g with the eigenvalues from the full ensemble. Each bootstrapped copy has N s samples, as the original ensemble. In this analysis, bootstrapped eigenvalues play the role of the approximate eigenvalues above, while eigenvalues computed directly from ensemble fG ðsÞ g now play the role of the exact eigenvalues (since they specify the underlying distribution for the bootstrapped copies).
Ratios of these eigenvalues are plotted in Fig. 16 (blue points) for examples with N G ¼ 50 (top panel) and N G ¼ 500 (bottom panel) data points, and N s ¼ 4N G random samples for each data point. The error bars show the spread of values across the different bootstrapped copies. These points give us an approximate curve for λ n =λ exact n , from which we can determine an SVD cutoff.
The ensembles used in these examples were generated from a known distribution, so in this case we know the correct curve for λ approx n =λ exact n -that is, the ratio of the eigenvalues from the original ensemble to the exact eigenvalues from the underlying distribution. The (solid) red line in the plots shows this curve; it agrees well with the bootstrap estimates. which shows that the SVD cut is essential since χ 2 =N G ¼ 1.30 is much too large for N G ¼ 500-it corresponds to a p value of order 3 × 10 −5 .

Conceptual framework
The nature of the SVD modification can be understood by representing the ensemble-averaged data as a vector of Gaussian random variables, where δG ≡ X N G n¼1 z n ffiffiffiffi ffi λ n p Dv n ; Here δG represents the uncertainty associated with the ensemble average, The effect of the SVD cut is to add more uncertainty, δG SVD , where δG SVD ≡ X λ n <κλ maxz n ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi κλ max − λ n p Dv n ; ðD16Þ andz n are new random variables with zero mean and unit covariance matrix. Then is the SVD-modified covariance data. The SVD noise discussed above is a random sample drawn from the distribution described by δG SVD . These formulas underscore the fact that introducing an SVD cut is a conservative move: it always increases the uncertainties in the data. This would not necessarily be the case if we renormalized the eigenvalues after introducing the SVD cut, as is done in [91,92]. In practice, however, the difference between the two approaches is small.
Finally note that another option in an SVD analyses is to discard modes below the cutoff. This corresponds to setting λ n ¼ ∞ for these modes, which is much larger than is reasonable, much too conservative. We find that fits are more accurate and more stable using the prescription outlined above.

Goodness of fit
Note that χ 2 =N G ¼ 0.41 in Eq. (D10) is much smaller than expected for N G ¼ 500: one expects 1.00 (6) instead. The small value arises because random fluctuations inḠ are characteristic of the uncertainties in δG, but not those in δG SVD . We can demonstrate this by adding a random sample toḠ drawn from the distribution specified by δG SVD , and refitting. In the case of Eq. (D10), a typical fit with SVD noise gives χ 2 =N G increases to 0.96, which is consistent with expectations. Parenthetically, we note that overly broad priors-for example, 0 AE 10 for a set of parameters that are all order 1-can also result in a small χ 2 . This situation is addressed in a similar fashion, by replacing the prior distribution P: P ≡P þ δP →P þ sampleðδPÞ þ δP: ðD19Þ A good fit should have χ 2 =N G ≈ 1 AE ffiffiffiffiffiffiffiffiffiffiffi ffi 2=N G p when both SVD and prior noise is included, and the fit results should agree (within errors) with the results without noise.
A more direct test of a fitting protocol than adding extra noise is to replace the fit data [Eq. (D11)] with simulated data, G sim ≡ Gðp sim Þ þ sampleðδGÞ þ δG; ðD20Þ which has the same covariance matrix (from δG) as the real data, but whose mean is a random sample drawn from a distribution whose mean is known [Gðp sim Þ]. A good fit to this simulated data should give best-fit results for the parameters that agree with p sim to within errors. An obvious choice for the simulation parameters p sim are the best-fit results obtained when fitting the real data.