Observation of the annihilation decay mode $B^{0}\to K^{+}K^{-}$

A search for the $B^{0}\to K^{+}K^{-}$ decay is performed using $pp$-collision data collected by LHCb. The data set corresponds to integrated luminosities of 1 and 2 fb$^{-1}$ at center-of-mass energies of 7 and 8 TeV, respectively. This decay is observed for the first time, with a significance of more than five standard deviations. The analysis also results in an improved measurement of the branching fraction for the $B_s^0\to \pi^+\pi^-$ decay. The measured branching fractions are $BR(B^0\to K^+K^-) = (7.80 \pm 1.27 \pm 0.81 \pm 0.21) \times 10^{-8}$ and $BR(B_s^0\to\pi^+\pi^-) = (6.91 \pm 0.54 \pm 0.63 \pm 0.19 \pm 0.40) \times 10^{-7}$. The first uncertainty is statistical, the second is systematic, the third is due to the uncertainty on the $B^0\to K^+\pi^-$ branching fraction used as a normalization. For the $B_s^0$ mode, the fourth accounts for the uncertainty on the ratio of the probabilities for $b$ quarks to hadronize into $B_s^0$ and $B^0$ mesons.

The understanding of the dynamics governing the decays of heavy-flavored hadrons is a fundamental ingredient in the search for new particles and new interactions beyond those included in the Standard Model of particle physics (SM). The comparison of theoretical predictions and experimental measurements enables the validity of the SM to be tested up to energy scales well beyond those directly accessible by current particle accelerators. In the last two decades, the development of effective theories significantly improved the accuracy of theoretical predictions for the partial widths of such decays. Several approaches are used to deal with the complexity of quantum chromodynamics (QCD) computations, like QCD factorization (QCDF) [1][2][3], perturbative QCD (pQCD) [4,5] and soft collinear effective theory (SCET) [6]. Despite the general progress in the field, calculations of decay amplitudes governed by so-called weak annihilation transitions are still affected by large uncertainties. In the SM, the rare decay modes B 0 → K + K − and B 0 s → π + π − (charge conjugate modes are implied throughout) can proceed only through such transitions, whose contributions are expected to be small but could be enhanced through certain rescattering effects [7]. The corresponding Feynman graphs are shown in Fig. 1. Precise knowledge of the branching fractions of these decays is thus needed to improve our understanding of QCD dynamics in the more general sector of two-body b-hadron decays. The B 0 → K + K − and B 0 s → π + π − decays play also a role in techniques proposed to measure the angle γ of the unitary triangle [8].
While the B 0 s → π + π − decay has already been observed [9], no evidence exists for the B 0 → K + K − decay to date, despite searches performed by the BaBar [10], CDF [11], Belle [12] and LHCb [9] collaborations. Averages of the measurements of the branching fractions of these two decays are given by the Heavy Flavor Averaging Group (HFAG): − 0.05 ) × 10 −6 (corresponding to an upper limit of 0.23 × 10 −6 at 95% confidence level) and B(B 0 s → π + π − ) = (0.76 ± 0.13) × 10 −6 [13]. The results of a new search for the B 0 → K + K − decay and an update of the branching fraction measurement of the B 0 s → π + π − decay are presented in this Letter. The data sample that is analyzed corresponds to integrated luminosities of 1.0 fb −1 at √ s = 7 TeV and 2.0 fb −1 at √ s = 8 TeV of pp collision data collected with the LHCb detector in 2011 and 2012, respectively.
The LHCb detector [14,15] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5. The tracking system consists of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of siliconstrip detectors and straw drift tubes placed downstream of the magnet. The particle identification (PID) system consists of two ring-imaging Cherenkov (RICH) detectors, scintillating-pad and preshower detectors, electromagnetic and hadronic calorimeters, and a set of multiwire proportional chambers alternated with iron absorbers.
Simulated events are used in various steps of the analysis. In the simulation, pp collisions are generated using Pythia [16,17] with a specific LHCb configuration [18]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [19], as described in Ref. [20].
The online event selection is performed by a trigger [21], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction and requires a secondary vertex (SV) with a significant displacement from all primary pp interaction vertices (PVs). At least one charged particle must have high transverse momentum, p T , and large χ 2 IP with respect to all PVs, where χ 2 IP is the difference between the χ 2 of the PV fit performed with and without the considered particle. An algorithm based on a boosted decision tree (BDT) multivariate classifier [22,23] is used for the identification of secondary vertices consistent with the decays of b hadrons [24]. To further increase the trigger efficiency, an exclusive selection algorithm for two-body b-hadron decays was put in place, imposing requirements on the following quantities: the quality of the reconstructed tracks, their p T and impact parameter (IP), the distance of closest approach between the two oppositely charged tracks used to reconstruct the b-hadron candidate, and the p T , IP and proper decay time of the b-hadron candidate.
The event selection is refined offline using another BDT classifier and requirements on PID variables. The BDT returns a discriminant variable which is used to classify each b-hadron candidate as either signal or background. With the exception of the b-hadron decay time, the input variables to the BDT classifier are those used in the software trigger, plus the following: the largest p T and IP of the b-hadron decay products; the χ 2 IP of the b-hadron candidate; the χ 2 of the SV fit; and information on the separation of the SV from the PV. In the presence of multiple PVs per event (up to six and with an average of about two in this analysis), the one with the smallest χ 2 IP of the b-hadron candidate is considered.
The PID system is used to separate the data into mutually exclusive subsamples corresponding to various hypotheses for the final state, namely K + π − , pK − , pπ − , as well as π + π − and K + K − . The calibration of the PID variables is necessary to determine the yields of other two-body b-hadron decays where one or two particles in the final state are misidentified (cross-feed backgrounds). The efficiencies for a given PID requirement are determined using samples of kaons and pions from the D * + → D 0 (→ K − π + )π + decay chain and protons from Λ → pπ − and Λ + c → pK − π + decays. Since the RICH-based PID information depends on particle momentum, pseudorapidity and track multiplicity, the efficiencies are determined in bins of these variables. They are then averaged over the momentum and pseudorapidity distributions of the final state particles of two-body b-hadron decays, and over the distribution of track multiplicity in the corresponding events. Uncertainties on the PID efficiencies are due to the finite sizes of the calibration samples and to the binning used to calculate the efficiencies. The size of the latter uncertainty is estimated by the maximum variation when repeating the PID calibration procedure using different binning schemes.
The final selection criteria on the BDT output and PID variables are separately optimized for the B 0 → K + K − and B 0 s → π + π − decays. The outcome of the optimization consists of two event selections, S K + K − and S π + π − , aiming at the best sensitivity on the B 0 → K + K − and B 0 s → π + π − signal yields, respectively. In the two selections, common PID requirements are applied to define the subsamples with final-state mass hypotheses other than K + K − and π + π − . The optimization procedure is based on pseudoexperiments generating K + K − and π + π − invariant mass distributions. Fits to these distributions are performed with a model identical to that used for the generation. The B 0 (s) → K + K − and B 0 (s) → π + π − components are each described by a sum of two Gaussian functions with a common mean to account for mass resolution effects, with parameters determined from data, convolved with a power-law distribution that accounts for final state radiation (FSR) effects. In particular, the B 0 s → K + K − mass shape is deformed due to FSR in the region where the B 0 → K + K − signal is expected. The power-law distribution is derived from analytical quantum electrodynamics (QED) calculations [25], and the correctness of the model is checked against simulated events generated by Photos [26]. Photos simulates QED-photon emissions in decays by calculating O(α) radiative corrections for charged particles using a leading-log collinear approximation. Within the approximation, the program calculates the amount of bremsstrahlung in the decay and modifies the final state according to the decay topology. The mass distributions of simulated B candidates, generated with Photos, are well described by fits performed using the mass model developed in this analysis. The fit results are in excellent agreement with the theoretical values of the FSR parameters calculated according to Ref. [25] for each of the decay modes under study.
The background due to the random association of two oppositely charged tracks (combinatorial background) is modeled with an exponential function. The backgrounds due to the partial reconstruction of multibody b-hadron decays are parameterized by means of ARGUS functions [27] convolved with the same resolution function used for the signals. In the case of partially reconstructed B → K + π − X decays, where X stands for one or more missing particles, and the pion is misidentified as a kaon, an incorrect description may alter the determination of the B 0 → K + K − signal yield. Hence the shape of the mass distribution and the size of this contribution to the K + K − mass spectrum are determined from data by studying a sample of events selected with tight K + π − PID requirements, and accounting for the known effects of different PID selection criteria on the invariant mass resolution. The shapes of the mass distributions for cross-feed backgrounds are determined by means of a kernel estimation method [28] applied to the invariant mass distributions of simulated two-body b-hadron decays. As the B 0 → K + π − cross-feed background contributes to the K + K − mass distribution in the B 0 → K + K − signal mass region, the resulting shape of the mass spectrum is validated with data using again a sample of events selected with tight K + π − PID requirements. The amounts of cross-feed backgrounds are determined relative to the yields of the B 0 s → K + K − and B 0 → π + π − decays, scaled by the branching fractions, PID efficiencies and b-quark hadronization probabilities to form B 0 or B 0 s mesons [29]. For a given set of BDT and PID selection requirements, pseudoexperiments are generated with yields and model parameters of the backgrounds as determined from data. Signal decays are injected into simulated mass distributions according to different hypotheses for the values of their branching fractions. For each pseudoexperiment, the significance of the signal under study is computed according to Wilks' theorem [30] as 2 ln (L S+B /L B ), where L S+B and L B are the likelihoods of the nominal fit and of a fit where the yield of the signal is fixed to zero, respectively. As the B 0 → K + K − decay is still not observed and its branching fraction not well constrained, a multidimensional scan is performed over a wide range of branching fraction values, as well as BDT and PID selection requirements. For each point of the scan the signal significance is determined. The point corresponding to the smallest branching fraction that can be measured with a significance of 5 standard deviations is determined, and the optimal selection requirements are thus identified. This branching fraction is found to be B min 6×10 −8 . In contrast, the expected yield of B 0 s → π + π − decays is more precisely constrained, and the optimization of the selection requirements is found not to depend on the assumed branching fractions within ±2 standard deviations from the current world average value [13]. The optimization procedure for S K + K − leads to tighter PID and looser BDT requirements with respect to S π + π − . This is due to the fact that the random association of two kaons is much less likely than that of two pions, and thus the correct identification of two kaons provides a more powerful rejection of the combinatorial background with respect to that of two pions. As a consequence, the combinatorial background in the π + π − spectrum is best suppressed by the application of tighter requirements on the BDT output.
After applying the BDT and PID criteria for S K + K − or S π + π − , the signal yields are determined by means of an extended binned maximum likelihood fit done simultaneously with the exclusive data sets defined by the different mass hypotheses of particles in the final state. The model fitted to the mass distributions is the same as that used in the optimization of the selection. The amount of each cross-feed background contribution is determined directly from the fits, taking into account the appropriate PID efficiency factors. The m K + K − and m π + π − invariant mass distributions are shown in Fig. 2, with the results of the best fits superimposed. The yields for the two signals are N (B 0 → K + K − ) = 201 ± 33 ± 14 and N (B 0 s → π + π − ) = 455 ± 35 ± 24, where the first uncertainty is statistical and the second is systematic. The systematic uncertainties are related to the choice of the model used to parameterize the invariant mass shapes of signal and background components and to the knowledge of the PID efficiencies used to determine the amount of cross-feed backgrounds. The results of the best fits are used to generate pseudoexperiments, and then fits with alternative models are applied to the mass distributions. By studying the distributions of the difference between the signal yields determined from the nominal fit and those performed with alternative models, systematic uncertainties are determined. Such alternative models are considered for signal, combinatorial background, background from partially reconstructed b-hadron decays and cross-feed background mass models. The systematic uncertainty due to PID efficiencies is also assessed by generating pseudoexperiments and fitting the nominal model to the output mass distributions, using PID efficiencies randomly varied in each pseudoexperiment according to their estimated uncertainties. The standard deviation of the distribution of the yields determined in each set of pseudoexperiments is taken as a systematic uncertainty. The contributions of the various systematic uncertainties are reported in Table 1. The systematic uncertainties associated to the knowledge of the cross-feed background mass shapes are found to be negligible and are not reported. The total systematic uncertainties are obtained by summing all contributions in quadrature.
The significance of the B 0 → K + K − signal with respect to the null hypothesis is determined by means of a profile likelihood ratio. To account for systematic uncertainties, the likelihood function is convolved with a Gaussian function with width equal to the systematic uncertainty. The log-likelihood ratio as a function of the B 0 → K + K − signal yield is shown in Fig. 3. The statistical significance is found to be 6.3 standard deviations,  Figure 2: Distributions of (left) m K + K − and (right) m π + π − for candidates passing S K + K − and S π + π − , respectively. The continuous (blue) curves represent the results of the best fits to the data points. The most relevant contributions to the invariant mass spectra are shown as indicated in the legends. The vertical scales are chosen to magnify the relevant signal regions. The bin-by-bin differences between the fits and the data, in units of standard deviations, are also shown.
reduced to 5.8 when considering systematic uncertainties. The branching fractions of B 0 → K + K − and B 0 s → π + π − decays are determined relative to the B 0 → K + π − branching fraction, according to the following equation where f x is the probability for a b quark to hadronize into a B 0 x meson (x = d, s), N and ε are the yield and the efficiency for the given decay mode, respectively, and h stands for K or π. The yields of the B 0 → K + π − decay in the sub-samples selected with K + π − PID requirements are determined from the fits, and their values are N (B 0 → K + π − ) = 105 010 ± 431 ± 988 and N (B 0 → K + π − ) = 71 304 ± 312 ± 609, when applying the BDT requirements of S K + K − and S π + π − , respectively. Trigger and reconstruction efficiencies Table 1: Systematic uncertainties on the yields for the B 0 → K + K − and B 0 s → π + π − decays.  are determined from simulation and corrected using information from data. For the B 0 s → π + π − decay the sizeable value of the decay width difference between the long-and short-lived components of the B 0 s -meson system is taken into account. The B 0 s → π + π − lifetime is assumed to be that of the short-lived component, as expected in presence of small CP violation. The final ratios of efficiencies are found to be 2.08 ± 0.16 and 1.43 ± 0.10 for the B 0 → K + K − and B 0 s → π + π − decays, respectively. The dominant contributions to the uncertainties on these ratios are due to the PID calibration and to the knowledge of the trigger efficiencies. The following results are then obtained

Systematic uncertainty
where the first uncertainty is statistical and the second systematic. Using the HFAG average B(B 0 → K + π − ) = (19.57 +0.53 −0.52 ) × 10 −6 [13], and f s /f d = 0.259 ± 0.015 from Ref. [29], the following branching fractions are obtained where the first uncertainty is statistical, the second systematic, and the third and fourth are due to the knowledge of B(B 0 → K + π − ) and of f s /f d , respectively. Various theoretical predictions of the branching fractions of B 0 → K + K − and B 0 s → π + π − decays are available in the literature [2-5, 7, 31-35]. The pQCD estimations in Ref. [5] are in agreement within uncertainties with the present results. The QCDF prediction of B(B 0 → K + K − ) in Ref. [2] agrees well with these results, but that of B(B 0 s → π + π − ) is significantly smaller than the measurement. In Ref. [34], the unexpectedly large value of B(B 0 s → π + π − ) caused the traditional QCDF treatment for annihilation parameters to be revisited.
In summary, this Letter reports the most precise measurements of the branching fractions for the B 0 → K + K − and B 0 s → π + π − decay modes to date. These are in good agreement with and supersede those reported in Ref. [9], which were the best results available prior to the present analysis. The B 0 → K + K − decay is the rarest fully hadronic B-meson decay ever observed.