Observation of the decay B_c^+ → B_s^0 π^+

The result of a search for the decay B_c^+ → B_s^0 π^+ is presented, using the B_s^0 → D_s^- π^+ and B_s^0 → J/ψ φ channels. The analysis is based on a data sample of pp collisions collected with the LHCb detector, corresponding to an integrated luminosity of 1 fb^-1 taken at a center-of-mass energy of 7 TeV and 2 fb^-1 taken at 8 TeV. The decay B_c^+ → B_s^0 π^+ is observed with a significance in excess of five standard deviations independently in each of the two decay channels. The measured product of the ratio of cross-sections and the branching fraction is σ(B_c^+)/σ(B_s^0) × B(B_c^+ → B_s^0 π^+) = (2.37 ± 0.31 (stat) ± 0.11 (syst) ^{+0.17}_{-0.13} (τ_{B_c^+})) × 10^-3 in the pseudorapidity range 2 < η(B) < 5, where the first uncertainty is statistical, the second is systematic and the third is due to the uncertainty on the B_c^+ lifetime. This is the first observation of a B meson decaying to another B meson via the weak interaction.

The B_c^+ meson is the ground state of the b̄c system. It is unique in being the only doubly heavy meson that decays weakly. All B_c^+ decays measured to date proceed through the weak decay of the constituent b quark to a c quark [1][2][3][4][5][6][7][8]. The decay of the B_c^+ meson to another B meson, in which the c quark decays weakly and the bottom quark acts as a spectator (see Fig. 1), has not previously been observed. Its observation provides a test of theoretical predictions and information on the sources of B_s^0 mesons at the LHC.
A wide range of predictions for the branching fraction B(B_c^+ → B_s^0 π^+) exists, from 2.5 % to 16.4 %, based on, e.g., QCD sum rules [9,10] or quark-potential models (see Refs. [11][12][13][14][15][16] and references therein). Experimental input is needed to clarify this theoretical situation. Unlike in most other B decays, the higher-order corrections in the Heavy Quark Effective Theory expansion of quantum chromodynamics (QCD) are relatively large: because the meson contains two heavy quarks, the expansion is in powers of m_c/m_b rather than Λ_QCD/m_b, where Λ_QCD is the QCD scale and m_c (m_b) is the charm (bottom) quark mass. In addition, the energy release in the decay is relatively small, which leads to larger non-factorizable effects than in decays with lighter daughter particles. A measurement of B_c^+ → B_s^0 π^+ therefore tests these models. Knowledge of B_s^0 production in B_c^+ decays is also useful for time-dependent analyses of B_s^0 decays. A B_s^0 meson from a B_c^+ decay is not produced at the primary vertex, so its decay time is biased if the primary vertex is taken as its production point. In addition, the charge of the accompanying ("bachelor") pion identifies the flavor of the B_s^0 meson at production and can be exploited for flavor tagging.
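The following Python sketch makes the statement about the small energy release concrete. It computes the Q value, m(B_c^+) - m(B_s^0) - m(π^+), and the bachelor-pion momentum in the B_c^+ rest frame using the standard two-body decay formula. The masses are approximate world-average values entered by hand as assumptions; they are not inputs of this analysis, and the function names are illustrative.

```python
import math

# Approximate masses in MeV/c^2 (assumed values, close to world averages;
# not inputs of this analysis).
M_BC = 6274.9   # B_c^+
M_BS = 5366.8   # B_s^0
M_PI = 139.57   # pi^+


def q_value(m_parent: float, m1: float, m2: float) -> float:
    """Kinetic energy released in a two-body decay, in MeV."""
    return m_parent - m1 - m2


def two_body_momentum(m_parent: float, m1: float, m2: float) -> float:
    """Daughter momentum (MeV/c) in the parent rest frame for a two-body decay."""
    term_plus = m_parent**2 - (m1 + m2) ** 2
    term_minus = m_parent**2 - (m1 - m2) ** 2
    return math.sqrt(term_plus * term_minus) / (2.0 * m_parent)


if __name__ == "__main__":
    q = q_value(M_BC, M_BS, M_PI)
    p_star = two_body_momentum(M_BC, M_BS, M_PI)
    print(f"Q = {q:.0f} MeV, p* = {p_star:.0f} MeV/c")
```

With these inputs the Q value is well below 1 GeV. This is small compared with the masses involved, in contrast to, e.g., b → c transitions with light daughters.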
The data used in this analysis were collected with the LHCb detector [17] from pp collisions at √s = 7 TeV and 8 TeV, corresponding to integrated luminosities of 1 fb^-1 and 2 fb^-1, respectively.
The decays B_s^0 → D_s^- π^+ and B_s^0 → J/ψ φ are used, with the subsequent decays D_s^- → K^+ K^- π^-, J/ψ → μ^+ μ^- and φ → K^+ K^-. The inclusion of charge-conjugate modes is implied throughout. The event selection and the fits to the B_s^0 invariant-mass distributions follow previous LHCb analyses of these B_s^0 decay modes [18,19]. The two channels are analysed independently and the final results are combined. The number of B_c^+ → B_s^0 π^+ decays is normalized to the number of B_s^0 decays. This yields the B_c^+ → B_s^0 π^+ branching fraction multiplied by the ratio of the B_c^+ and B_s^0 production rates, (σ(B_c^+)/σ(B_s^0)) × B(B_c^+ → B_s^0 π^+). The B_c^+ signal region was not examined until the event selection was finalized. Since the ratio of production rates, σ(B_c^+)/σ(B_s^0), may depend on the kinematics of the produced B meson, the result is quoted for B mesons produced in the pseudorapidity range 2 < η(B) < 5, corresponding to the LHCb detector acceptance.
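Written out, the normalization described above corresponds to the relation below. N denotes a fitted yield, and ε_rel denotes the detection efficiency of B_c^+ → B_s^0 π^+ relative to that of B_s^0 decays; both symbols are introduced here for illustration. The B_s^0 → final-state branching fraction appears in both yields and cancels, provided the small fraction of B_s^0 mesons produced in B_c^+ decays is neglected.

```latex
\frac{\sigma(B_c^+)}{\sigma(B_s^0)} \times \mathcal{B}(B_c^+ \to B_s^0 \pi^+)
  = \frac{N(B_c^+ \to B_s^0 \pi^+)}{N(B_s^0)} \times \frac{1}{\epsilon_{\mathrm{rel}}},
\qquad
\epsilon_{\mathrm{rel}} = \frac{\epsilon(B_c^+ \to B_s^0 \pi^+)}{\epsilon(B_s^0)} .
```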
The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, described in detail in Ref. [17]. The combined tracking system provides a momentum measurement with a relative uncertainty that varies from 0.4 % at 5 GeV/c to 0.6 % at 100 GeV/c. Its impact parameter (IP) resolution is 20 µm for tracks with high transverse momentum, p_T, where the IP is the distance of closest approach between a track and a primary interaction vertex (PV). Charged hadrons are identified using two ring-imaging Cherenkov detectors. The charged pions from B_c^+ decays are selected with an efficiency of 93 % while keeping the kaon misidentification rate below 7 %. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers, with a typical efficiency of 97 % at a pion-to-muon misidentification probability of 1-3 %. The trigger [20] consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. B_s^0 candidates with muons in the final state are required to pass the hardware trigger, which selects muons with p_T > 1.48 GeV/c. B_s^0 candidates with only hadrons in the final state are selected by requiring a hadron in the calorimeter with transverse energy E_T > 3.6 GeV.
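The following sketch illustrates the two geometric quantities used above: the pseudorapidity that defines the 2 < η < 5 acceptance, and the impact parameter as a point-to-line distance. It assumes a straight-line track, which ignores magnetic-field curvature, and arbitrary but consistent length units. The function names are hypothetical and are not taken from the LHCb reconstruction software.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def pseudorapidity(px: float, py: float, pz: float) -> float:
    """eta = -ln tan(theta/2), computed as atanh(pz/|p|).

    Undefined (math domain error) for a track exactly along the beam axis.
    """
    p = math.sqrt(px * px + py * py + pz * pz)
    return math.atanh(pz / p)


def in_lhcb_acceptance(px: float, py: float, pz: float,
                       eta_min: float = 2.0, eta_max: float = 5.0) -> bool:
    """True if the momentum direction lies inside the assumed eta window."""
    return eta_min < pseudorapidity(px, py, pz) < eta_max


def impact_parameter(point: Vec3, direction: Vec3, pv: Vec3) -> float:
    """Distance of closest approach between a straight track and a vertex.

    The track passes through `point` along `direction`; the distance from
    `pv` to that line is |(pv - point) x direction| / |direction|.
    """
    d = [pv[i] - point[i] for i in range(3)]
    cross = (d[1] * direction[2] - d[2] * direction[1],
             d[2] * direction[0] - d[0] * direction[2],
             d[0] * direction[1] - d[1] * direction[0])
    norm_dir = math.sqrt(sum(c * c for c in direction))
    return math.sqrt(sum(c * c for c in cross)) / norm_dir
```

For example, a track with p_T = 0.5 GeV/c and p_z = 20 GeV/c has η ≈ 4.4 and lies inside the window. A track at 45° to the beam has η ≈ 0.88 and falls outside it.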
Monte Carlo simulations, used to develop the B_c^+ candidate selection, are performed using BCVEGPY [21], interfaced with Pythia 6.4 [22] using a specific LHCb configuration [23]. Decays of hadronic particles are described by EvtGen [24], in which final-state radiation is generated using Photos [25]. The interaction of the generated particles with the detector and its response are implemented using the Geant4 toolkit [26] as described in Ref. [27].
The B_s^0 candidates are selected using a multivariate classifier, a boosted decision tree (BDT) [28,29], which is trained to optimally discriminate between signal and background. In the training, simulated B_s^0 decays are used as signal, whereas candidates in the B_s^0 mass sideband in data are used as background. To avoid potential biases, only one sixth of the data is used in the training. It is verified that the distribution of the BDT discriminant is the same for the events used in the training and for those that were not. All events are used for the final result. The BDT training for the selection of …
The total number of B_s^0 decays is obtained from extended unbinned maximum-likelihood fits to the invariant-mass distributions shown in Fig. 2, using mass constraints on the J/ψ candidates [30]. The signal shapes are described by double Crystal Ball functions [31] with a common peak value and with tails on either side of the peak, to account for final-state radiation and detector resolution effects. The tail parameters are obtained from simulation and fixed in the fits, while the peak and width parameters are allowed to vary. The combinatorial backgrounds are modeled with exponential distributions. The B_s^0 → D_s^- π^+ final state is contaminated by two kinds of background. The first is partially reconstructed B decays, such as B_s^0 → D_s^{*-} π^+ and B_s^0 → D_s^- ρ^+, where the soft photon or neutral pion is not reconstructed. The second is decays in which one of the final-state particles is misidentified as a kaon, such as B^0 → D^- π^+ or Λ̄_b^0 → Λ̄_c^- π^+. The shapes of these backgrounds are fixed from simulation, following Ref. [18]. In total, 103 760 ± 380 B_s^0 → J/ψ φ and 73 700 ± 500 B_s^0 → D_s^- π^+ decays are found. Selected B_s^0 candidates with masses consistent with the known B_s^0 mass are combined with tracks that satisfy loose pion identification requirements. B_c^+ candidates are then selected with a second BDT.
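As an illustration of the signal model, the sketch below implements one common convention for the Crystal Ball shape: a Gaussian core joined to a power-law tail, continuous in value and slope. It also implements a "double" version built from two such shapes that share a peak position and have tails on opposite sides. The parameter names and the fraction `frac` are assumptions for illustration. The shapes are left unnormalized; an actual fit, typically done with a dedicated fitting package, would normalize them over the fitted mass range.

```python
import math


def crystal_ball(x: float, mu: float, sigma: float, alpha: float, n: float) -> float:
    """Unnormalized Crystal Ball shape: Gaussian core plus power-law tail.

    alpha > 0 puts the tail on the low-mass side, alpha < 0 on the high-mass side.
    """
    t = (x - mu) / sigma
    if alpha < 0:
        t = -t  # mirror so the tail always sits at t < -|alpha|
    a = abs(alpha)
    if t > -a:
        return math.exp(-0.5 * t * t)
    # Power-law tail, matched in value and slope to the Gaussian at t = -a.
    big_a = (n / a) ** n * math.exp(-0.5 * a * a)
    big_b = n / a - a
    return big_a * (big_b - t) ** (-n)


def double_crystal_ball(x: float, mu: float,
                        sigma1: float, alpha1: float, n1: float,
                        sigma2: float, alpha2: float, n2: float,
                        frac: float) -> float:
    """Sum of two Crystal Ball shapes with a common peak and opposite tails."""
    return (frac * crystal_ball(x, mu, sigma1, alpha1, n1)
            + (1.0 - frac) * crystal_ball(x, mu, sigma2, alpha2, n2))
```

Choosing alpha1 > 0 and alpha2 < 0 gives a low-mass tail, for example from final-state radiation, and a high-mass tail from resolution effects, around one shared peak.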
In the training of the second BDT, simulated candidates with masses consistent with the B_c^+ mass [32] are used as signal, and candidates in the B_c^+ mass sideband region in data are used as background. For B_s^0 → D_s^- π^+, only the upper mass sideband is used. For B_s^0 → J/ψ φ, the lower mass sideband is also used, to further suppress the larger combinatorial background at lower masses. Again, only one sixth of the total data set is used in the training. The second BDT uses the following B_c^+ variables: the transverse momentum p_T, the decay time, the vertex-fit χ² (χ²_vtx), the impact-parameter χ² (χ²_IP), and the pointing angle. The pointing angle is the angle between the B_c^+ candidate momentum vector and the line joining the associated PV and the B_c^+ decay vertex. The B_s^0 polar angle (the angle between the B_s^0 flight direction and the beam axis), decay time, decay length and pointing angle are also used. The momentum p and transverse momentum p_T of the bachelor pion from the B_c^+ decay are the most discriminating observables in the second BDT. The two analyses differ in their additional inputs. The D_s^- π^+ analysis also uses the χ²_IP of the B_s^0 candidate and of the bachelor pion, and the B_s^0 and B_c^+ momenta. The J/ψ φ analysis also uses the B_c^+ and B_s^0 decay-length uncertainties. The optimal selections are defined by maximizing the figure of merit for a target significance of three standard deviations, ε/(3/2 + √B) [33], where ε is the signal efficiency and B is the number of background candidates for a given BDT criterion. The figure of merit displays a plateau. The chosen value lies at its lower end, so that more combinatorial background is retained and its shape is better constrained. The chosen selection is very close to the optimal point for a target significance of 5σ and to the maximum of the expected significance S/√(S+B).
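The optimization described above can be sketched as a scan of the figure of merit ε/(a/2 + √B) over a grid of BDT thresholds, with a = 3 for the three-standard-deviation target. The efficiency and background curves below are hypothetical toy functions, not taken from this analysis. In practice ε would come from simulation and B from the data sidebands.

```python
import math
from typing import Callable, Sequence, Tuple


def punzi_fom(eff: float, n_bkg: float, a: float = 3.0) -> float:
    """Figure of merit eps / (a/2 + sqrt(B)) for a target significance of a sigma."""
    return eff / (a / 2.0 + math.sqrt(n_bkg))


def scan_threshold(thresholds: Sequence[float],
                   signal_eff: Callable[[float], float],
                   n_background: Callable[[float], float],
                   a: float = 3.0) -> Tuple[float, float]:
    """Return (best_threshold, best_fom) over a grid of BDT thresholds."""
    best = max(thresholds,
               key=lambda t: punzi_fom(signal_eff(t), n_background(t), a))
    return best, punzi_fom(signal_eff(best), n_background(best), a)


# Hypothetical toy curves for illustration only.
def toy_signal_eff(threshold: float) -> float:
    return 1.0 - threshold ** 4


def toy_background(threshold: float) -> float:
    return 5000.0 * math.exp(-8.0 * threshold)


if __name__ == "__main__":
    grid = [i / 100 for i in range(100)]
    cut, fom = scan_threshold(grid, toy_signal_eff, toy_background)
    print(f"best threshold {cut:.2f}, figure of merit {fom:.4f}")
```

Because the figure of merit is flat near its maximum, several neighbouring thresholds give nearly the same value. This is the plateau mentioned above, and it leaves room to choose a looser working point.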
The trigger for B_s^0 → D_s^- π^+ decays preferentially selects candidates with higher p_T than the trigger for B_s^0 → J/ψ φ decays. This results in a higher efficiency of the second BDT requirement for the B_s^0 → D_s^- π^+ final state. The B_c^+ and B_s^0 candidates are required to be produced in the pseudorapidity range 2 < η(B) < 5.
The invariant-mass distributions of the B_c^+ → B_s^0 π^+ candidates are shown in Fig. 3, together with the results of the fits. The combinatorial background is primarily due to genuine B_s^0 decays combined with a random pion from the primary vertex, and is modeled with an exponential function. Backgrounds from B_c^+ → B_s^{*0} π^+ and B_c^+ → B_s^0 ρ^+ decays, where the photon or neutral pion is not reconstructed, are simulated. Their shapes are modeled with Gaussian distributions whose parameters are fixed in the fit, while their yields are allowed to vary. The fits yield 64 ± 10 B_c^+ → B_s^0 (→ D_s^- π^+) π^+ and 35 ± 8 B_c^+ → B_s^0 (→ J/ψ φ) π^+ signal decays, with statistical significances of 7.7σ and 6.1σ, respectively. The significances are obtained from the likelihood ratio of fits with and without the signal probability density function, as √(-2 ln(L_B/L_{S+B})), where L_B and L_{S+B} are the maximum likelihoods of the background-only and signal-plus-background fits.
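The conversion from the likelihood ratio to a significance can be sketched as follows. It assumes Wilks' theorem with one additional free parameter (the signal yield), so that Z = √(-2 ln(L_B/L_{S+B})). It also shows the corresponding one-sided Gaussian p-value. The function names are illustrative. The inputs are minimized negative log-likelihoods, as most fitting tools report them.

```python
import math


def significance_from_nll(nll_bkg_only: float, nll_sig_plus_bkg: float) -> float:
    """Z = sqrt(-2 ln(L_B / L_{S+B})) from two minimized negative log-likelihoods.

    With NLL = -ln L, the test statistic is 2 * (NLL_B - NLL_{S+B}).
    Assumes Wilks' theorem with one extra free parameter (the signal yield).
    """
    q = 2.0 * (nll_bkg_only - nll_sig_plus_bkg)
    return math.sqrt(max(q, 0.0))


def one_sided_p_value(z: float) -> float:
    """Gaussian one-sided tail probability corresponding to z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For example, a significance of 7.7σ corresponds to -2 ln(L_B/L_{S+B}) ≈ 59, and the conventional 5σ threshold corresponds to a one-sided p-value of about 2.9 × 10^-7.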
In Fig. 3a, the structure around 6225 MeV/c^2 is consistent with B_c^+ → B_s^{*0} π^+ decays; however, this contribution is not significant. To obtain the B_c^+ → B_s^0 π^+ branching fraction multiplied by the ratio of the B_c^+ and B_s^0 production rates, the detection efficiency of B_c^+ → B_s^0 π^+ decays relative to B_s^0 decays is determined from simulation. Requiring the bachelor pion to be inside the LHCb acceptance reduces the B_c^+ → B_s^0 π^+ yield by about 19 % with respect to the B_s^0 yield. The largest reduction in the number of selected B_c^+ candidates comes from the second BDT selection, which suppresses B_s^0 combinations with a random pion from the primary interaction. The total relative detection efficiency of B_c^+ → B_s^0 π^+ decays with respect to B_s^0 decays is estimated to be 15.2 % for the B_s^0 → J/ψ φ channel and 33.9 % for the B_s^0 → D_s^- π^+ channel. This difference in B_c^+ selection efficiencies is a consequence of the different B_s^0 trigger and selection requirements in the two channels.
The sources of systematic uncertainty on the efficiency-corrected ratio of B_c^+ and B_s^0 yields are listed in Table 1. The uncertainty on the B_s^0 yield in the D_s^- π^+ analysis is determined by varying the parameters that describe the tails of the signal mass distribution, and by reducing the exponent of the combinatorial background by a factor of two. The uncertainty on the B_s^0 → J/ψ φ yield is obtained by comparing the fitted yield in simulated pseudo-experiments to the yield used as input to those experiments.
Table 1: Contributions of the various sources of (relative) systematic uncertainty on the efficiency-corrected ratio of event yields. The total systematic uncertainty is the quadratic sum of the individual contributions. The number of B_c^+ → B_s^0 (→ D_s^- π^+) π^+ candidates is large enough that the peak position and width are freely varied in the fit, and hence the corresponding uncertainty is contained in the statistical uncertainty of the signal yield.
The uncertainty on the B_c^+ yield is quantified by varying the peak position and width in the fit to B_c^+ → B_s^0 (→ J/ψ φ) π^+ candidates. The signal model is validated using simulated pseudo-experiments in the J/ψ φ analysis, whereas the tail parameters are varied by ±10 % in the D_s^- π^+ analysis. In addition, the combinatorial background shape is changed to a straight line, and the resulting difference in the signal yield is taken as the associated systematic uncertainty. The effect of partially reconstructed B_c^+ → B_s^0 ρ^+ decays is estimated by excluding candidates with masses below 6150 MeV/c^2 from the fit. When the systematic uncertainties on the fit to the B_c^+ mass distribution are taken into account, the significance of the B_c^+ → B_s^0 π^+ signal is reduced to 7.5σ for B_c^+ → B_s^0 (→ D_s^- π^+) π^+ and 5.5σ for B_c^+ → B_s^0 (→ J/ψ φ) π^+.
The relative detection efficiency of B_c^+ and B_s^0 decays is determined from simulated events. The agreement between data and simulation is quantified by varying the requirement on the BDT output. The observed change in the B_s^0 yield is then compared with the change expected from the efficiency determined in simulation. The largest contribution to the efficiency uncertainty is due to the 10 % uncertainty on the B_c^+ lifetime [32], which has recently been improved by the CDF collaboration [34]. The change in selection efficiency when varying the B_c^+ lifetime by ±10 % is assigned as a systematic uncertainty. Since the selection favors displaced B_c^+ decay vertices, a longer (shorter) B_c^+ lifetime corresponds to a larger (smaller) efficiency and therefore a smaller (larger) ratio. As a cross-check, the effect of using different sets of BDT input variables is investigated, and the result is found to be stable.
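A toy model shows why a longer lifetime raises the efficiency. It assumes that the selection acts as a single hard threshold on the B_c^+ proper decay time and ignores resolution and the multivariate nature of the real selection. In that case, the efficiency is the fraction of exponentially distributed decays above the threshold, exp(-t_min/τ). The lifetime of about 0.45 ps and the 0.2 ps threshold used below are assumed inputs for illustration only.

```python
import math


def survival_efficiency(tau_ps: float, t_min_ps: float) -> float:
    """Toy efficiency: fraction of exponential decays with proper time above t_min."""
    return math.exp(-t_min_ps / tau_ps)


def lifetime_variation(tau_ps: float, t_min_ps: float, rel_shift: float = 0.10):
    """Relative efficiency change for tau scaled by (1 + rel_shift) and (1 - rel_shift)."""
    nominal = survival_efficiency(tau_ps, t_min_ps)
    up = survival_efficiency(tau_ps * (1.0 + rel_shift), t_min_ps) / nominal - 1.0
    down = survival_efficiency(tau_ps * (1.0 - rel_shift), t_min_ps) / nominal - 1.0
    return up, down


if __name__ == "__main__":
    # Hypothetical inputs: tau ~ 0.45 ps, proper-time threshold 0.2 ps.
    up, down = lifetime_variation(0.45, 0.2)
    print(f"efficiency change: {up:+.1%} (longer tau), {down:+.1%} (shorter tau)")
```

For these inputs the efficiency rises when τ increases and falls when τ decreases, and the two shifts have different sizes. This illustrates how a symmetric ±10 % lifetime variation can produce an asymmetric uncertainty.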
The contribution from Cabibbo-suppressed B_c^+ → B_s^0 K^+ decays, the uncertainty on the reconstruction efficiency of the additional pion, and the uncertainty on the efficiency of the particle identification requirement on the bachelor pion each give small contributions (< 1.0 %) to the total systematic uncertainty, and are not itemized in Table 1.
The B_s^0 and B_c^+ yields are corrected for the relative detection efficiencies to obtain the efficiency-corrected ratios of B_c^+ → B_s^0 π^+ to B_s^0 yields. These are (2.54 ± 0.40 (stat) ^{+0.23}_{-0.17} (syst)) × 10^-3 and (2.20 ± 0.49 (stat) ± 0.23 (syst)) × 10^-3 for the D_s^- π^+ and J/ψ φ final states, respectively. The small fraction of B_s^0 candidates originating from B_c^+ decays is neglected. The uncertainty due to the B_c^+ lifetime is correlated between the two measurements and is accounted for in the combined result for the ratio of production rates multiplied by the branching fraction,
σ(B_c^+)/σ(B_s^0) × B(B_c^+ → B_s^0 π^+) = (2.37 ± 0.31 (stat) ± 0.11 (syst) ^{+0.17}_{-0.13} (τ_{B_c^+})) × 10^-3,
where the first uncertainty is statistical, the second is systematic and the third is due to the uncertainty on the B_c^+ lifetime. Since σ(B_c^+)/σ(B_s^0) may depend on the kinematics of the produced B meson, the data are also divided according to center-of-mass energy. This gives (1.27 ± 0.42 (stat) ± 0.05 (syst) ^{+0.09}_{-0.07} (τ_{B_c^+})) × 10^-3 and (2.92 ± 0.40 (stat) ± 0.12 (syst) ^{+0.21}_{-0.16} (τ_{B_c^+})) × 10^-3 for √s = 7 and 8 TeV pp collisions, respectively. The lower value obtained from the 7 TeV data is attributed to a downward statistical fluctuation of the B_c^+ → B_s^0 (→ J/ψ φ) π^+ yield in the 2011 (7 TeV) data set, with a p-value of 1.5 %.
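As a rough numerical cross-check, the sketch below recomputes the per-channel efficiency-corrected ratios N(B_c^+ → B_s^0 π^+)/(N(B_s^0) ε_rel) from the yields and relative efficiencies quoted in the text. It then forms a naive inverse-variance average of the two quoted ratios using only their statistical uncertainties. The naive average ignores the systematic uncertainties and the correlated lifetime uncertainty, so it is not the combination method of this analysis and need not reproduce the published central value. The function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def corrected_ratio(n_bc: float, n_bs: float, rel_eff: float) -> float:
    """Efficiency-corrected yield ratio N(Bc -> Bs pi) / (N(Bs) * eps_rel)."""
    return n_bc / (n_bs * rel_eff)


def weighted_mean(values: Sequence[float],
                  errors: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean, assuming uncorrelated Gaussian errors."""
    weights = [1.0 / e**2 for e in errors]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))


if __name__ == "__main__":
    # Yields and relative efficiencies as quoted in the text.
    r_ds = corrected_ratio(64, 73_700, 0.339)      # Bs -> Ds- pi+ channel
    r_jpsi = corrected_ratio(35, 103_760, 0.152)   # Bs -> J/psi phi channel
    print(f"Ds pi: {r_ds:.2e}, J/psi phi: {r_jpsi:.2e}")
    # Naive statistical-only average of the quoted ratios (in units of 1e-3).
    mean, err = weighted_mean([2.54, 2.20], [0.40, 0.49])
    print(f"naive average: ({mean:.2f} +/- {err:.2f}) x 1e-3")
```

The recomputed ratios agree with the quoted values to within the rounding of the inputs. The naive average has a statistical uncertainty of about 0.31 × 10^-3, as quoted. Its central value of about 2.40 × 10^-3 differs slightly from the published 2.37 × 10^-3, as expected for a method that omits the systematic terms.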
In summary, the first observation of a weak decay of a B meson to another B meson is reported. This measurement will help in understanding flavor tagging and the decay-time resolution in time-dependent B_s^0 analyses, and will also constrain models that predict branching fractions of B_c^+ decays.