First observation of the $B^+ \rightarrow D_s^+ D_s^- K^+$ decay

The $B^+ \rightarrow D_s^+ D_s^- K^+$ decay is observed for the first time using proton-proton collision data collected by the LHCb detector at centre-of-mass energies of $7$, $8$ and $13\, \text{TeV}$, corresponding to an integrated luminosity of $9\,\text{fb}^{-1}$. Its branching fraction relative to that of the $B^{+} \rightarrow D^{+} D^{-} K^{+}$ decay is measured to be $$\frac{B\left(B^{+} \rightarrow D_s^{+} D_s^{-} K^{+}\right)}{B\left(B^{+} \rightarrow D^{+} D^{-} K^{+}\right)}=0.525 \pm 0.033 \pm 0.027 \pm 0.034,$$ where the first uncertainty is statistical, the second systematic, and the third is due to the uncertainties on the branching fractions of the $D_s^{\pm} \rightarrow K^{\mp} K^{\pm} \pi^{\pm}$ and $D^{\pm} \rightarrow K^{\mp} \pi^{\pm} \pi^{\pm}$ decays. This measurement fills an experimental gap in the knowledge of the family of Cabibbo$-$favoured $\bar{b} \rightarrow \bar{c} c \bar{s}$ transitions and opens the path for unique studies of spectroscopy in future.


Introduction
The family of B → D ( * ) (s) D ( * ) (s) K ( * ) decays proceeds at the quark level via the Cabibbofavoured b → ccs transition. 1 Such decays provide an excellent laboratory for investigations of open-and hidden-charm meson spectroscopy [4,5,7,8,[11][12][13], covering both conventional and exotic states. Additionally, measurements of the amplitude structures in the D ( * ) (s) D ( * ) (s) system of these decays can offer important information to the theoretical calculations of the charm-loop contributions to b → sℓ + ℓ − processes that are sensitive to physics beyond the Standard Model [1].
There have already been experimental studies of B → D ( * )D( * ) K ( * ) decays by the ALEPH, BaBar, Belle and LHCb collaborations [2][3][4][5][6][7][8][9][10][11][12][13], not only making observations of some of these decay channels, but also leading to discoveries of new resonances. However, all such measurements to date focus only on B decays with a D ( * )D( * ) pair in the final state. Decays involving the D ( * )+ s D ( * )− s pair, e.g. B + → D + s D − s K + , have never been explored. The D + s D − s system is attractive as it provides a unique insight into the charmonium(-like) spectroscopy. Conventional charmonium mesons with natural spin (J), parity (P ), and charge-parity (C) quantum numbers (J P C = 0 ++ , 1 −− , 2 ++ , · · · ), e.g. ψ(4040) [14], are expected to predominantly decay into the D ( * )D( * ) final state. In contrast, charmoniumlike hadrons with the ccss constituents could have a larger partial decay width to the D + s D − s final state than that to D ( * )D( * ) . A few candidates for ccss states have been observed in the J/ψϕ final state from B + → J/ψϕK + decays [15]. They might also decay into D + s D − s and contribute to the B + → D + s D − s K + decay. This paper presents the first observation of the B + → D + s D − s K + decay. The contributing tree-level Feynman diagrams of this decay are shown in Fig. 1. The branching fraction of the B + → D + s D − s K + decay is measured relative to that of the normalisation channel B + → D + D − K + . The two channels have a similar decay topology, so the corresponding systematic uncertainties on the branching-fraction ratio are expected to largely cancel. The measurement uses the proton-proton (pp) collision data collected by the LHCb experiment in 2011, 2012 and 2015-2018 at centre-of-mass energies of 7, 8 and 13 TeV, respectively, corresponding to a total integrated luminosity of 9 fb −1 . The branching fraction of the B + → D + s D − s K + decay is an essential input to obtain the partial width information of the  near-threshold structure in the D + s D − s system analysed in the accompanying paper [16].

Detector and simulation
The LHCb detector [17, 18] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region [19], a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes [20, 21] placed downstream of the magnet. The tracking system provides a measurement of the momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV. The minimum distance of a track to a primary pp collision vertex (PV), the impact parameter, is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors [22]. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [23]. The online event selection is performed by a trigger [24], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. At the hardware trigger stage, events are required to have a muon with high p T or a hadron, photon or electron with high transverse energy in the calorimeters. The hadron can originate from either the studied decay or the rest of the event. The software trigger requires a two-, three-or four-track secondary vertex with a significant displacement from any PV. At least one charged particle must have a transverse momentum p T > 1.6 GeV and be inconsistent with originating from a PV. A multivariate algorithm [25,26] is used for the identification of secondary vertices consistent with the decay of a b hadron.
Simulation is required to model the effects of the detector acceptance and the imposed selection requirements for the B + → D + s D − s K + and B + → D + D − K + decays. In the simulation, pp collisions are generated using Pythia [27] with a specific LHCb configuration [28]. Decays of unstable particles are described by EvtGen [29], in which final-state radiation is generated using Photos [30]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [31] as described in Ref. [32]. The underlying pp interaction is reused multiple times, with an independently generated signal decay for each event [33]. The particle-identification (PID) response is not well described in the LHCb simulation, and is corrected to match that in data. The correction is determined from calibration samples using a reweighing approach, the so-called PID transformation [34,35].

Event selection
The B + → D + s D − s K + and B + → D + D − K + decays are reconstructed using the D + s → K + K − π + and D + → K − π + π + decay channels, respectively. The selection starts by choosing well-reconstructed tracks that are inconsistent with originating from any PV. The selected tracks should have PID information consistent with the corresponding final-state kaons and pions. Additionally, the opening angle between any two final-state charged tracks must be larger than 0.5 mrad to reduce potential reuse of track segments. The D + and D + s candidates, obtained by combining K and π candidates, are required to have good-quality vertices and their reconstructed masses must lie within ±25 MeV of the known masses [14], corresponding to about three times the mass resolution. The B + candidates are then formed by combining D + (s) , D − (s) and K + candidates. The B + decay vertex must be well reconstructed and significantly displaced from all PVs. The B + candidate is required to have a flight direction pointing back to the PV where it is produced, referred to hereafter as the associated PV. To reduce the contamination from the non-double-charm (NDC) B + decay candidates that have the same set of the final-state tracks as that of the mesons, requirements on the distances between the B + and D ± (s) vertices are imposed, taking advantage of the nonzero flight distance of the D ± (s) meson. A kinematic fit [36] is employed to improve the B + mass resolution by constraining the D ± (s) mass to its known value [14] and requiring the B + candidate to originate from the associated PV. A second kinematic fit with an additional B + mass constraint is utilised when investigating the distributions of the Dalitz-plot variables in [37][38][39] classifiers are employed to reduce further background from random combinations of tracks, referred to hereafter as combinatorial background. The classifier is trained separately for the B + → D + s D − s K + and B + → D + D − K + channels using the corresponding simulated sample as the signal proxy and candidates in data with a reconstructed B + mass of 5360 − 6000 MeV as the background proxy. The discriminating variables used in the classifier include PID information of the final-state tracks, kinematic properties and the decay topology of the B + and D ± (s) candidates. The selection criterion for the response of each classifier is optimised by maximising the figure of merit, N S / √ N S + N B , separately for each channel. Here, N S (N B ) is the expected signal (background) yield within ±20 MeV of the known B + mass [14], estimated as the product of the signal (background) yield without any BDTG requirement and the signal efficiency (1−background-rejection factor) for a given criterion. The signal and background yields without the BDTG requirements in each channel are evaluated using a simple fit to the reconstructed B + mass distribution with the signal and background probability density functions (PDF) modelled by a Gaussian function and an exponential function, respectively. The signal efficiency and background-rejection factor are directly obtained from the samples used to test the classifiers.
Multiple candidates, found in a few percent of pp collision events, are mostly due to the duplicated use of tracks from the same event. In each event, the candidate having the smallest χ 2 of the kinematic fit without the B + mass constraint is retained.
The reconstructed D + s D − s K + and D + D − K + invariant-mass distributions after all the selection requirements are shown in Fig. 2. The mass windows, (5280 − 80, 5280 + 80) MeV and (5280 − 60, 5280 + 80) MeV, for the signal and normalisation channels, respectively, are chosen to exclude the contributions of partially reconstructed background with a missing photon from D * ± s → D ± s γ or D * ± → D ± γ decay. Here, the tighter lower-side threshold for the B + → D + D − K + channel than that for the B + → D + s D − s K + channel is found necessary to exclude all visible tails of the partially reconstructed background [11,12]. These mass windows are also sufficient to exclude partially reconstructed background with a missing π 0 meson from D * + s → D + s π 0 or D * + → D + π 0 decay, as the reconstructed D + s D − s K + or D + D − K + mass is shifted at least 135 MeV below the B + mass peak [14]. For both the B + → D + s D − s K + signal and B + → D + D − K + normalisation channels, any potential background resulting from the misidentification of a single K or π would originate from Cabbibo-suppressed decays and thus is negligible.

Signal extraction
Unbinned extended maximum-likelihood fits to the mass distributions of the B + candidates are performed to determine the signal yields in the B + → D + s D − s K + and B + → D + D − K + channels. The signal PDF is a sum of two Crystal Ball (CB) functions [40] with a common peak position and opposite-side tails. The tail parameters are fixed to the values obtained from the simulated B + → D + D − K + or B + → D + s D − s K + sample and the fractions of the two CB functions are set to 0.5. All other parameters including the peak position, the widths of the two CB functions are free to vary in the fit to data. The combinatorial background is modelled using an exponential function with its parameter free to vary in the fit.
The fit results are shown in Fig. 2. A signal yield of N sig = 360 ± 22, where the uncertainty is statistical only, is obtained for the B + → D + s D − s K + decay, with a purity in a ±20 MeV window around the peak of approximately 84%. The statistical significance of the B + → D + s D − s K + decay is evaluated under the assumption that the log-likelihood difference between the fit without the signal component and the default fit follows a χ 2 distribution with the number of degrees of freedom being the difference in the number of free parameters of the two fits [41]. Given the log-likelihood difference of 646 with four degrees of freedom, the significance is found to be much larger than 10 standard deviations (σ). Since the systematic uncertainty discussed in Sec. 6 is not expected to greatly reduce the significance, this result constitutes the first observation of the B + → D + s D − s K + decay. The signal yield in the B + → D + D − K + channel is obtained as N norm = 3215 ± 65, where the uncertainty is statistical, and the purity is around 91% using the same mass window definition. The B + → D + D − K + signal yield is larger and the corresponding purity is lower compared with those in Refs. [11,12], reflecting that looser selection criteria are imposed in this paper.
To demonstrate the potential contributions from intermediate resonances in the B + → D + s D − s K + decay, the invariant-mass distributions of the D + s D − s , D − s K + and D + s K + combinations, denoted respectively as m(D + s D − s ), m(D − s K + ) and m(D + s K + ), are shown in Fig. 3. The combinatorial background is subtracted using the sPlot method [42] with the B + candidate mass exploited as the discriminating variable. A clear peaking structure is seen at the D + s D − s mass threshold, corresponding to a charmonium(-like) candidate. An amplitude analysis is performed to study the resonant contributions in the B + → D + s D − s K + decay, presented separately in Ref. [16]. The intermediate resonant contributions to the B + → D + D − K + decay have been studied in the previous LHCb analyses described in Refs. [11,12].
Pseudo-experiments are carried out to check potential biases on the extracted signal yields in the signal and normalisation channels. Numerous samples are randomly generated according to the determined PDFs where the number of signal (background) candidates in each sample is varied according to a Poissonian distribution with the mean set to the determined signal (background) yield, referred to as the default result. The signal yield in each generated sample is determined by a fit to the B + candidate mass distribution and its variation from the default result is quantified as the pull value, (N i − N 0 )/σ i , where N i and σ i are the signal yield and its statistical uncertainty in the i-th sample, respectively, and N 0 is the default signal yield. The pull distribution is then fitted using a Gaussian function. The mean represents the bias on the signal yield relative to the size of its statistical uncertainty. The bias is δ sig = 0.09 ± 0.01 for the signal channel and is corrected for in the branching fraction calculation. Here, the uncertainty is due to limited number of the pseudo-experiments. No obvious bias is found on the signal yield for the normalisation channel.
The measured B + → D + s D − s K + and B + → D + D − K + signal yields contain residual NDC contributions, which need to be subtracted in the branching fraction calculation presented in Sec. 5. The expected NDC yields are estimated using the same method as that in Refs. [11,12], based on the number of candidates in the D ± (s) mass sidebands, which are 30-80 MeV away from the known D ± (s) mass [14]. The fractions of the NDC yields in the total B + signal yields, are found to be f sig NDC = (5.2 ± 2.7)% and f norm NDC = (3.2 ± 0.6)% for the B + → D + s D − s K + and B + → D + D − K + channels, respectively. The uncertainties are due to the limited number of the candidates in the D ± (s) mass sidebands. The larger NDC-background contamination in the B + → D + D − K + channel compared with that in Refs. [11,12] is due to relatively looser selection requirements in this paper.

Branching fraction calculation
The branching fraction of the B + → D + s D − s K + decay is measured relative to that of the B + → D + D − K + decay, where the branching fractions of the decays B + → D + D − K + , D + → K − π + π + and D + s → K − K + π + are taken from Ref. [6,14]. The efficiency-corrected signal yields of the where the index i runs over all the selected B + candidates in the signal or normalisation channel. The weight w sig, i or w norm, i assigned to each candidate is obtained using the sPlot method [42]; summing over these weights effects a statistical subtraction of the combinatorial background contribution. The discriminating variable is the reconstructed D + s D − s K + or D + D − K + mass, as shown in Fig. 2. The efficiencies ϵ sig, i and ϵ norm, i , determined using simulated samples, take into account the effects of geometric acceptance, reconstruction and selection requirements. They are evaluated separately in the two datataking periods, 2011-2012 (Run1) and 2015-2018 (Run2), because of the differences in the collision energy and the detector configuration, and are shown in Fig. 4 as functions of the Dalitz-plot variables of the corresponding decays. The kernel density estimation (KDE) [43] technique is employed to obtain smooth efficiency distributions across the Dalitz plots. Finally, the efficiency-corrected yields are determined to be N corr sig = (9.5 ± 0.6) × 10 5 and N corr norm = (5.33 ± 0.11) × 10 6 , where the uncertainties are statistical only. Before obtaining the branching-fraction ratio R, it is necessary to subtract the NDC background contributions and to correct for the small fit bias on the extracted signal yield in the B + → D + s D − s K + channel. These two effects are accounted for by introducing in Eq. 1 two correction factors: (1 − f sig NDC )/(1 − f norm NDC ) for the NDC background subtraction, and 1 − σ sig · δ sig /N sig for the bias correction, where N sig ± σ sig = 360 ± 22 and δ sig = 0.09. The branching-fraction ratio is determined to be where the uncertainty is due to statistical uncertainties on the efficiency-corrected yields of both the B + → D + s D − s K + and B + → D + D − K + channels.

Systematic uncertainties
The sources of the systematic uncertainty on the branching-fraction ratio are summarised in Table 1. They can be classified into two categories: the effects on the signal yields and the effects on the efficiencies. The total uncertainty is obtained by adding the contributions in quadrature.
The sources affecting the signal yields include the choice of the signal and background PDFs in the B + mass fits, the uncertainties on the NDC background fractions in the signal and normalisation channels, the uncertainty on the bias of the B + → D + s D − s K + signal yield and the multiple-candidate removal strategy. To evaluate the systematic uncertainty related to the choice of the signal PDF, an alternative model with a Hypatia function [44] is employed in the B + mass fits to the signal and normalisation channels. The effect due to the choice of the background PDF is evaluated in a similar way, where a second-order polynomial function is used as an alternative. The NDC background fractions in the signal and normalisation channels have large uncertainties, which are due to the limited size of the data samples in the D ± and D ± s mass sidebands. These uncertainties are propagated to the branching-fraction ratio. The uncertainty on the bias of the B + → D + s D − s K + signal yield, resulting from the limited number of the pseudo-experiments that are employed to estimate the bias, is propagated in a similar way. Finally, the effect of the multiplecandidate removal strategy is evaluated by retaining an arbitrarily chosen candidate in an event.
The remaining systematic uncertainty sources in Table 1 influence the efficiencies in the signal and normalisation channels. The hardware trigger is generally not well modelled in the LHCb simulation but the effect is expected to largely cancel in the branching-fraction ratio thanks to the similarity in the topology between the signal and normalisation channels. The residual effect is evaluated by applying to the simulated events an event-by-event correction that is determined by calibration samples [45].
The PID response is not well described in the simulation either and thus is corrected in the default result, with the correction determined from calibration samples using the PID transformation approach [34,35]. The performance of the correction is limited by the size of the calibration samples, the choice of the values of hyperparameters in the transformation approach and the choice of the correction approach itself. The corresponding effects are investigated by extracting the efficiency maps from simulated samples with the correction determined by using alternative calibration samples bootstrapped from the original ones, varying the values of the hyper parameters of the PID transformation approach, and choosing the PID resampling approach [34,35]. The resulting changes on the branchingfraction ratio are taken as the systematic uncertainties and are added in quadrature to obtain the uncertainty due to the imperfect PID correction.
The potentially imperfect descriptions of the discriminating variables in the simulation can cause bias on the efficiencies of the BDTG classifiers. The effect is evaluated by varying the BDTG selection for the two channels and the resulting change of the branching-fraction ratio is taken as the systematic uncertainty.
The uncertainties on the tracking efficiencies for the same species of final-state particles in the signal and normalisation channels are expected to be the same and fully correlated, so their effects on the branching-fraction ratio largely cancel. The residual effect is from different final-state particles in the two channels, i.e. D + s → K − K + π + and D + → K − π + π + . The systematic uncertainty on the branching-fraction ratio is evaluated as 1.0% [46,47].
In the simulation, a truth-matching algorithm is used to identify signal decays. The efficiency of this algorithm is not 100%, leading to an underestimation of the efficiencies shown in Figure 4. However, the truth-matching efficiencies are expected to be very similar between the signal and normalisation channels, and thus the associated systematic uncertainties largely cancel. To evaluate the potential residual effect, the truth-matching efficiencies in the two channels are estimated using B + mass fits to the simulated samples with and without truth matching. The efficiencies are introduced as correction factors to the branching-fraction ratio, and the resulting variation is taken as the systematic uncertainty.
The kernel width in the KDE method is chosen to be 2 GeV 2 in the default result. To evaluate the systematic effect due to this choice, two alternative values, 1 GeV 2 and 3 GeV 2 , are taken in the efficiency parameterisations. The maximum change of the branching-fraction ratio is taken as the systematic uncertainty.
The last systematic uncertainty source affecting the efficiencies is the limited size of simulated samples. The uncertainty is propagated to the branching-fraction ratio using the bootstrap method [48]. Bootstrapped samples are produced by randomly selecting from the simulated candidates with replacement. Each such sample has the same size as the original simulated sample. These samples are employed to determine efficiencies and finally to obtain a set of branching-fraction ratios, whose standard deviation is taken as the systematic uncertainty.

Summary
In conclusion, the B + → D + s D − s K + decay is observed for the first time using the protonproton collision data collected by the LHCb detector at centre-of-mass energies of 7, 8 and 13 TeV, corresponding to an integrated luminosity of 9 fb −1 . The significance of the signal observed is larger than 10 σ. The branching fraction of this decay is measured relative to that of the B + → D + D − K + normalisation channel as In each of these results, the first and second uncertainties are statistical and systematic, respectively, and the third is from the uncertainties on the known branching fractions of the D + → K − π + π + [?], D + s → K  [14] Particle Data Group, P. A. Zyla et al., Review of particle physics, Prog. Theor. Exp.