Evidence for $B^+ \rightarrow h_c K^+$ and observation of $\eta_c(2S) \to p \bar{p} \pi^+ \pi^-$

A search for the decays $B^+ \rightarrow h_c K^+$ and $B^0 \rightarrow h_c K_S^0$ is performed. Evidence for the decay $B^+ \rightarrow h_c K^+$ is found; its significance is $4.8\sigma$. No evidence is found for $B^0 \rightarrow h_c K_S^0$. The branching fraction for $B^+ \rightarrow h_c K^+$ is measured to be $(3.7^{+1.0}_{-0.9}{}^{+0.8}_{-0.8}) \times 10^{-5}$; the upper limit for the $B^0 \rightarrow h_c K_S^0$ branching fraction is $1.4 \times 10^{-5}$ at $90\%$ C.L. In addition, a study of the $p \bar{p} \pi^+ \pi^-$ invariant mass distribution in the channel $B^+ \to (p \bar{p} \pi^+ \pi^-) K^+$ results in the first observation of the decay $\eta_c(2S) \to p \bar{p} \pi^+ \pi^-$ with $12.1\sigma$ significance. The analysis is based on the 711 $\mathrm{fb}^{-1}$ data sample collected by the Belle detector at the asymmetric-energy $e^+ e^-$ collider KEKB at the $\Upsilon(4S)$ resonance.

However, the decay B + → h c K + has not been observed experimentally yet.Neither the Belle [4] nor BaBar [5] Collaborations have found a statistically significant signal of B + → h c K + using the decay mode h c → η c γ.The current branching-fraction upper limit of B(B + → h c K + ) < 3.8 × 10 −5 at 90% confidence level (C.L.) [3] was obtained in the h c search by Belle [4].
Also, the LHCb Collaboration searched for the process B + → h c (→ pp)K + [6] and set the upper limit on the branching fraction product B(B + → h c K + ) × B(h c → pp) < 6.4 × 10 −8 (95% C. L.).However, this measurement does not result in a stronger restriction on B(B + → h c K + ), because the decay h c → pp has never been observed and the upper limit on its branching fraction is B(h c → pp) < 1.5 × 10 −4 (90% C. L.) [3].Note that a newer LHCb analysis of the same channel performed in Ref. [7] does not update the upper limit on B(B + → h c K + ) × B(h c → pp).
Several new theoretical predictions of B(B + → h c K + ) were made after the experimental upper limit was set.The branching fraction has been calculated in the QCD factorization approach to be 2.7 × 10 −5 [8].A calculation using perturbative QCD was performed in Ref. [9]; the result is B(B + → h c K + ) = 3.6 × 10 −5 .Another calculation performed in Ref. [10] results in B(B + → h c K + ) [B(B 0 → h c K 0 )] in the interval from 3.1 × 10 −5 to 5.7 × 10 −5 [from 2.9 × 10 −5 to 5.3 × 10 −5 ], depending on the assumed value of the c quark mass.All the results mentioned above are close to each other, and the theoretical values of B(B + → h c K + ) are slightly below the current experimental upper limit.This motivates an updated study of the decays B + → h c K + , which may be able to find them.
Here we present such an updated search for the decays B + → h c K + , and also include a search for the decays B 0 → h c K 0 S .The analysis is performed using the 711 fb −1 data sample collected by the Belle detector at the asymmetric-energy e + e − collider KEKB [11].The data sample was collected at the Υ(4S) resonance and contains 772 × 10 6 B B pairs.The integrated luminos-ity is 2.8 times greater than the luminosity used in the previous analysis [4].For further improvement of the sensitivity, the new analysis uses ten η c decay channels to reconstruct the decay h c → η c γ; only two channels were used in the old one.The new h c decay channel h c → ppπ + π − observed recently by the BESIII Collaboration [12] is also used for its reconstruction; in addition, we study the decays of other charmonium states to ppπ + π − .The discrimination of the signal and background events is improved by performing a multivariate analysis.

II. THE BELLE DETECTOR
The Belle detector is a large-solid-angle magnetic spectrometer that consists of a silicon vertex detector (SVD), a 50-layer central drift chamber (CDC), an array of aerogel threshold Cherenkov counters (ACC), a barrellike arrangement of time-of-flight scintillation counters (TOF), and an electromagnetic calorimeter (ECL) comprised of CsI(Tl) crystals located inside a superconducting solenoid coil that provides a 1.5 T magnetic field.An iron flux-return located outside of the coil is instrumented to detect K 0 L mesons and to identify muons.The detector is described in detail elsewhere [13].Two inner detector configurations were used.A 2.0 cm radius beampipe and a 3-layer silicon vertex detector were used for the first sample of 140 fb −1 , while a 1.5 cm radius beampipe, a 4layer silicon detector and a small-cell inner drift chamber were used to record the remaining data [14].
We use a geant-based Monte Carlo (MC) simulation [15] to model the response of the detector, identify potential backgrounds and determine the acceptance.The MC simulation includes run-dependent detector performance variations and background conditions.Signal MC events are generated with EvtGen [16] in proportion to the relative luminosities of the different running periods.

III. EVENT SELECTION
We select events of the type B + → h c K + and B 0 → h c K 0 S .Inclusion of charge-conjugate modes is implied hereinafter.The reconstruction is performed with a conversion from Belle to Belle II data format [17].
All tracks are required to originate from the interaction point region: we require dr < 0.2 cm and |dz| < 2 cm, where dr and dz are the cylindrical coordinates of the point of the closest approach of the track to the beam axis (the z axis of the laboratory reference frame coincides with the positron-beam axis).
Charged π, K mesons and protons are identified using likelihood ratios R h1/h2 = L h1 /(L h1 +L h2 ), where h 1 and h 2 are the particle-identification hypotheses (π, K, or p) and L hi are their corresponding likelihoods.The likelihoods are calculated from the combined time-of-flight information from the TOF, the number of photoelectrons from the ACC and dE/dx measurements in the CDC.We require R K/π > 0.6 for K candidates, R π/K > 0.6 for π candidates, and R p/π > 0.6, R p/K > 0.6 for p candidates.The identification efficiency of the above requirements varies in the ranges (94 -99)%, (84 -93)%, and (90 -98)% for π, K, and p, respectively, depending on the h c or η c decay channel.The misidentification probability for the background particles that are not π, K, and p, varies in the ranges (25 -49)%, (4.9 -11.3)%, and (0.5 -1.9)%, respectively.Without the electron background, which is rejected as described below, the π fake rate drops to (20 -35)%.Electron candidates are identified as CDC charged tracks that are matched to electromagnetic showers in the ECL.The track and ECL cluster matching quality, the ratio of the electromagnetic shower energy to the track momentum, the transverse shape of the shower, the ACC light yield, and the track dE/dx ionization are used in our electron identification criteria.A similar likelihood ratio is constructed: R e = L e /(L e + L h ), where L e and L h are the likelihoods for electrons and charged hadrons (π, K and p), respectively [18].An electron veto (R e < 0.9) is imposed on π, K, and p candidates.It is not applied for the K 0 S and Λ daughter tracks, because they have independent selection criteria.For the h c or η c decay channels other than η c → K 0 S K 0 S π 0 and η c → Λ Λ, the electron veto rejects from 3.5 to 15% of the background events, while its signal efficiency is not less than 97.5%.
Photons are identified as ECL electromagnetic showers that have no associated charged tracks detected in the CDC.The shower shape is required to be consistent with that of a photon.
The π 0 candidates are reconstructed via their decay to two photons.The photon energies in the laboratory frame are required to be greater than 30 MeV.The π 0 invariant mass is required to satisfy |M π 0 − m π 0 | < 15 MeV/c 2 .Here and elsewhere, M particle denotes the reconstructed invariant mass of the specified particle and m particle stands for the nominal mass of this particle [3].This requirement corresponds approximately to a 3σ mass window around the nominal mass.The V 0 -particle (K 0 S and Λ) candidates are reconstructed from pairs of oppositely charged tracks that are assumed to be π + π − and pπ − for K 0 S and Λ, respectively.We require corresponding approximately to 5.5σ mass windows in both cases.The V 0 candidates are selected by a neural network using the following input variables: the V 0 candidate momentum, decay angle, flight distance in the xy plane, the angle between the V 0 momentum and the direction from the interaction point to the V 0 vertex, the shortest z distance between the two daughter tracks, their radial impact parameters, and numbers of hits in the SVD and CDC.The separation of the K 0 S and Λ candidates is performed by another neural network.The input variables of this network are the momenta and polar angles of the daughter tracks in the laboratory frame, their likelihood ratios R π/p and the V 0 candidate invariant mass for the Λ hypothesis.
The η c candidates are reconstructed in ten decay channels: pp, ppπ 0 , ppπ + π − , and Λ Λ.The selected η c candidates are required to satisfy |M ηc − m ηc | < 50 MeV/c 2 ; the mass-window width is about 1.6 widths of the η c .
The h c candidates are reconstructed in the h c → η c γ and h c → ppπ + π − decay channels.The invariant mass of the h c candidates is not restricted for the channel η c γ; for the channel ppπ + π − , it is required to be greater than 2.7 GeV/c 2 .The lower mass limit is selected to be very low to study other charmonium states decaying to the same final state.
The B-meson candidates are reconstructed via the decay modes B + → h c K + and B 0 → h c K 0 S .The B candidates are selected by their energy and the beamenergy-constrained mass.The difference of the B-meson and beam energies is defined as ∆E = i E i − E beam , where E i are the energies of the B decay products in the center-of-mass frame and E beam is the beam energy in the same frame.The beam-energy-constrained mass is defined as , where p i are the momenta of the B decay products in the center-of-mass frame.We retain B candidates satisfying the conditions 5.2 < M bc < 5.3 GeV/c 2 and |∆E| < 0.2 GeV.A massconstrained fit is applied to the selected B-meson candidates.
In addition, for the channel h c → η c γ, the h c daughter γ energy is required to be greater than 200 MeV in the B rest frame.This requirement removes the background from low-energy photons, including the peaking backgrounds from B decays to the same final state without the photon.The signal efficiency of this requirement is 100%, because the η c γ invariant mass of all excluded events is smaller than the h c mass.Also, the ratio of the Fox-Wolfram moments [19] F 2 /F 0 is required to be less than 0.3.This requirement reduces the continuum background, rejecting from 18% to 53% of background events, depending on the h c or η c decay channel.Its signal efficiency is from 93.3% to 96.3%.

IV. MULTIVARIATE ANALYSIS AND OPTIMIZATION OF THE SELECTION REQUIREMENTS A. General analysis strategy and data samples
To improve the separation of the signal and background events, we perform a multivariate analysis followed by an optimization of selection requirements.The first stages of the analysis are performed individually for h c → ppπ + π − and each η c decay channel for the h c candidates reconstructed in the η c γ mode (the channels η c → K + K − η and η c → η ′ (→ ηπ + π − )π + π − are optimized separately for η 2γ and η 3π ).They include the determination of two-dimensional (∆E, M bc ) resolution and the distribution of the background in (∆E, M bc ), and the multivariate-analysis stage.The optimization of the selection requirements uses the results of all initial stages as its input.The resolution is used to determine the expected number of the signal events and the distribution of the background in (∆E, M bc ) is used to determine the expected number of the background events in the signal region.The optimization is performed individually for the channel h c → ppπ + π − and globally for all η c decay channels for the channel h c → η c γ.The data selected using the resulting channel-dependent criteria are merged into a single sample for the h c → η c γ channel.The final fit is performed simultaneously to the h c → η c γ and The experimental data are used for determination of the (∆E, M bc ) distribution, selection of the background samples for the neural network, and final fit to the selected events.During the development of the analysis procedure, the h c signal region was excluded to avoid bias of the h c significance.The final fit described in Sec.V was performed on MC pseudoexperiments generated in accordance with the fit result without the h c mixed with the where σ ∆E = 18 MeV and σ M bc = 2.5 MeV/c 2 are the approximate resolutions in ∆E and M bc , respectively, and After completion of the analysis procedure development, this requirement is no longer used.The signal MC is used for the determination of the resolution and the selection of the signal samples for the neural network.The signal MC is generated using the known information about the angular or invariant-mass distributions of the decay products if it is possible; otherwise, uniform distributions are assumed.The angular distribution is known for the channel h c → η c γ.It does not have any free parameters and is proportional to sin 2 θ hc , where θ hc is the h c helicity angle that is defined as the angle between − p B and p ηc , where p B and p ηc are the momenta of the B and η c in the h c rest frame, respectively.In addition, the η c decay resonant structure is taken into account if it is known.The distributions for the channels , and K + K − η 3π are based on the results of a Dalitz plot analysis performed in Ref. [20].The contributions of intermediate φ resonances are taken into account for the channel K + K − K + K − based on the worldaverage branching fractions from Ref. [3].

B. Resolution
The resolution is parameterized by the function (12)  a (y 1 ) + N G1 G (21)  a (x 2 )G (22)  a (y where F CB is an asymmetric Crystal Ball function [21], are asymmetric Gaussian functions, N CB , N G1 and N G2 are normalizations and x i and y i (i = 1, 2, 3) are rotated variables that are given by Here, ((∆E) 0 , (M bc ) 0 ) is the central point and α i is the rotation angle.The central point is the same for all three components.The resolution is determined from a binned maximum likelihood fit to signal MC events.Example resolution fit results (for the channel The (∆E, M bc ) distribution is fitted in order to estimate the expected number of the background events in the signal region.The distribution is fitted to the function where N S is the number of signal events and B is the background density function that is given by where m 0 is the threshold mass, a is a rate parameter and P 3 is a two-dimensional third-order polynomial.The region with ∆E < −0.12 GeV is excluded for the channel Example (∆E, M bc ) fit results (for the channel The red solid line is the fit result, and the blue dotted line is the background.Since there is no significant signal before the optimization of the selection requirements and for the entire ηcγ mass range, the two lines almost coincide.

D. Multivariate analysis
To improve the separation of signal and background events, we perform a multivariate analysis for each individual channel.The algorithm used for the multivariate analysis is the multilayer perceptron (MLP) neural network implemented in the tmva library [22].The following variables are always included in the neural network: the angle between the thrust axes of the B candidate and the remaining particles in the event, the angle between the thrust axes of all tracks and all photons in the event, the ratio of the Fox-Wolfram moments F 2 /F 0 , the B production angle, and the vertex fit quality.For the h c candidates reconstructed in the η c γ channel, the MLP also includes the h c helicity angle, the η c mass, and the number of π 0 candidates that include the h c daughter photon as one of their daughters (separately for two groups of π 0 candidates with the energy of another photon less and greater than 100 MeV).
For the channels η c → K + K 0 S π − , η c → K + K − π 0 , and η c → K 0 S K 0 S π 0 , two invariant masses of the η c daughter particle pairs (both Kπ combinations) are added to the neural network.
The following particle identification variables are included into the neural network if there are corresponding charged particles in the final state: the minimum likelihood ratio R K/π of the h c daughter kaons, the minimum of the two likelihood ratios R p/K , R p/π of the h c daughter protons, and R K/π for the B daughter K + (for the channel B + → h c K + ).Here, the h c daughters may be either direct (from the decay h c → ppπ + π − ) or indirect (the η c daughters for the h c candidates reconstructed in the η c γ mode).
If there is a π 0 or η decaying to γγ in the final state, four additional variables are added: the π 0 (η) mass, the minimal energy of the π 0 (η) daughter photons in the laboratory frame, and the number of π 0 candidates that include a π 0 (η) daughter photon as one of their daughters (for each of the π 0 (η) daughter photons).If there is an η reconstructed in the π + π − π 0 decay mode, then only its mass is added to the MLP.If the η c has a daughter η ′ , then the mass of the η ′ candidate is also included to the neural network.
The signal samples are taken from the signal MC.The background sample is taken from a two-dimensional (∆E, M bc ) sideband.For the channel B + → h c K + , the sideband is defined as The background sample is divided into training and testing samples of equal size.
The channel B 0 → h c K 0 S has a small number of background events.In order to avoid overtraining, the background region for this channel is redefined.It includes all selected events except the central region defined by Eq. (1).In addition, the MLP internal architecture is changed.Instead of the default tmva neural network with two hidden layers, only one hidden layer is used.
The resulting efficiency of the requirement (v > v 0 ) on the MLP output variable v for the training sample is shown in Fig. 3 for the channel B + → h c K + with h c → η c (→ K + K − π 0 )γ.Note that the efficiency is given by where ǫ is the full efficiency, ǫ MLP is the raw MLP output requirement efficiency and ǫ multiple is the best-candidate selection efficiency.Because of the correction by ǫ multiple , the efficiency at the minimal MLP output value v min is not 1 but rather ǫ multiple (v min ).
The best-candidate selection is performed for each of the multivariate-analysis channels separately in the following way.The selected (∆E, M bc ) region is subdivided into three bins in both ∆E and M bc .The selection is performed for each of the bins separately.The candidate with the largest MLP output is selected.One of the bins (−67 < ∆E < 67 MeV, 5.267 < M bc < 5.3 GeV/c 2 ) always contains the entire signal region selected by the optimization procedure as described in Sec.IV E. Thus, the signal region of the final data sample does not contain multiple candidates that originate from the same h c channel.However, multiple candidates from different h c channels are possible.
The best-candidate selection efficiency increases for larger values of the MLP output cutoff value v 0 .For the v 0 values obtained as the result of the optimization of the selection requirements as described in Sec.IV E, the selection procedure removes from 3 to 15% of data events, depending on the multivariate-analysis channel.

E. Optimization of the selection requirements
Optimization of the selection requirements is performed by maximizing the value where i is the channel index, N sig is the expected number of the signal events for the i-th channel, N (i) bg is the expected number of the background events in the signal region, and a = 3 is the target significance.This optimization method is based on Ref. [23].
The signal region is defined as ∆E where M bc are the half-axes of the signal region ellipse.The parameters determined by the optimization are R M bc , and the minimal value of the MLP output (v The expected number of signal events for B + → h c K + is calculated as where N Υ(4S) is the number of Υ(4S) events, B(h c → i) is the branching fraction of the h c to its i-th decay channel, ǫ (i) SR is the reconstruction efficiency for the specific signal region SR, and ǫ ) is the efficiency of the requirement (v > v 0 ) on the MLP output variable v for the signal events.The number of Υ(4S) events is assumed to be equal to the number of B B pairs; the branching fraction B(Υ(4S) → B + B − ) is calculated under the same assumption [3].The signal-region-dependent reconstruction efficiency is calculated as where ǫ R is the reconstruction efficiency, and S i is the signal PDF for i-th h c decay channel (the integral of S i over the signal region is the efficiency of the signal region selection).The unknown branching fraction B(B + → h c K + ) can be set to an arbitrary value because the maximum of F opt does not depend on it.The expected number of signal events for B 0 → h c K 0 S is calculated similarly.
The expected number of background events is calcu-lated as ) is the efficiency of the MLP output requirement for the background events, N hc region is the number of background events in the h c region defined by Eq. ( 2), N full is the full number of the background events, and B i is the background density function defined in Eq. ( 6) for i-th h c decay channel.
The optimization is performed separately for two channel groups.The first group includes the multivariateanalysis channels corresponding to the decay h c → η c γ; the index i runs over all η c decay channels.The second group consists of the single channel h c → ppπ + π − .The separate optimization is required by the difference of further data processing: the data from the first group of channels are combined into a single h c → η c γ data sample, while the h c → ppπ + π − data are fitted with another function, as described below in Sec.V.The optimization results are shown in Table I.
After the optimization, the resulting selection criteria are applied.The selected events for the channel h c → η c γ are merged.The resolution and distribution in (∆E, M bc ) are determined again for the h c → ppπ + π − sample, since the knowledge of the background distribution in (∆E, M bc ) is necessary for the final fit described in Sec.V.The (∆E, M bc ) fit results are shown in Fig. 4.

F. Resolution in M hc
The resolution in M hc is determined from a fit to the combined h c → η c γ or h c → ppπ + π − signal MC samples with η c decaying to the reconstructed channels only.All final selection criteria are applied.The distribution of the difference of the reconstructed and true masses is fitted to a sum of an asymmetric Gaussian and asymmetric double-sided Crystal Ball functions: where ∆M is the difference of the reconstructed and true h c masses, N is the common normalization and f CB is the Crystal Ball fraction.Example resolution fit results (for the channel B + → h c K + with h c → η c γ) are shown in Fig. 5.

V. FIT TO THE DATA A. Default model
For the h c → η c γ final sample, the distribution in the h c mass in the (∆E, M bc ) sideband cannot be used to constrain the background level in the signal region because of the presence of peaking backgrounds, such as  the background from B decays to a similar final state with a π 0 instead of the h c daughter γ.If the second photon from this π 0 has a small energy, then the ∆E and M bc values are close to 0 and the B mass, respectively.Thus, the fit is based on the signal distribution only for the channel h c → η c γ.
For the channel h c → ppπ + π − , there is a signal from B decays to the same final state that do not proceed via any charmonium state, called the noncharmonium signal hereinafter.Because of the possible interference of the charmonium and noncharmonium signals, the distribution of the noncharmonium signal in the ppπ + π − invariant mass needs to be determined by the fit.Thus, both signal and background distributions are included into the fit for the channel h c → ppπ + π − .We perform a simultaneous extended unbinned maximum likelihood fit to the h c → η c γ signal, h c → ppπ + π − background, and h c → ppπ + π − signal distributions.The charmonium states are represented by the Breit-Wigner amplitude: where M R is the invariant mass, m R is the nominal mass, and Γ R is the width of the resonance R. The signal-region density function for the channel h c → η c γ is given by 16) where N hc is the number of signal events, R (ηcγ) hc is the h c mass resolution for the channel η c γ, and P 2 is a second-order polynomial.The background density function B p pπ + π − (M ) for the channel h c → ppπ + π − is a third-order polynomial.The signal density function for the channel h c → ppπ + π − is given by where P 3 is a third-order polynomial representing the noncharmonium signal.The wide states are added coherently to the signal density function, while the states that are narrower than the resolution are added incoherently.The amplitudes are normalized in such a way that all the parameters N R represent the yields of the corresponding states.The signal distribution is fitted to the function where w is the weight of the background events in the signal region that is calculated as the ratio of integrals of the background distribution in (∆E, M bc ) over the signal and background regions.The model described above is the default one; additional models are considered to study systematic uncertainties.In the default model, the masses and widths of all resonances are fixed to their world-average values [3]; all other parameters are free.The best-candidate selection procedure described in Sec.IV D guarantees that there are no multiple candidates in the h c → ppπ + π − signal sample, but multiple candidates in the h c → η c γ signal sample are possible if they originate from different η c decay channels.However, the fraction of the events with multiple candidates is found to be negligibly small.No events with multiple candidates (for the h c masses within the default fitting regions) are observed for both B + → h c K + and The fit results are shown in Fig. 6 for the channel B + → h c K + and in Fig. 7 for the channel B 0 → h c K 0 S .The signal yields and phases are listed in Table II.The statistical significance of the decays B + → h c K + and B 0 → h c K 0 S , as well as the significances of other charmonium states in the channel ppπ + π − are calculated from the difference of (−2 ln L), where L is the maximum likelihood, between the models with and without these states taking the number of degrees of freedom into account.The significance of the decays B + → h c K + and B 0 → h c K 0 S in the default model is found to be 5.0σ and 0.8σ, respectively.The significance of the decays B + → h c K + and B 0 → h c K 0 S with the systematic error taken into account is 4.8σ and 0.7σ, respectively; the procedure of the calculation of the systematic uncertainty is described in Sec.V B. Thus, we find evidence for the decay B + → h c K + , but do not find evidence for B 0 → h c K 0 S .The significances of charmonium states in the channel ppπ + π − (except the h c , which is reconstructed in two decay channels) are shown in Table III.The significance of the η c (2S) in the default model is 12.3σ and 5.9σ for the processes B + → (cc)(→ ppπ + π − )K + and B 0 → (cc)(→ ppπ + π − )K 0 S , respectively.The significance including the systematic error is 12.1σ and 5.8σ, respectively.Consequently, the decay η c (2S) → ppπ + π − is observed for the first time in both B + → (cc)(→ ppπ + π − )K + and B 0 → (cc)(→ ppπ + π − )K 0 S processes.signal in the entire fitting region (bottom left) and in the χcJ region (bottom right).The red solid line is the fit result and the blue dashed line is the background.The maximum of the ppπ + π − signal fit result at the J/ψ peak is more than two times greater than the number of data events in the corresponding bin because the J/ψ peak is narrower than the bin size.Thus, a part of the fit result at the J/ψ peak is not shown.

B. Systematic uncertainty: model dependence
For a systematic-uncertainty study, we consider additional models.They include the models with free masses and widths of the h c and all other charmonium states (with Gaussian constraints in accordance with the errors of their current world average values), with increased order (3) of the background PDF polynomial, different fitting ranges, scaled resolution, and variation of the relative fraction of the channels h c → η c γ and h c → ppπ + π − .For the model with scaled resolution, the resolution func-tion R hc (∆M ) is changed to where S is the resolution scaling parameter.The variation of the relative fraction of the channels h c → η c γ and h c → ppπ + π − is performed by changing the expected yields in these channels by ±1σ, where the error is due to the error of the corresponding branching fractions.The results are listed in Table IV., hc → ppπ + π − signal in the entire fitting region (bottom left) and in the χcJ region (bottom right).The red solid line is the fit result and the blue dashed line is the background.The maximum of the ppπ + π − signal fit result at the J/ψ peak is more than two times greater than the number of data events in the corresponding bin because the J/ψ peak is narrower than the bin size.Thus, a part of the fit result at the J/ψ peak is not shown.

C. Branching fraction
Using the number of reconstructed events, we calculate the branching fractions of the decays B + → h c K + and B 0 → h c K 0 S , as well as the branching fraction products B(B + → (cc)K + ) × B((cc) → ppπ + π − ) and B(B 0 → (cc)K 0 S ) × B((cc) → ppπ + π − ).The decay ψ(2S) → ppπ + π − can proceed via the J/ψ: ψ(2S) → J/ψ(→ pp)π + π − .To remove the events with a J/ψ, the ψ(2S) yield is taken from an alternative fit with an additional J/ψ veto defined as The sources of the systematic uncertainty of the branching fractions include the model dependence (the same set of alternative fit models is used as in Sec.V B), overtraining (the difference between the efficiency in the training and testing samples), the error of the difference of the particle identification requirements efficiency between the data and MC, the difference of the MLP efficiency between the data and MC, tracking efficiency, number of Υ(4S) events, the η c → K + K 0 S π − and Υ(4S) → B + B − or Υ(4S) → B 0 B0 branching fractions.All systematic error sources are listed in Table V for the  The difference of the particle identification requirements efficiency between the data and MC is estimated from several control samples, such as , and τ − → π − π 0 ν τ for π 0 .The resulting overall efficiency ratio depends on the final state and the momenta of the decay products; thus, it is different for all branching fractions.For example, for the h c it is found to be (95.0 ± 3.8)% for the channel B + → h c K + and (93.0 ± 4.5)% for the channel B 0 → h c K 0 S .The error caused by the difference of the MLP efficiency between the data and MC is estimated for the channel h c → η c γ using the decay mode B 0 → η c π − K + .This decay is reconstructed using selection criteria that are as similar as possible to the signal mode B + → h c K + .
The same MLP optimized for B + → h c K + is applied to the control channel.Some MLP input variables used for the signal channel are undefined for B 0 → η c π − K + , for example, the number of π 0 candidates that include the h c daughter γ as one of their daughters.Such variables are held constant.The ratio of the number of signal candidates before and after the application of the MLP selection requirements is used to measure the difference of the MLP efficiency between the data and MC: Only the channel η c → K + K 0 S π − is used before the MLP selection, because only this channel is sufficiently clean for the determination of the number of the signal events without the MLP selection.The ratio r MLP is extracted from a simultaneous fit to the η c mass distribution before and after the application of the MLP selection.The relative difference between the values of r MLP in data and MC is found to be (14.4± 9.0)%.For conservative treatment, the statistical error is added in quadrature to the central value of the difference.The resulting systematic uncertainty caused by the MLP selection efficiency difference in data and MC is 17.0%.
The estimation of the MLP efficiency error for the hadronic channel ppπ + π − is done by performing the fit using the ppπ + π − data without the MLP selection and comparing the resulting branching fraction products B(B + → J/ψK + ) × B(J/ψ → ppπ + π − ) with the results of the default procedure.Their relative difference is found to be 0.2%.
The same estimates of the MLP efficiency uncertainty for the channels h c → η c γ and h c → ppπ + π − are used for both B + → h c K + and B 0 → h c K 0 S .The final value of the MLP efficiency uncertainty is calculated as a weighted average of the errors for the two h c decay channels.The result is slightly different for the channels B + → h c K + and B 0 → h c K 0 S because of the difference in the relative number of the expected h c → η c γ and h c → ppπ + π − events.The MLP efficiency error for the branching fraction products for the channels S is equal to the error for the channel ppπ + π − , since all charmonium states other than the h c are reconstructed in this channel only.
The MLP efficiency uncertainty for the channel h c → η c γ does not include the uncertainty caused by the difference between the data and MC in the distributions of the variables that are not defined for the channel B 0 → η c π − K + .There are four such variables: the η c mass, the h c helicity angle, and two numbers of π 0 candidates that include the h c daughter photon as one of their daughters.The distribution of the h c helicity angle for the signal events is known precisely; thus, there is no additional uncertainty caused by the difference of its distribution in data and MC.The difference of the numbers of π 0 candidates is taken into account by removing these variables from the neural network for the channel B + → h c K + , performing an alternative optimization, and comparing the resulting h c branching fractions in the channel h c → η c γ.The relative difference is found to be 15.6%.The error due to the η c mass distribution uncertainty is estimated by varying the η c mass and width by ±1σ and reweighting the selected MC events in accordance with the relative difference between the modified and default η c mass distributions.The largest resulting efficiency difference is considered as the systematic uncertainty related to the η c mass distribution.This uncertainty is estimated to be 1.3% for both B + → h c K + and B 0 → h c K 0 S channels.The ratio r MLP used for determination of the MLP efficiency uncertainty includes the number of reconstructed events relatively to the number of events in h c → η c (→ K + K 0 S π − )γ.The total expected number of events can be calculated as where ǫ i and B i are the efficiency and branching fraction for i-th channel, respectively.The last term in Eq. ( 21) is proportional to r MLP .Consequently, for the channel h c → η c γ one needs to take into account only the error of B(h c → η c (→ K + K 0 S π − )γ).Errors of the branching fractions of all other channels relatively to h c → η c (→ K + K 0 S π − )γ enter the MLP efficiency error.The final h c branching fraction error is calculated as a weighted average of the errors for the channels h c → η c γ and h c → ppπ + π − .Since branching fraction products are measured for all other charmonium states, they do not have a similar systematic error source.
The resulting branching fractions with both statistical and systematic errors are listed in Table VII.For insignificant decays or decay chains, the confidence intervals are calculated in the frequentist approach [24] using an asymmetric Gaussian as the branching-fraction PDF and the measured central value and errors as its parameters.This is the first measurement of B(B 0 → h c K 0 S ).Also, the branching fraction products B(B + → (cc)K + ) × B((cc) → ppπ + π − ) and B(B 0 → (cc)K 0 S ) × B((cc) → ppπ + π − ) are measured directly in B decays for the first time.The current world-average values of the same branching fractions [3] are also presented in Table VII for comparison if they are known.The values of the branching fraction products are calculated by multiplying the individual branching fractions listed in Ref. [3]

FIG. 1 .
FIG.1.Projections of the resolution fit results onto ∆E and M bc for the channel B + → hcK + with hc → ηc(→ K + K − π 0 )γ.The red solid line is the fit result, the green dashed line is the Crystal Ball component, the blue dotted line is the first Gaussian component, and the brown dashdotted line is the second Gaussian component.

FIG. 2 .
FIG.2.Projections of the results of the fit to the (∆E, M bc ) distribution onto ∆E (with M bc > 5.272 GeV/c 2 ) and M bc (with |∆E| < 20 MeV) for the channel B + → hcK + with hc → ηc(→ K + K − π 0 )γ.The red solid line is the fit result, and the blue dotted line is the background.Since there is no significant signal before the optimization of the selection requirements and for the entire ηcγ mass range, the two lines almost coincide.

FIG. 3 .
FIG. 3. Efficiency of the MLP output requirement (v > v0)for the channel B + → hcK + with hc → ηc(→ K + K − π 0 )γ.The red solid line is the signal efficiency and the blue dashed line is the background efficiency.

FIG. 4 .
FIG.4.Projections of the results of the fit to the (∆E, M bc ) distribution onto ∆E (with M bc > 5.272 GeV/c 2 ) and M bc (with |∆E| < 20 MeV) for the channel B + → hcK + with hc → ppπ + π − after the application of the final MLP output selection criterion.The red solid line is the fit result, and the blue dotted line is the background.The region with ∆E < −0.12 GeV is excluded from the fit because of the presence of peaking backgrounds from partially reconstructed B decays with an additional π meson.The cutoff value is marked by a vertical dashed line.

FIG. 5 .
FIG.5.Resolution in M hc for the channel B + → hcK + with hc → ηcγ.The red solid line is the fit result, the green dashed line is the Crystal Ball component, and the blue dotted line is the Gaussian component.

FIG. 6 .
FIG.6.Fit results in the B + → hcK + channel: hc → ηcγ signal (top left), hc → ppπ + π − background (top right), hc → ppπ + π − signal in the entire fitting region (bottom left) and in the χcJ region (bottom right).The red solid line is the fit result and the blue dashed line is the background.The maximum of the ppπ + π − signal fit result at the J/ψ peak is more than two times greater than the number of data events in the corresponding bin because the J/ψ peak is narrower than the bin size.Thus, a part of the fit result at the J/ψ peak is not shown.

FIG. 7 .
FIG.7.Fit results in the B 0 → hcK 0 S channel: hc → ηcγ signal (top left), hc → ppπ + π − background (top right), hc → ppπ + π − signal in the entire fitting region (bottom left) and in the χcJ region (bottom right).The red solid line is the fit result and the blue dashed line is the background.The maximum of the ppπ + π − signal fit result at the J/ψ peak is more than two times greater than the number of data events in the corresponding bin because the J/ψ peak is narrower than the bin size.Thus, a part of the fit result at the J/ψ peak is not shown.
channel B + → (cc)K + (B + → h c K + for the h c and B + → (cc)(→ ppπ + π − )K + for all other charmonium states) and in Table VI for the channelB 0 → (cc)K 0 S (B 0 → h c K 0 S for the h c and B 0 → (cc)(→ ppπ + π − )K 0 Sfor all other charmonium states).The errors of the tracking efficiency and the difference of the efficiency of the particle identification requirements depend on the multivariate-analysis channel in case of the calculation of B(B + → h c K + ) and B(B 0 → h c K 0 S ); the errors related to the MLP efficiency and overtraining are estimated separately for h c → η c γ and h c → ppπ + π − .The values presented in Tables V and VI are weighted averages.

TABLE I .
Results of the optimization of the selection requirements.The signal-region half-axes R

TABLE II .
The resulting signal yields and phases in the default model.The errors are statistical only.Parameter B + → hcK + B 0 → hcK 0

TABLE III .
Significances of the charmonium states decaying to ppπ + π − except the hc in the default model.State B + → (cc)K + B 0 → (cc)K 0