Study of the ${X^\pm(5568)}$ state with semileptonic decays of the ${B_s^0}$ meson

We present a study of the $X^\pm(5568)$ using semileptonic decays of the $B_s^0$ meson using the full Run II integrated luminosity of 10.4 fb$^{-1}$ in proton-antiproton collisions at a center of mass energy of 1.96\,TeV collected with the D0 detector at the Fermilab Tevatron Collider. We report evidence for a narrow structure, $X^\pm(5568)$, in the decay sequence $X^\pm(5568) \to B_s^0 \pi^\pm$ where $B_s^0 \rightarrow \mu^\mp D_s^\pm \, \mathrm{X}$, $D_s^\pm \rightarrow \phi \pi^{\pm}$ which is consistent with the previous measurement by the D0 collaboration in the hadronic decay mode, $X^\pm(5568) \to B^0_s \pi^\pm$ where $B^0_s \to J/\psi\phi$. The mass and width of this state are measured using a combined fit of the hadronic and semileptonic data, yielding $m = 5566.9 ^{+3.2}_{-3.1} \thinspace {\rm (stat)} ^{+0.6}_{-1.2} {\rm \thinspace (syst)}$\,MeV/$c^2$, $\Gamma = 18.6 ^{+7.9}_{-6.1} {\rm \thinspace (stat)} ^{+3.5}_{-3.8} {\rm \thinspace (syst)} $\,MeV/$c^2$ with a significance of 6.7$\,\sigma$.

We present a study of the X ± (5568) using semileptonic decays of the B 0 s meson using the full run II integrated luminosity of 10.4 fb −1 in proton-antiproton collisions at a center of mass energy of 1.96 TeV collected with the D0 detector at the Fermilab Tevatron Collider. We report evidence for a narrow structure, X ± (5568), in the decay sequence X ± (5568) → B 0 s π ± where B 0 s → µ ∓ D ± s X, D ± s → φπ ± which is consistent with the previous measurement by the D0 Collaboration in the hadronic decay mode, X ± (5568) → B 0 s π ± where B 0 s → J/ψφ. The mass and width of this state are measured using a combined fit of the hadronic and semileptonic data, yielding m = 5566.9 +3.2 −3.1 (stat) +0.6 −1.2 (syst) MeV/c 2 , Γ = 18.6 +7.9 −6.1 (stat) +3.5 −3.8 (syst) MeV/c 2 with a significance of 6.7 σ.

I. INTRODUCTION
Since the creation of the quark model [1,2] it was understood that exotic mesons containing more than one quark-antiquark pair are possible. However, for exotic mesons containing only the up, down and strange quarks it has been difficult to make a definitive experimental case for such exotic states, although some persuasive arguments have been made (for recent comprehensive discussions of exotic hadrons containing both light and heavy quarks, see Refs. [3][4][5][6]). Multiquark states that contain heavy quarks can be more recognizable owing to the distinctive decay structure of heavy quark hadrons. The 2003 discovery by the Belle experiment [7] of the X(3872) in the channel B ± → K ± X(→ π + π − J/ψ) was the first candidate exotic meson in which heavy flavor quarks participate. This state was subsequently confirmed in several production and decay modes by AT-LAS [8], BaBar [9], BES III [10], CDF [11], CMS [12], D0 [13] and LHCb [14] Collaborations. Several additional four-quark candidate exotic mesons have since been found, though in many cases not all experiments have been able to confirm their existence.
Four-quark mesons can be generically categorized as either "molecular states" or tetraquark states of a diquark and an anti-diquark. In the example of the X(3872), a molecular state interpretation would be a colorless D 0 (cū) and a colorlessD * 0 (uc) in a loosely bound state. Such a state would be expected to lie close in mass to the D 0D * 0 threshold. The tetraquark mode of a colored diquark (cu) and colored anti-diquark (cū) is more strongly bound by the exchange of gluons and would be expected to have a mass somewhat below the D 0D * 0 threshold. In many cases, interpretations of four-quark mesons as pure molecular or tetraquark states are difficult and more complex mechanisms may be required [4][5][6]. The firm identification of multiquark mesons and baryons and the study of their properties are of importance for further understanding of nonperturbative QCD.
Recently the D0 Collaboration presented evidence for a new four-quark candidate that decays to B 0 s π ± where B 0 s decays to J/ψφ [15]. This system would be composed of two quarks and two antiquarks of four different flavors: b, s, u, d, with either a molecular constitution as a loosely bound B 0 d and K ± system or a tightly bound tetraquark such as (bd)-(sū), (bu)-(sd), (su)-(bd), or (sd)-(bū) (because the B 0 s meson is fully mixed, the exact quark anti-quark composition cannot be determined). The mass of X ± (5568) is about 200 MeV/c 2 below the B 0 d K ± threshold, thus disfavoring a B 0 d -K ± molecular interpretation. The X ± (5568) was previously reported [15] with a significance of 5.1 σ (including systematic uncertainties and the look-elsewhere effect [16]) in the decay X ± (5568) → B 0 s (J/ψφ)π ± in proton-antiproton collisions at a center of mass energy of 1.96 TeV. The ratio of the number of B 0 s that are from the decay of the X ± (5568) to all B 0 s produced was measured to be [8.6 ± 1.9 (stat) ± 1.4 (syst)] %. In order to reduce the background, a selection was imposed on the angle between the B 0 s and π ± (the "cone cut", ∆R = ∆η 2 + ∆φ 2 < 0.3 [17]). Without the cone cut the significance was found to be 3.9 σ. In addition to increasing the signal-to-background ratio this cone cut limits backgrounds, such as possible excited states of the B c meson, that are not included in the available simulations. Multiple checks were carried out to ensure that the cone cut did not create an anomalous signal [15]. Varying the cone cut from ∆R max = 0.2 to 0.5 gave stable fitted masses and resulted in no unexpected changes in the result. The invariant mass spectra of the B 0 s candidates and charged tracks with kaon or proton mass hypotheses were checked, and no resonant enhancements in these distributions were found. The invariant mass distribution of B 0 d π ± was also examined with no unexpected resonances or reflections found. Subsequent analyses by the LHCb Collaboration [18] and by the CMS Collaboration [19] have not found evidence for the X ± (5568) in proton-proton interactions at √ s = 7 and 8 TeV. The CDF Collaboration has recently reported no evidence for X ± (5568) in proton-antiproton collisions at √ s = 1.96 TeV [20] with different kinematic coverage than that of Ref. [15].
In this article, we present a study of the X ± (5568) in the decay to B 0 s π ± using semileptonic B 0 s decays, using the full run II integrated luminosity of 10.4 fb −1 in proton-antiproton collisions at a center of mass energy of 1.96 TeV collected with the D0 detector at the Fermilab Tevatron Collider. Charge conjugate states are assumed. Here X includes the unseen neutrino and possibly a photon or π 0 from a D * s decay or other hadrons from the B 0 s decay. The decay process is illustrated in Fig. 1. The semileptonic decay channel has a higher branching fraction than the hadronic channel (B 0 s → J/ψφ). However the presence of the unmeasured neutrino in the final state deteriorates the mass resolution of the signal. Still, a good mass resolution for the X ± (5568) can be obtained in the semileptonic channel for events with a large invariant mass of the µ + D − s system, yielding a comparable number of selected B 0 s candidates in the two channels. The backgrounds in the semileptonic channel are independent of, but somewhat larger than, those in the hadronic channel. The character of possible reflections of other resonant structures is quite different in the semileptonic and hadronic channels. Thus observation of the X ± (5568) in the semileptonic decay channel en-ables an independent confirmation of its existence. We report here the results of the search for the X ± (5568) in the semileptonic channel, as well as a combination of the results in the hadronic and semileptonic channels.

II. D0 DETECTOR
The detector components most relevant to this analysis are the central tracking and the muon systems. The D0 detector has a central tracking system consisting of a silicon microstrip tracker (SMT) and a central fiber tracker (CFT), both located within a 2 T superconducting solenoidal magnet [21,22]. The SMT has a design optimized for tracking and vertexing for pseudorapidity of |η| < 3. For charged particles, the resolution on the distance of closest approach as provided by the tracking system is approximately 50 µm for tracks with p T ≈ 1 GeV/c, where p T is the component of the momentum perpendicular to the beam axis. It improves asymptotically to 15 µm for tracks with p T > 10 GeV/c. Preshower detectors and electromagnetic and hadronic calorimeters surround the tracker. A muon system, positioned outside the calorimeter, covering |η| < 2 consists of a layer of tracking detectors and scintillation trigger counters in front of 1.8 T iron toroidal magnets, followed by two similar layers after the toroids [23].

III. EVENT RECONSTRUCTION AND SELECTION
The B 0 s → µ + D − s X selection requirements have been chosen to optimize the mass resolution of the B 0 s π + system and to minimize background from random combinations of tracks from muons and charged hadrons. The selection criteria are based on those used in Ref. [24] with the cut on the B 0 s isolation removed and have been selected by maximizing the significance of the signal.
The data were collected with a suite of single and dimuon triggers (approximately 95% of the sample is recorded using single muon triggers). The selection and reconstruction of µ + D − s decays requires tracks with at least two hits in both the CFT and SMT.
The muon is required to have hits in at least two layers of the muon system, with segments reconstructed both inside and outside the toroid. The muon track segment is required to be matched to a track found in the central tracking system that has transverse momentum 3 < p T < 25 GeV/c.
The D − s → φπ − ; φ → K + K − decay is selected as follows. The two particles from the φ decay are assumed to be kaons and are required to have p T > 1.0 GeV/c, opposite charge and an invariant mass 1.012 < m(K + K − ) < 1.03 GeV/c 2 . The charge of the third particle, assumed to be a pion, has to be opposite to that of the muon. This particle is required to have transverse momentum 0.5 < p T < 25 GeV/c. The mass of the three particles must satisfy 1.91 < m(K + K − π − ) < 2.03 MeV/c 2 . The three tracks are combined to form a common D − s decay vertex using the algorithm described in Ref. [25]. The D − s vertex is required to be displaced from the pp primary interaction vertex (PV) in the transverse plane with a significance of at least three standard deviations. The cosine of the angle between the D − s momentum and the vector from the PV to the D − s decay vertex is required to be greater than 0.9.
The trajectories of the muon and D − s candidate are required to be consistent with originating from a common vertex (assumed to be the B 0 s semileptonic decay vertex). The cosine of the angle between the combined µ + D − s transverse momentum, an approximation of the B 0 s direction, and the direction from the PV to the B 0 s decay vertex has to be greater than 0.95. The B 0 s decay vertex has to be displaced from the PV in the transverse plane with a significance of at least four standard deviations. The transverse momentum of the µ + D − s system is required to satisfy the condition p T > 10 GeV/c to suppress backgrounds. To minimize the effect of the neutrino in the final state the effective mass is limited to 4.5 GeV/c 2 < m(µ + D − s ) < m(B 0 s ). The impact parameters (IP) [26] with respect to the PV of the four tracks from the B 0 s decay are required to satisfy the following criteria: the two-dimensional (2D) IPs of the tracks of the muon and the pion from the D − s decay are required to be at least 50 µm to reject tracks emerging promptly from the PV (this requirement is not applied to the tracks associated with the charged kaons since the mass requirements provide satisfactory background suppression). The three-dimensional (3D) IPs of all four tracks are required to be less than 2 cm to suppress combinations with tracks emerging from different pp vertices reconstructed in the same beam crossing.
The m(K + K − π ± ) distribution of the candidates that pass these cuts [except 1.91 < m(K + K − π − ) < 2.03 MeV/c 2 ] is shown in Fig. 2, where the invariant mass distribution in data is compared to a fit using a function which includes three terms: a second order polynomial used to describe combinatorial background, a Gaussian used to model the D − peak, and a double Gaussian with similar, but different masses and widths used to model the D − s peak.  The K + K − π ± invariant mass distribution for the µ ± φπ ∓ sample (right sign) with the solid curve representing the fit. The lower mass peak is due to the decay D ± → φπ ± and the second peak is due to the D ± s decay. The blue histogram below the data points is the invariant mass distribution for the same-sign sample, µ ± φπ ± .
The selection criteria for the pion in the B 0 s π ± combination have been chosen to match those used in the hadronic analysis. The track representing the pion is required to have transverse momentum 0.5 < p T < 25 GeV/c (the upper limit is applied to reduce background). The pion and the B 0 s candidate are combined to form a vertex that is consistent with the PV. The pion is required to be associated with the PV and have a 2D IP of at most 200 µm and a 3D IP that is less than 0.12 cm. Events with more than 20 B 0 s π ± candidates are rejected. The most likely number of candidates per event is 5.1 and only about 0.1% of the events have more than 20 candidates per event. To improve the resolution of the invariant mass of the B 0 s π ± system we define the invariant mass as m(B 0 [27]. We study the mass distribution in the range 5.506 < m(B 0 s π ± ) < 5.906 GeV/c 2 . When using the hadronic data from Ref. [15] in this paper we use the same mass range as the semileptonic data instead of the slightly shifted mass range used in the original analysis (5.5 < m(B 0 s π ± ) < 5.9 GeV/c 2 ). The semileptonic data are studied with and without a cone cut which is used to suppress background, in which the angle between the µ + D − s system and π ± is required to satisfy ∆R = ∆η 2 + ∆φ 2 < 0.3. The resulting invariant mass distributions for the semileptonic channel are shown in Fig. 3.
The selection cuts and resulting kinematics for the hadronic and semileptonic channels are quite similar. The requirement that muons be seen outside the toroids means that the minimum p T for the J/ψ in the hadronic channel is about 4 GeV and about 3 GeV for the single muon in the semileptonic channel. The minimum p T for the additional pion is 0.5 GeV for both the hadronic and semileptonic channels. For both channels, we require the minimum p T (B 0 s π) to be greater than 10 GeV and the average p T (B 0 s π) for events with m(B s π) ≈ 5.5 GeV is ≈ 17 GeV. For both channels the B 0 s π candidates are in the range of −2 < η < 2 and more than half of the events have a muon with |η| > 1.

IV. MONTE CARLO SIMULATION, BACKGROUND MODELING AND PARAMETERIZATION
Monte Carlo (MC) samples are generated using the pythia [28] event generator, modified to use evtgen [29] for the decay of hadrons containing b or c quarks. The generated events are processed by the full detector simulation chain. Data events recorded in random beam crossings are overlaid on the MC events to simulate the effect of additional collisions in the same or nearby bunch crossings. The resulting events are then processed with the same reconstruction and selection algorithms as used for data events.
The MC sample for X ± (5568) signal is generated by modifying the mass of the B ± meson and forcing it to decay to B 0 s π ± using an isotropic S-wave decay model. The X ± (5568) is simulated with zero width and zero lifetime. The resulting K + K − π − and B 0 s π ± invariant mass distributions are shown in Fig. 4 with all selection requirements.
The signal component of the K + K − π ± invariant mass distribution (Fig. 4 a) is modeled by two Gaussian functions and the background by a second-order polynomial. The signal of the m(B 0 s π ± ) distribution ( Fig. 4 b) is well modeled with a single Gaussian and the background with a third-order polynomial times an exponential. Using the results of these fits the reconstruction efficiency of the charged pion in the decay GeV/c where the systematic uncertainty represents the expected differences between the reconstruction efficiencies for low-momentum tracks in the MC simulation and data.
It is not possible to create a model of the background that is based only on data. Since the X ± (5568) decays to B 0 s mesons, any data sample that includes B 0 s decays will also include the signal and is unsuitable for modeling the background. Hence, we use MC-generated B 0 s events that result from known particles that have decays that include a B 0 s in the decay chain, combined with data events where the muon has the same sign as the D − s candidate (SS events). MC event generators do not include all possible states as in many cases they have not been experimentally observed. For example, bc resonances decaying to B 0 s mesons could contribute to our sample. There are two distinct sources of background in this analysis. The first occurs when an X ± (5568) candidate is reconstructed from a real µ + and D − s together with a random charged track. This background is modeled using MC samples.
The background MC sample is generated using the pythia inclusive heavy flavor production model and events are selected that contain at least one muon and a D ∓ s → φπ ∓ decay where φ → K + K − . To correct for the difference in lifetimes in the MC simulation and data, a weighting is applied to all nonprompt events in the simulation, based on the generated lifetime of the B candidate, to give the world-average B hadron lifetimes [27]. To correct for the effects of the trigger selection and the reconstruction in data, we also weight each MC event so that the transverse momenta of the reconstructed muon and the µ + D − s system agree with those in the data. The p T distribution of the B 0 s π ± system is altered significantly by the weighting as shown in Fig. 5(a). However, the effect is relatively small for the B 0 s π ± mass distribution as seen in Fig. 5 The second source of background is the combinatorial background that occurs when a X ± (5568) candidate is reconstructed from a spurious D − s candidate formed from three random charged tracks that form a vertex. This background is modeled using data events where the muon has the same sign as the D − s candidate (SS events). In Fig. 6(b) we compare the reweighted MC background simulation, smoothed using one iteration of the 353QH algorithm [30], with the SS data for the no cone cut case. These two backgrounds are in good agreement since the χ 2 between them is 50 for 50 bins. We therefore choose to use the MC background shape only, for the data without the cone cut. In Fig. 6(a) we make the same comparison for the data with the cone cut. In this case, χ 2 = 77 for the 50 bins, and we therefore need to  To construct the background sample for the data with the cone cut the fraction of MC and SS backgrounds need to be determined. This is found by fitting the data with a combination of the MC and SS with the fraction of MC events as a free parameter in the sideband mass range 5.506 < m(B 0 s π ± ) < 5.55 and 5.650 < m(B 0 s π ± ) < 5.906 GeV/c 2 . The best agreement is found when the MC fraction is (62 ± 2) %.
We choose the background parametrization for the in-variant mass distribution, both with and without the cone cut, to be where m = m(B 0 s π ± ), m 0 = m − m th and m th = 5.5063 GeV/c 2 is the mass threshold. Our baseline choice of Eq. (1) gives an equivalently good description of the background as that used in Ref. [15] [Eq. (2)]. It has the advantages of having one fewer parameter and being zero at the mass threshold. Three alternative parametrizations are used to model the background. The first is that used in Ref. [15], where m ∆ = m − ∆ and ∆ = 5.500 GeV/c 2 . The second is the ARGUS function [31] which is specifically constructed to describe background near a threshold The third alternative model used to fit the background is the MC histogram (or combined MC and SS data) smoothed using one iteration of the 353QH algorithm [30]. The ARGUS function is not used as an alternate parametrization in the semileptonic data with the cone cut, because the fit to background is strongly disfavored (the χ 2 of the fit to the MC background is 145 compared with approximately 50 for the alternate functions). The χ 2 per number of degrees of freedom (ndf) for the four representations of the background are shown in Table I.
We choose the background description of Eq. (1) as the baseline. The alternative functions and the smoothed MC are used to estimate the systematic uncertainty on the background shape. The m(B 0 s π ± ) background model distribution along with the fit using Eq. (1) is presented in Fig. 7.

V. SIGNAL MASS RESOLUTION
We calculate the mass of the B 0 s π ± system using the quantity Before carrying out the search for the X ± (5568) in the semileptonic channel we ensure that it is an unbiased and precise estimator of the mass of the B 0 s π ± system. This is studied by simulating the two body decay X(5568 is calculated and compared to the input mass m(B 0 s π ± ). To evaluate how well the mass approximation works to compensate for the missing neutrino, we model the decay with a toy MC that simulates the virtual W in B 0 s → D ∓ s + W * with an isotropic distribution of µ and ν in the W boson rest frame. The resulting resolution of a zero width resonance due to the presence of the neutrino is modeled by a Gaussian. The width varies according to m(B 0 s π ± ) as illustrated by the solid line in Fig. 8. The mass resolution for the D0 detector of a state decaying into five reconstructed charged particles with a similar kinematic range as in this study is measured using the MC simulation and is given by a Gaussian function of width 3.85 MeV/c 2 . The m(B 0 s π ± ) resolution function is obtained by convoluting the Gaussian tracking resolution and the smearing resolution resulting from the missing neutrino. The resulting combined resolution, the dashed line in Fig. 8, can be approximated by where m 0 has the same definition as in Eq. (1). These studies show that the difference between m(B 0 s π ± ) and m(B 0 s π ± ) is less than 1 MeV/c 2 in the search region. This is confirmed with the signal MC sample.

VI. SIGNAL FIT FUNCTION
The X ± (5568) resonance is modeled by a relativistic Breit-Wigner function convolved with a Gaussian detector resolution function given in Eq. (5), F sig (m, m X , Γ X ), where m X and Γ X are the mass and the width of the resonance.
The fit function has the form where f sig and f bgr are normalization factors. The shape parameters in the background term F bgr are fixed to the values obtained from fitting the MC background distribution (see Fig. 7). We use the Breit-Wigner parametrization appropriate for an S-wave two-body decay near threshold: The mass-dependent width Γ(m) = Γ X ·(q 1 /q 0 ), where q 1 and q 0 are the magnitudes of momenta of the B 0 s meson in the rest frame of the B 0 s π ± system at the invariant mass equal to m and m X , respectively.

VII. X ± (5568) SEMILEPTONIC FIT RESULTS
In the fit to the semileptonic data with the cone cut shown in Fig. 9(a), the normalization parameters f sig and f bgr and the Breit-Wigner parameters m X and Γ X are allowed to vary. The fit yields the mass and width of m X = 5566.4 +3.4 In the fit to the semileptonic data without the cone cut shown in Fig. 9(b), the mass and width of m X = 5566.7 +3.6 −3.4 MeV/c 2 , Γ X = 6.0 +9.5 −6.0 MeV/c 2 , the number of signal events, N = 139 +51 −63 , and a χ 2 = 30.4 for 46 degrees of freedom. The p-value of the background only fit is 7.7 × 10 −6 and the local statistical significance is 4.5 σ. The fit results, both for the cone cut and no cone cut cases, are given in Table II and for various background  parametrizations in Table III. The X ± (5568) parameters for the cone cut and no cone cut cases are consistent.

A. Systematic uncertainties
Systematic uncertainties (Table IV) are obtained for the measured values of the mass, width and event yield of the X ± (5568) signal. The dominant uncertainty is due to (i) the description of the background shape. We evaluate this systematic uncertainty by using the alternative paramaterizations of the background, Eqs. (2), (3) and the smoothed MC histogram and finding the maximal deviations from the nominal fit.
The effect of (ii) the MC weighting is estimated by creating 1000 background samples where the weights have been randomly varied based on the uncertainties in the weighting procedure.
Other sources of systematic uncertainty are evaluated by (iii) varying the energy scale in the MC sample relative to the data by ±1 MeV/c 2 , (iv) varying the mass resolution of the X ± (5568) either by ±1 MeV/c 2 around the mean value, or by using a constant resolution of 11.1 MeV/c 2 obtained from the MC simulation of the X ± (5568) signal, (v) using a P-wave relativistic Breit-Wigner function, and (vi) estimating the shift of the fitted mass peak due to the missing neutrino.
Systematic uncertainties are summarized in Table IV. The uncertainties are added in quadrature separately for positive and negative values to obtain the total systematic uncertainties for each measured parameter. The results including systematic uncertainties are given in Table II.

B. Significance
Since we are seeking to confirm the result presented in Ref. [15] we do not apply a correction for a look elsewhere effect. The systematic uncertainties are treated as nuisance parameters to construct a prior predictive model [27,32] of our test statistic. When the systematic uncertainties are included, the significance of the observed semileptonic signal with the cone cut is 3.2 σ (pvalue = 1.4 × 10 −3 ). The significance of the semileptonic signal without the cone is 3.4 σ (p-value = 6.4 × 10 −4 ).

C. Closure tests
We have tested the accuracy of the fitting procedure using toy MC event samples constructed with input mass and width of 5568.3 and 21.9 MeV/c 2 respectively, with the number of input signal events varied in steps of 25 between 75 and 350. At each number of input signal events, 10,000 pseudoexperiments were generated. The signals are modeled with a relativistic Breit-Wigner function convolved with a Gaussian function representing the appropriate detector resolution. The background distribution is based on Eq. (1). For each trial the fitting procedure is performed to obtain the mass and width and the number of semileptonic signal events. The results of each set of trials is fitted with a Gaussian to determine the mean and the uncertainty in the number of signal events, the mass and the width (see Table V). The number of fitted signal events vs. the number of injected signal events for the semileptonic samples are plotted in Fig. 10.
For the ensembles with a number of input events similar to that observed in data, there is a slight overestimate of the yield and fitted mass, and the width is underestimated. This width reduction is in agreement with the results of the fits to data (Sec. VII), and indicate that the semileptonic data are not sensitive to the width. These effects are accounted for in the calculation of the significance.

D. Comparison with hadronic channel
The measured values of the mass, width, the number of signal events, and significance of the signal for the semileptonic channel and the hadronic channel [15] are given in Table VI. The mass and width of the X ± (5568) for the semileptonic and hadronic channels are consistent taking into account the uncertainties. The observed yields are consistent with coming from a common particle given the number of B 0 s events in the sample and the B 0 s branching ratios.

E. Cross-checks
As a cross-check the B 0 s π ± mass-bin size is set to 5 MeV/c 2 and to 10 MeV/c 2 instead of 8 MeV/c 2 , and the lower edge of the fitted mass range is shifted by 2, 3, 5, and 7 MeV/c 2 . This leads to maximal variations in the mass of +0.1 −0.6 MeV/c 2 , in the width of +1.7 −0.9 MeV/c 2 and in the number of signal events +0 −9 which are small compared to the statistical and systematic uncertainties.
To test the stability of the results, alternative choices are made regarding the fit parameters. In the first, the background fit parameters are allowed to float. The resulting fit is consistent with the nominal fit and the p-value of the background-only fit is 1.7 × 10 −4 corresponding to a statistical significance of 3.8 σ (Table VII).   Fig. 9).

Cone cut
No cone cut The second cross-check fixes the mass and width of the X ± (5568) to the values found in Ref. [15]. Again, the resulting fit is consistent with the nominal fit with an increase in the number of signal events due to the increased width of the peak. The p-value of the background-only fit is 1.1 × 10 −4 corresponding to a statistical significance of 3.9 σ (Table VII). These cross-checks are also repeated without the cone cut (Table VII).
To calculate the production ratio of the X ± (5568) to B 0 s , the number of the B 0 s -mesons needs to be estimated. The fitting of the K + K − π ∓ mass distribution is described in Sec. IV. The number of D ∓ s mesons extracted from the fit and adjusted for the mass range 1.91 < m(K + K − π ∓ ) < 2.03 MeV/c 2 is N (D ∓ s ) = 6648 ± 127 (see Fig. 2). The number of µ ± D ∓ s events in the signal sample that are the result of a random combination of a promptly produced D ∓ s and a muon in the event is estimated using events where the muon and the D ∓ s -meson have the same sign. The same sign data sample is analyzed using the same model as the opposite sign data with the means and widths of the Gaussians fixed to the values obtained from the opposite sign data. The number of events in the same-sign sample is N (D ± s ) = 426 ± 61. The mass distributions of the K + K − π ∓ for opposite and same-sign data are shown in Fig. 2.
The number of B 0 s -meson decays in the semileptonic  data is estimated by subtracting the contribution of the promptly produced µ ± D ∓ s events from the overall µ ± D ∓ s sample. A study of the MC background simulations shows that the purity of the resulting sample is 99.5 +0.5 −1.0 %. We find 6222 ± 141 B 0 s events. Combining these results and using the efficiency for the charged pion in the X(5568) decay (Sec. IV), we obtain a production ratio for the semileptonic data of for our fiducial selection (which includes the requirements is the number of X ± (5568) decays to B 0 s π ± and N B 0 s (sl) is the inclusive number of B 0 s decays, both for semileptonic decays of the B 0 s . This result is similar to the ratio measured in the hadronic channel [8.6 ± 1.9 (stat) ± 1.4 (syst)] % for p T (J/ψφπ ± ) > 10 GeV/c 2 [15].

IX. COMBINED SIGNAL EXTRACTION
We now proceed to fit the hadronic and semileptonic data sets simultaneously. The hadronic data set is the same as used in Ref. [15] except that the data are fitted in the mass range 5.506 < m(B 0 s π ± ) < 5.906 GeV/c 2 instead of 5.500 < m(B 0 s π ± ) < 5.900 GeV/c 2 . The data selection and background modeling for the hadronic data set are described in detail in Ref. [15].
The fit function has the form F sl =f sl,sig F sl,sig (m, m X , Γ X ) + f sl,bgr F sl,bgr (m) (10) where f h(sl),sig and f h(sl),bgr are normalization factors. The shape parameters in the background terms F h(sl),bgr are fixed to the values obtained from fitting the respective background models for the hadronic (h) and semileptonic (sl) samples to Eq. (1). The signal shape F h(sl),sig is modeled by relativistic Breit-Wigner function convolved with a Gaussian detector resolution function that depends on the data sample. For the semileptonic sample the detector resolution is given by Eq. (5) and for the hadronic channel it is 3.85 MeV/c 2 . For the data without the cone cut the combined data are fitted in the range 5.506 < m(B 0 s π ± ) < 5.706 GeV/c 2 as the hadronic background is not well modeled for m(B 0 s π ± ) > 5.706 GeV/c 2 [15]. The same Breit-Wigner parameters m X and Γ X are used for the hadronic and semileptonic samples. In the fits shown in Fig. 11, the normalization parameters f h(sl),sig and f h(sl),bgr and the Breit-Wigner parameters m X and Γ X are allowed to vary. Since the fraction of B 0 s events produced by the decay of the X ± (5568) should be essen-tially the same in the hadronic and semileptonic channels the X ± (5568) event yields (N h and N sl ) are constrained using the parameter which is required to be consistent with the B 0 s -meson production rate in the hadronic and semileptonic channels = 0.054 ± 0.020, (12) where N B 0 s (sl) = 6222 ± 144, N B 0 s (h) = 5582 ± 100 are the number of semileptonic and hadronic B 0 s decays in the sample. A likelihood penalty of .020 is the uncertainty. This uncertainty includes the statistical uncertainty in the number of B 0 s events and the uncertainties in the relative reconstruction efficiencies and acceptances between the hadronic and semileptonic data. A ratio has been chosen for the constraint as it is well behaved if either of the event yields (N h and N sl ) approaches zero.
The fit results are summarized in Table VIII and the correlations between the fit parameters are given in Table IX. The correlation of nearly one between N X (sl) and N X (had) is a result of the constraint on the event yields [Eq. (11)]. The local statistical significance of the signal is defined as −2 ln(L 0 /L max ), where L max and L 0 are likelihood values at the best-fit signal yield and the signal yield fixed to zero obtained from a binned maximumlikelihood fit. For the cone cut the p-value of the fit to the data with the cone cut is 2.2 × 10 −14 and the local statistical significance is 7.6 σ. The p-value without the cone cut is 8.2×10 −9 and the local statistical significance is 5.8 σ.

A. Systematic uncertainties
The systematic uncertainties of the combined fit are given in Table X. The uncertainty on (i) the background shape descriptions is evaluated by using the alternative paramaterizations of the background, Eqs. (2), (3) and the smoothed MC histogram independently for the semileptonic and the hadronic channels (16 different fits) and finding the maximal deviations from the nominal fit.
The effect of (ii) the MC weighting for the semileptonic background is estimated by creating 1000 background samples where the weights have been randomly varied based on the uncertainties in the weighting procedure and measuring the standard deviation and bias of the measured values.
The (iii) MC component of the background for the hadronic sample is made up of a mixture of two independent MC samples with different production properties (see Ref. [15]) and the systematic uncertainties due to this are found by varying the composition of this mixture and measuring the standard deviation and bias of the measured values. The (iv) size of the hadronic sidebands is varied using the maximal deviations from the nominal fit to estimate the systematic uncertainty.
The systematic uncertainty due to the (v) fraction of MC and SS data in the semileptonic sample, (vi) the MC and sideband data in the case of the hadronic, is varied independently between the two samples measuring the standard deviation and bias of the measured values. Since the background model for the semileptonic sample without the cone cut only uses the MC background simulation this uncertainty (v) does not apply.
All of the uncertainties due to the modeling of the background are assumed to be independent for the hadronic and semileptonic data samples.
The remaining uncertainties are measured by finding the maximal deviations from the nominal fit for (vii) varying the energy scale in the semileptonic and hadronic MC data samples by ±1 MeV/c 2 in both samples simultaneously; (viii) varying the nominal mass resolution of 3.85 MeV/c 2 for the D0 detector by ±1 MeV/c 2 and +2 MeV/c 2 in both the hadronic and semileptonic data samples simultaneously; (ix) varying the resolution of the X ± (5568) peak in the semileptonic channel either by ±1 MeV/c 2 around the mean value given by Eq. (5) or by using a constant resolution of 11.1 MeV/c 2 for the semileptonic data while the mass resolution in the hadronic channel remains at 3.85 MeV/c 2 ; (x) using a P-wave relativistic Breit-Wigner function for both data sets; (xi) setting the shift of the fitted mass peak in the semileptonic data with respect to the hadronic data due to the missing neutrino to ±1 MeV/c 2 ; and (xii) varying the constraint on the relative number of signal events in hadronic and semileptonic channels (Eq. (11)) between 0.034 and 0.074. The correlation of each of the sources of systematic uncertainty between the hadronic and semileptonic data sets is indicated in Table X. The uncertainties are added in quadrature separately for positive and negative values to obtain the total systematic uncertainties for each measured parameter. The results including systematic uncertainties are given in Table VIII.  Table VIII). The background parametrization function is taken from Eq. (1). The look-elsewhere effect (LEE) is determined using the approach proposed in Ref. [33]. We have generated 250,000 simulated background distributions with no signal, both with and without the cone cut. These distributions are fit using the same procedure as the data. The mass parameter of the relativistic Breit-Wigner is constrained to be between 5506 to 5675 MeV/c 2 (the sum of the mass of the B 0 d and K ± ) with a starting value of m X = 5600 MeV/c 2 . The width of the signal is allowed to vary between 0.1 and 60 MeV/c 2 with a starting value of Γ X = 21 MeV/c 2 . The maximum local statistical significance for each distribution is calculated. The resulting distribution of the local significance is fitted with the function where N trials is the number of generated distributions, P 1 is a free parameter and χ 2 (n) is the χ 2 cumulative distribution function for n degrees of freedom. We have used n = 2 and 3 as we are fitting two spectra simultaneously. The resulting function is integrated above the measured local significance to determine the global significance (Table VIII). The significance, not including the systematic uncertainty, of the observed signal accounting for the LEE and with the cone cut applied is 6.9 σ (p-value = 4.1 × 10 −12 ). The significance of the signal without the cone cut is 5.0 σ (p-value = 4.1 × 10 −7 ). The effect of choosing the function in Eq. (13) is studied by modifying it to f loc = N trials χ 2 (2) + P 1 χ 2 (4) and f loc = N trials χ 2 (2) + P 1 χ 2 (3) + P 2 χ 2 (4) with no significant change to the significance being observed. The look-elsewhere effect on the signal significance is checked with a method described in Ref. [33] that relates the tail probability with the number of "upcrossing" at a small reference level. Five hundred simulated background spectra are generated. Each of these 500 distributions is fitted with the background plus signal function with different initial masses from 5506 to 5675 MeV/c 2 in 5 MeV/c 2 steps along with a background-only fit. The significance Correlations between the parameters of the combined fit to the hadronic and semileptonic data sets (see Fig. 11). The yield in the semileptonic channel is NX (sl), the hadronic channel NX (h), while the fraction of background events is f sl,bgr and f h,bgr respectively. is plotted for each of the mass points and the number of upcrossings (each time the significance crosses a small reference value) is measured. The mean number of upcrossings for a reference level of 0.5 is determined and the global significance is calculated. The resulting significance is consistent with the method described above.
The systematic uncertainties are treated as nuisance parameters to construct a prior predictive model [27,32] of our test statistic. When the systematic uncertainties are included, the significance of the observed signal with the cone cut applied for the combined fit is reduced to 6.7 σ (p-value = 1.5 × 10 −11 ) and the significance of the signal without the cone cut is 4.7 σ (p-value = 2.0×10 −6 ).

C. Closure tests
To test the sensitivity and accuracy of the fitting procedure for the combined signal extraction we repeat the closure tests carried out in Sec. VII C with the following modifications. The size of the associated hadronic signal is set using Eqs. (11) and (12). The appropriate detector resolution is used, Eq. (5) for the semileptonic sample and 3.85 MeV/c 2 for the hadronic sample. For each trial the fitting procedure is performed to obtain the mass and width and the number of semileptonic and hadronic signal events. The results of each set of trials is fitted with a Gaussian to determine the mean and the uncertainty in the number of signal events, the mass and the width (see Table XI). The number of fitted signal events vs. the number of injected signal events for the semileptonic and hadronic samples is plotted in Fig. 12. These results show excellent agreement between the input and fit parameters.

D. Cross-checks
To test the stability of the results, alternative choices are made regarding the fit parameters (see Table XII).
When no constraint is placed on the ratio of the event yields in the hadronic and semileptonic channels, Eq. (11), the results are entirely consistent with the fit with the constraint.
We have also carried out a fit in which two of the systematic effects are treated as nuisance parameters in the fit. We allow a mass shift, ∆m, between the hadronic and semileptonic data with a likelihood penalty of 0.5(∆m/1 MeV/c 2 ) 2 . We also allow the overall resolution of the semileptonic signal to vary by ∆σ SL with a likelihood penalty of 0.5(∆σ SL /1 MeV/c 2 ) 2 . The resultant fit produces a mass, width and event yields that are consistent with the default fit and shifts of ∆m = (0.0 ± 1.4) MeV/c 2 and ∆σ SL = (−0.1 ± 1.4) MeV/c 2 .
The significance of a nonzero width is determined by fitting the data with the width set to zero and comparing it with the fit with no constraint on the width (Table XII). Using the data with the cone cut the p-value of the width being consistent with zero is 5.4 × 10 −6 , and the statistical significance is 4.5 σ. The significance without the cone cut is 3.3 σ (p-value = 1.1 × 10 −3 ).     We have presented the results of a search for the X ± (5568) → B 0 s π ± with semileptonic decays of the B 0 s meson. The X ± (5568) → B 0 s π ± state reported in the case that B 0 s → J/ψφ [15] is confirmed for the case that B 0 s → µ ∓ D ± s X, D ± s → φπ ± . The analyses of the hadronic and semileptonic data give similar measurements of the mass, width and production ratio of X ± (5568) to a B 0 s meson. The mass and width of this state are measured using a combined fit of both data sets with the cone cut, yielding m = 5566.9 +3.2