Observation of $J/\psi p$ resonances consistent with pentaquark states in ${\Lambda_b^0\to J/\psi K^-p}$ decays

Observations of exotic structures in the $J/\psi p$ channel, that we refer to as pentaquark-charmonium states, in $\Lambda_b^0\to J/\psi K^- p$ decays are presented. The data sample corresponds to an integrated luminosity of 3/fb acquired with the LHCb detector from 7 and 8 TeV pp collisions. An amplitude analysis is performed on the three-body final-state that reproduces the two-body mass and angular distributions. To obtain a satisfactory fit of the structures seen in the $J/\psi p$ mass spectrum, it is necessary to include two Breit-Wigner amplitudes that each describe a resonant state. The significance of each of these resonances is more than 9 standard deviations. One has a mass of $4380\pm 8\pm 29$ MeV and a width of $205\pm 18\pm 86$ MeV, while the second is narrower, with a mass of $4449.8\pm 1.7\pm 2.5$ MeV and a width of $39\pm 5\pm 19$ MeV. The preferred $J^P$ assignments are of opposite parity, with one state having spin 3/2 and the other 5/2.


Introduction and summary
The prospect of hadrons with more than the minimal quark content (qq or qqq) was proposed by Gell-Mann in 1964 [1] and Zweig [2], followed by a quantitative model for two quarks plus two antiquarks developed by Jaffe in 1976 [3]. The idea was expanded upon [4] to include baryons composed of four quarks plus one antiquark; the name pentaquark was coined by Lipkin [5]. Past claimed observations of pentaquark states have been shown to be spurious [6], although there is at least one viable tetraquark candidate, the Z(4430) + observed in B 0 → ψ K − π + decays [7][8][9], implying that the existence of pentaquark baryon states would not be surprising. States that decay into charmonium may have particularly distinctive signatures [10].
Large yields of Λ 0 b → J/ψ K − p decays are available at LHCb and have been used for the precise measurement of the Λ 0 b lifetime [11]. (In this Letter mention of a particular mode implies use of its charge conjugate as well.) This decay can proceed by the diagram shown in Fig. 1(a), and is expected to be dominated by Λ * → K − p resonances, as are evident in our data shown in Fig. 2(a). It could also have exotic contributions, as indicated by the diagram in Fig. 1(b), that could result in resonant structures in the J/ψ p mass spectrum shown in Fig. 2(b).  In practice resonances decaying strongly into J/ψ p must have a minimal quark content of ccuud, and thus are charmonium-pentaquarks; we label such states P + c , irrespective of the internal binding mechanism. In order to ascertain if the structures seen in Fig. 2(b) are resonant in nature and not due to reflections generated by the Λ * states, it is necessary to perform a full amplitude analysis, allowing for interference effects between both decay sequences.
The fit uses five decay angles and the K − p invariant mass m Kp as independent variables. First we tried to fit the data with an amplitude model that contains 14 Λ * states listed by the Particle Data Group [12]. As this did not give a satisfactory description of the data, we added one P + c state, and when that was not sufficient we included a second state. The two P + c states are found to have masses of 4380 ± 8 ± 29 MeV and 4449.8 ± 1.7 ± 2.5 MeV, with corresponding widths of 205 ± 18 ± 86 MeV and 39 ± 5 ± 19 MeV. (Natural units are used throughout this Letter. Whenever two uncertainties are quoted the first is statistical and the second systematic.) The fractions of the total sample due to the lower mass and higher mass states are (8.4 ± 0.7 ± 4.2)% and (4.1 ± 0.5 ± 1.1)%, respectively. The best fit solution has spin-parity J P values of (3/2 − , 5/2 + ). Acceptable solutions are also found for additional cases with opposite parity, either (3/2 + , 5/2 − ) or (5/2 + , 3/2 − ). The best fit projections are shown in Fig. 3. Both m Kp and the peaking structure in m J/ψ p are reproduced by the fit. The significances of the lower mass and higher mass states are 9 and 12 standard deviations, respectively.   Table 1). The data are shown as solid (black) squares, while the solid (red) points show the results of the fit. The solid (red) histogram shows the background distribution. The (blue) open squares with the shaded histogram represent the P c (4450) + state, and the shaded histogram topped with (purple) filled squares represents the P c (4380) + state. Each Λ * component is also shown. The error bars on the points showing the fit results are due to simulation statistics. Λ 0 b decay phase space. Backgrounds from Ξ b decays cannot contribute significantly to our sample. We choose a relatively tight cut on the BDTG output variable that leaves 26 007±166 signal candidates containing 5.4% background within ±15 MeV (±2 σ) of the J/ψ K − p mass peak, as determined by the unbinned extended likelihood fit shown in Fig. 4. The combinatorial background is modeled with an exponential function and the Λ 0 b signal shape is parameterized by a double-sided Hypatia function [24], where the signal radiative tail parameters are fixed to values obtained from simulation. For subsequent analysis we constrain the J/ψ K − p four-vectors to give the Λ 0 b invariant mass and the Λ 0 b momentum vector to be aligned with the measured direction from the primary to the Λ 0 b vertices [25]. In Fig. 5 we show the "Dalitz" plot [26] using the K − p and J/ψ p invariant massessquared as independent variables. A distinct vertical band is observed in the K − p invariant mass distribution near 2.3 GeV 2 corresponding to the Λ(1520) resonance. There is also a distinct horizontal band near 19.5 GeV 2 . As we see structures in both K − p and J/ψ p mass distributions we perform a full amplitude analysis, using the available angular variables in addition to the mass distributions, in order to determine the resonances present. No structure is seen in the J/ψ K − invariant mass.
We consider the two interfering processes shown in Fig. 1, which produce two distinct decay sequences: Λ 0 b → J/ψ Λ * , Λ * → K − p and Λ 0 b → P + c K − , P + c → J/ψ p, with J/ψ → µ + µ − in both cases. We use the helicity formalism [27] in which each sequential decay A → B C contributes to the amplitude a term (1) where λ is the quantum number related to the projection of the spin of the particle onto its momentum vector (helicity) and H A→B C λ B , λ C are complex helicity-coupling amplitudes describing the decay dynamics. Here θ A and φ B are the polar and azimuthal angles of B in the rest frame of A (θ A is known as the "helicity angle" of A). The three arguments of Wigner's D-matrix are Euler angles describing the rotation of the initial coordinate system with the z-axis along the helicity axis of A to the coordinate system with the z-axis along the helicity axis of B [12]. We choose the convention in which the third Euler angle is zero. In Eq.
is the Wigner small-d matrix. If A has a non-negligible natural width, the invariant mass distribution of the B and C daughters is described by the complex function R A (m BC ) discussed below, otherwise R A (m BC ) = 1.
Using Clebsch-Gordan coefficients, we express the helicity couplings in terms of LS couplings (B L,S ), where L is the orbital angular momentum in the decay, and S is the total spin of A plus B: where the expressions in parentheses are the standard Wigner 3j-symbols. For strong decays, possible L values are constrained by the conservation of parity (P ): Denoting J/ψ as ψ, the matrix element for the where the x-axis, in the coordinates describing the Λ 0 b decay, is chosen to fix φ Λ * = 0. The sum over n is due to many different Λ * n resonances contributing to the amplitude. Since the J/ψ decay is electromagnetic, the values of ∆λ µ ≡ λ µ + − λ µ − are restricted to ±1.
There are 4 (6) independent complex H Λ 0 b →Λ * n ψ λ Λ * , λ ψ couplings to fit for each Λ * n resonance for J Λ * n = 1 2 (> 1 2 ). They can be reduced to only 1 (3) free B L,S couplings to fit if only the lowest (the lowest two) values of L are considered. The mass m Kp , together with all decay angles entering Eq. (3), θ Λ 0 b , θ Λ * , φ K , θ ψ and φ µ (denoted collectively as Ω), constitute the six independent dimensions of the Λ 0 b → J/ψ pK − decay phase space. Similarly, the matrix element for the P + c decay chain is given by where the angles and helicity states carry the superscript or subscript P c to distinguish them from those defined for the Λ * decay chain. The sum over j allows for the possibility of contributions from more than one P + c resonance. There are 2 (3) independent helicity couplings H , and a ratio of the two H Λ 0 b →Pc j K λ Pc , 0 couplings, to determine from the data.
The mass-dependent R Λ * n (m Kp ) and R Pc j (m J/ψ p ) terms are given by Here p is the X = Λ * or P + c momentum in the Λ 0 b rest frame, and q is the momentum of either decay product of X in the X rest frame. The symbols p 0 and q 0 denote values of these quantities at the resonance peak (m = M 0X ). The orbital angular momentum between the decay products of . Similarly, L X is the orbital angular momentum between the decay products of X. The orbital angular momentum barrier factors, p L B L (p, p 0 , d), involve the Blatt-Weisskopf functions [28], and account for the difficulty in creating larger orbital angular momentum L, which depends on the momentum of the decay products p and on the size of the decaying particle, given by the d constant. We set d = 3.0 GeV −1 ∼ 0.6 fm. The relativistic Breit-Wigner amplitude is given by where is the mass dependent width of the resonance. For the Λ(1405) resonance, which peaks below the K − p threshold, we use a two-component Flatté-like parameterization [29] (see the supplementary material). The couplings for the allowed channels, Σπ and Kp, are taken to be equal and to correspond to the nominal value of the width [12]. For all resonances we assume minimal values of L X Λ 0 b and of L X in R X (m). For nonresonant (NR) terms we set BW(m) = 1 and M 0 NR to the midrange mass.
Before the matrix elements for the two decay sequences can be added coherently, the proton and muon helicity states in the Λ * decay chain must be expressed in the basis of helicities in the P + c decay chain, where θ p is the polar angle in the p rest frame between the boost directions from the Λ * and P + c rest frames, and α µ is the azimuthal angle correcting for the difference between the muon helicity states in the two decay chains. Note that m ψp , θ Pc θ Pc ψ , φ Pc µ , θ p and α µ can all be derived from the values of m Kp and Ω, and thus do not constitute independent dimensions in the Λ 0 b decay phase space. (A detailed prescription for calculation of all the angles entering the matrix element is given in the supplementary material.) Strong interactions, which dominate Λ 0 b production at the LHC, conserve parity and cannot produce longitudinal Λ 0 b polarization [30]. Therefore, λ Λ 0 b = +1/2 and −1/2 values are equally likely, which is reflected in Eq. (8). If we allow the Λ 0 b polarization to vary, the data are consistent with a polarization of zero. Interferences between various Λ * n and P + We use two fit algorithms, which were independently coded and which differ in the approach used for background subtraction. In the first approach, which we refer to as cFit, the signal region is defined as ±2 σ around the Λ 0 b mass peak. The total PDF used in the fit to the candidates in the signal region, P(m Kp , Ω| − → ω ), includes a background component with normalization fixed to be 5.4% of the total. The background PDF is found to factorize into five two-dimensional functions of m Kp and of each independent angle, which are estimated using sidebands extending from 5.0 σ to 13.5 σ on both sides of the peak.
In the complementary approach, called sFit, no explicit background parameterization is needed. The PDF consists of only the signal component, with the background subtracted using the sPlot technique [31] applied to the log-likelihood sum. All candidates shown in Fig. 4 are included in the sum with weights, W i , dependent on m J/ψ Kp . The weights are set according to the signal and the background probabilities determined by the fits to the m J/ψ pK distributions, similar to the fit displayed in Fig. 4, but performed in 32 different bins of the two-dimensional plane of cos θ Λ 0 b and cos θ J/ψ to account for correlations with the mass shapes of the signal and background components. This quasi-log-likelihood sum is scaled by a constant factor, s W ≡ i W i / i W i 2 , to account for the effect of the background subtraction on the statistical uncertainty. (More details on the cFit and sFit procedures are given in the supplementary material.) In each approach, we minimize gives the estimated values of the fit parameters, − → ω min , together with their covariance matrix (W i = 1 in cFit). The difference of −2 ln L( − → ω min ) between different amplitude models, ∆(−2 ln L), allows their discrimination. For two models representing separate hypotheses, e.g. when discriminating between different J P values assigned to a P + c state, the assumption of a χ 2 distribution with one degree of freedom for ∆(−2 ln L) under the disfavored J P hypothesis allows the calculation of a lower limit on the significance of its rejection, i.e. the p-value [32]. Therefore, it is convenient to express ∆(−2 ln L) values as n 2 σ , where n σ corresponds to the number of standard deviations in the normal distribution with the same p-value. For nested hypotheses, e.g. when discriminating between models without and with P + c states, n σ overestimates the p-value by a modest amount. Simulations are used to obtain better estimates of the significance of the P + c states. Since the isospin of both the Λ 0 b and the J/ψ particles are zero, we expect that the dominant contributions in the K − p system are Λ * states, which would be produced via a ∆I = 0 process. It is also possible that Σ * resonances contribute, but these would have ∆I = 1. By analogy with kaon decays the ∆I = 0 process should be dominant [33]. The list of Λ * states considered is shown in Table 1.
Our strategy is to first try to fit the data with a model that can describe the mass and angular distributions including only Λ * resonances, allowing all possible known states and decay amplitudes. We call this the "extended" model. It has 146 free parameters from the helicity couplings alone. The masses and widths of the Λ * states are fixed to their PDG values, since allowing them to float prevents the fit from converging. Variations in these parameters are considered in the systematic uncertainties.
The cFit results without any P + c component are shown in Fig. 6. While the m Kp distribution is reasonably well fitted, the peaking structure in m J/ψ p is not reproduced. The same result is found using sFit. The speculative addition of Σ * resonances to the states decaying to K − p does not change this conclusion.
We will demonstrate that introducing two P + c → J/ψ p resonances leads to a satisfactory description of the data. When determining parameters of the P + c states, we use a more restrictive model of the K − p states (hereafter referred to as the "reduced" model) that includes only the Λ * resonances that are well motivated, and has fewer than half the number of free parameters. As the minimal L Λ * Λ 0 b for the spin 9/2 Λ(2350) equals J Λ * −J Λ 0 b −J J/ψ = 3, it is extremely unlikely that this state can be produced so close to the phase space limit.  [12]. We take 5/2 − for the J P of the Λ(2585). The number of LS couplings is also listed for both the "reduced" and "extended" models. To fix overall phase and magnitude conventions, which otherwise are arbitrary, we set B 0, 1 2 = (1, 0) for Λ(1520). A zero entry means the state is excluded from the fit.

State
J  In fact L = 3 is the highest orbital angular momentum observed, with a very small rate, in decays of B mesons [34] with much larger phase space available (Q = 2366 MeV, while here Q = 173 MeV), and without additional suppression from the spin counting factors present in Λ(2350) production (all three J Λ * , J Λ 0 b and J J/ψ vectors have to line up in the same direction to produce the minimal L Λ * Λ 0 b value). Therefore, we eliminate it from the reduced Λ * model. We also eliminate the Λ(2585) state, which peaks beyond the kinematic limit and has unknown spin. The other resonances are kept but high L Λ * Λ 0 b amplitudes are removed; only the lowest values are kept for the high mass resonances, with a smaller reduction for the lighter ones. The number of LS amplitudes used for each resonance is listed in Table 1. With this model we reduce the number of parameters needed to describe the Λ * decays from 146 to 64. For the different combinations of P + c resonances that we try, there are up to 20 additional free parameters. Using the extended model including one resonant P + c improves the fit quality, but it is still unacceptable (see supplementary material). We find acceptable fits with two P + c states. We use the reduced Λ * model for the central values of our results. The differences in fitted quantities with the extended model are included in the systematic uncertainties.
The best fit combination finds two P + c states with J P values of 3/2 − and 5/2 + , for the lower and higher mass states, respectively. The −2 ln L values differ by only 1 unit between the best fit and the parity reversed combination (3/2 + , 5/2 − ). Other combinations are less likely, although the (5/2 + , 3/2 − ) pair changes −2 ln L by only 2.3 2 units and therefore cannot be ruled out. All combinations 1/2 ± through 7/2 ± were tested, and all others are disfavored by changes of more than 5 2 in the −2 ln L values. The cFit results for the (3/2 − , 5/2 + ) fit are shown in Fig. 3. Both distributions of m Kp and m J/ψ p are reproduced. The lower mass 3/2 − state has mass 4380±8 MeV and width 205±18 MeV, while the 5/2 + state has a mass of 4449.8±1.7 MeV and width 39±5 MeV; these errors are statistical only, systematic uncertainties are discussed later. The mass resolution is approximately 2.5 MeV and does not affect the width determinations. The sFit approach gives comparable results. The angular distributions are reasonably well reproduced, as shown in Fig. 7, and the comparison with the data in m Kp intervals is also satisfactory as can be seen in Fig. 8. Interference effects between the two P + c states are particularly evident in Fig.8(d), where there is a large destructive contribution (not explicitly shown in the figure) to the total rate. (A fit fraction comparison between cFit and sFit is given in the supplementary material.) The addition of further P + c states does not significantly improve the fit. Adding a single 5/2 + P + c state to the fit with only Λ * states reduces −2 ln L by 14.7 2 using the extended model and adding a second lower mass 3/2 − P + c state results in a further reduction of 11.6 2 . The combined reduction of −2 ln L by the two states taken together is 18.7 2 . Since taking √ ∆2 ln L overestimates significances, we perform simulations to obtain more accurate evaluations. We generate pseudoexperiments using the null hypotheses having amplitude parameters determined by the fits to the data with no or one P + c state. We fit each pseudoexperiment with the null hypothesis and with P + c states added to the model. The −2 ln L distributions obtained from many pseudoexperiments are consistent with χ 2 distributions with the number of degrees of freedom approximately equal to twice the number of extra parameters in the fit. Comparing these distributions with the ∆2 ln L values from the fits to the data, p-values can be calculated. These studies show reduction of the significances relative to √ ∆2 ln L by about 20%, giving overall significances of 9 σ and 12 σ, for the lower and higher mass P + c states, respectively. The combined significance of two P + c states is 15 σ. Use of the extended model to evaluate the significance includes the effect of systematic uncertainties due to the possible presence of additional Λ * states or higher L amplitudes.
Systematic uncertainties are evaluated for the masses, widths and fit fractions of the P + c states, and for the fit fractions of the two lightest and most significant Λ * states. Additional sources of modeling uncertainty that we have not considered may affect the fit fractions of the heavier Λ * states. The sources of systematic uncertainties are listed in Table 2. They include differences between the results of the extended versus reduced model, varying the Λ * masses and widths, uncertainties in the identification requirements for the proton, and restricting its momentum, inclusion of a nonresonant amplitude in the fit, use of separate higher and lower Λ 0 b mass sidebands, alternate J P fits, varying the Blatt-Weisskopf barrier factor, d, between 1.5 and 4.5 GeV −1 , changing the angular momentum L used in Eq. (5) by one or two units, and accounting for potential mismodeling of the efficiencies. For the Λ(1405) fit fraction we also added an uncertainty for the Flatté couplings, determined by both halving and doubling their ratio, and taking the maximum deviation as the uncertainty.
The stability of the results is cross-checked by comparing the data recorded in 2011/2012, with the LHCb dipole magnet polarity in up/down configurations, produced with low/high values of p T . Extended model fits without including P + c states were tried with the addition of two high mass Λ * resonances of freely varied mass and width, or four nonresonant components up to spin 3/2; these do not explain the data. The fitters were tested on simulated pseudoexperiments and no biases were found. In addition, selection requirements are varied, and the vetoes of B 0 s and B 0 are removed and explicit models of those backgrounds added to the fit; all give consistent results.
Further evidence for the resonant character of the higher mass, narrower, P + c state is obtained by viewing the evolution of the complex amplitude in the Argand diagram [12]. In the amplitude fits discussed above, the P c (4450) + is represented by a Breit-Wigner amplitude, where the magnitude and phase vary with m J/ψ p according to an approximately circular trajectory in the (Re A Pc , Im A Pc ) plane, where A Pc is the m J/ψ p dependent part of the P c (4450) + amplitude. We perform an additional fit to the data using the reduced Λ * model, in which we represent the P c (4450) + amplitude as the combination of independent complex amplitudes at six equidistant points in the range ±Γ 0 = 39 MeV around M 0 = 4449.8 MeV as determined in the default fit. Real and imaginary parts of the amplitude are interpolated in mass between the fitted points. The resulting Argand diagram, shown in Fig. 9(a), is consistent with a rapid counter-clockwise change of the P c (4450) + phase when its magnitude reaches the maximum, a behavior characteristic of a resonance. A similar study for the wider state is shown in Fig. 9(b); although the fit does show a large phase change, the amplitude values are sensitive to the details of the Λ * model and so this latter study is not conclusive. Different binding mechanisms of pentaquark states are possible. Tight-binding was envisioned originally [3,4,35]. A possible explanation is heavy-light diquarks [36]. Examples of other mechanisms include a diquark-diquark-antiquark model [37,38], a diquark-triquark model [39], and a coupled channel model [40]. Weakly bound "molecules" of a baryon plus a meson have been also discussed [41].
Models involving thresholds or "cusps" have been invoked to explain some exotic meson candidates via nonresonant scattering mechanisms [42][43][44]. There are certain obvious difficulties with the use of this approach to explain our results. The closest threshold to the high mass state is at 4457.1±0.3 MeV resulting from a Λ c (2595) + D 0 combination, which is somewhat higher than the peak mass value and would produce a structure with quantum numbers J P = 1/2 + which are disfavored by our data. There is no threshold close to the lower mass state.
In conclusion, we have presented a full amplitude fit to the Λ 0 b → J/ψ K − p decay. We observe significant Λ * production recoiling against the J/ψ with the lowest mass contributions, the Λ(1405) and Λ(1520) states having fit fractions of (15 ± 1 ± 6)% and (19±1±4)%, respectively. The data cannot be satisfactorily described without including two Breit-Wigner shaped resonances in the J/ψ p invariant mass distribution. The significances of the lower mass and higher mass states are 9 and 12 standard deviations, respectively. Table 2: Summary of systematic uncertainties on P + c masses, widths and fit fractions, and Λ * fit fractions. A fit fraction is the ratio of the phase space integrals of the matrix element squared for a single resonance and for the total amplitude. The terms "low" and "high" correspond to the lower and higher mass P + c states. The sFit/cFit difference is listed as a cross-check and not included as an uncertainty.   These structures cannot be accounted for by reflections from J/ψ Λ * resonances or other known sources. Interpreted as resonant states they must have minimal quark content of ccuud, and would therefore be called charmonium-pentaquark states. The lighter state P c (4380) + has a mass of 4380 ± 8 ± 29 MeV and a width of 205 ± 18 ± 86 MeV, while the heavier state P c (4450) + has a mass of 4449.8 ± 1.7 ± 2.5 MeV and a width of 39 ± 5 ± 19 MeV. A model-independent representation of the P c (4450) + contribution in the fit shows a phase change in amplitude consistent with that of a resonance. The parities of the two states are opposite with the preferred spins being 3/2 for one state and 5/2 for the other. The higher mass state has a fit fraction of (4.1 ± 0.5 ± 1.1)%, and the lower mass state of (8.4 ± 0.7 ± 4.2)%, of the total Λ 0 b → J/ψ K − p sample. We express our gratitude to our colleagues in the CERN accelerator departments for the excellent performance of the LHC. We thank the technical and administrative staff at the LHCb institutes. We acknowledge support from CERN and from the national 1 Variables used in the BDTG Muon identification uses information from several parts of the detector, including the RICH detectors, the calorimeters and the muon system. Likelihoods are formed for the muon and pion hypotheses. The difference in the logarithms of the likelihoods, DLL(µ − π), is used to distinguish between the two [16]. The smaller value of the two discriminants DLL(µ + − π + ) and DLL(µ − − π − ) is used as one of the BDTG variables. The next set of variables uses the kaon and proton tracks. The χ 2 IP is defined as the difference in χ 2 of the primary vertex reconstructed with and without the considered track. The smaller χ 2 IP of the K − and p is used in the BDTG. The scalar p T sum of the K − and p is another variable.
The last set of variables uses the Λ 0 b candidate. The cosine of the angle between a vector from the primary vertex to the Λ 0 b vertex and the Λ 0 b momentum vector is one input variable. In addition the χ 2 IP , the flight distance, the p T and the vertex χ 2 of the Λ 0 b candidate are used.  [GeV]  In Fig. 12 we show the result of the reduced model fit to the angular distributions for m(K − p) > 2 GeV. The data is well described by the fits.

Extended model fit with one P + c
In the fits with one P + c amplitude, we test J P values of 1/2 ± , 3/2 ± and 5/2 ± . The mass and width of the putative P + c state are allowed to vary. There are a total of 146 free parameters for the Λ * states to which we add either three complex couplings for 1/2 ± or four for higher spins. The best fit is with a 5/2 + state, which improves −2 ln L by 215. Figure 13 shows the projections for this fit. While the m Kp projection is well described, clear discrepancies in m J/ψ p remain visible.

Results of extended model fit with two P c + states
For completeness we include here the results of the extended model fit with two P c + states using cFit. We find acceptable fits for several combinations. For a lower mass J P = 3/2 − state and a higher mass 5/2 + state, the masses (widths) are 4358.9±6.

Fit fraction comparison between cFit and sFit
The fit fraction for a given resonance is a ratio of the phase space integrals of the matrix element squared calculated for the resonance amplitude taken alone and for the total matrix element summing over all contributions. The fit fractions are listed in Table 3. The P + c states have well determined fit fractions. There is good agreement between cFit and sFit. Note that the results for the Λ(1405) resonance are based on our use of a particular Flatté amplitude model.

Details of the matrix element for the decay amplitude
The matrix element for Λ 0 b → ψK − p, ψ → µ + µ − decays 2 must allow for various conventional Λ * → K − p resonances and exotic pentaquark states P + c → ψp that could interfere with each other.
We use the helicity formalism to write down the matrix element. To make the derivation of the matrix element easier to comprehend we start with a brief outline of this formalism and our notation. Then we discuss the application to the Λ 0 b → Λ * ψ, Λ * → K − p, ψ → µ + µ − decay sequence, called hereafter the Λ * decay chain matrix element. Next we discuss construction of the Λ 0 b → P + c K − , P + c → ψp, ψ → µ + µ − decay sequence, called hereafter the P c decay chain matrix element, which can be coherently added to that for the Λ * decay chain. We also discuss a possible reduction of the number of helicity couplings to be determined from the data using their relationships to the LS couplings.

Helicity formalism and notation
For each two-body decay A → B C, a coordinate system is set up in the rest frame of A, withẑ being 3 the direction of quantization for its spin. We denote this coordinate system as ( , where the superscript "{A}" means "in the rest frame of A", while the subscript "0" means the initial coordinates. For the first particle in the decay chain (Λ 0 b ), the choice of these coordinates is arbitrary. 4 However, once defined, these coordinates must be used consistently between all decay sequences described by the matrix element. For subsequent decays, e.g. B → D E, the choice of these coordinates is already fixed by the transformation from the A to the B rest frames, as discussed below. Helicity is defined as the projection of the spin of the particle onto the direction of its momentum. When the z axis coincides with the particle momentum, we denote its spin projection onto it (i.e. the m z quantum number) as λ. To use the helicity formalism, the initial coordinate system must be rotated to align the z axis with the direction of the momentum of one of the daughter particles, e.g. the B. A generalized rotation operator can be formulated in three-dimensional space, R(α, β, γ), that uses Euler angles. Applying this operator results in a sequence of rotations: first by the angle α about theẑ 0 axis, followed by the angle β about the rotatedŷ 1 axis and then finally by the angle γ about the rotatedẑ 2 axis. We use a subscript denoting the axes, to specify the rotations which have been already performed on the coordinates. The spin eigenstates of particle A, |J A , m A , in the (x where and where the small-d Wigner matrix contains known functions of β that depend on J, m, m . To achieve the rotation of the originalẑ . This is depicted in Fig. 15, for the case when the quantization axis for the spin of A is its momentum in some other reference frame. Since the third rotation is not necessary, we set γ = 0. 5 The angle θ {A} B is usually called "the A helicity angle", thus to simplify the notation we will denote it as θ A . For compact notation, we will also denote φ {A} B as φ B . These angles can be determined from 6 The helicity couplings H A→B C λ B , λ C are complex constants. Their products from subsequent decays are to be determined by the fit to the data (they represent the decay dynamics). If the decay is strong or electromagnetic, it conserves parity which reduces the number of independent helicity couplings via the relation where P stands for the intrinsic parity of a particle.
After multiplying terms given by Eq. (13) for all decays in the decay sequence, they must be summed up coherently over the helicity states of intermediate particles, and incoherently over the helicity states of the initial and final-state particles. Possible helicity values of B and C particles are constrained by |λ B | ≤ J B , |λ C | ≤ J C and |λ B − λ C | ≤ J A .
When dealing with the subsequent decay of the daughter, B → D E, four-vectors of all particles must be first Lorentz boosted to the rest frame of B, along the p {A} B i.e.ẑ {A} 3 direction (this is the z axis in the rest frame of A after the Euler rotations; we use the subscript "3" for the number of rotations performed on the coordinates, because of the three Euler angles, however, since we use the γ = 0 convention these coordinates are the same as after the first two rotations). This is visualized in Fig. 15, with B → D E particle labels replaced by A → B C labels. This transformation does not change vectors that are perpendicular to the boost direction. The transformed coordinates become the initial  Figure 15: Coordinate axes for the spin quantization of particle A (bottom part), chosen to be the helicity frame of A (ẑ 0 || p A in the rest frame of its mother particle or in the laboratory frame), together with the polar (θ  2 , is aligned with the B momentum; thus the rotated coordinates become the helicity frame of B. If B has a sequential decay, then the same boost-rotation process is repeated to define the helicity frame for its daughters.
coordinate system quantizing the spin of B in its rest frame, The processes of rotation and subsequent boosting can be repeated until the final-state particles are reached. In practice, there are two equivalent ways to determine theẑ Alternatively, we can make use of the fact that B and C are back-to-back in the rest frame of A, p Since the momentum of C is antiparallel to the boost direction from the A to B rest frames, the C momentum in the B rest frame will be different, but it will still be antiparallel to this boost direction To determinex Then we obtainŷ 0 . If C also decays, C → F G, then the coordinates for the quantization of C spin in the C rest frame are defined byẑ i.e. the z axis is reflected compared to the system used for the decay of particle B (it must point in the direction of C momentum in the A rest frame), but the x axis is kept the same, since we chose particle B for the rotation used in Eq. (13).

Matrix element for the Λ * decay chain
We first discuss the part of the matrix element describing conventional Λ 0 b → Λ * n ψ, Λ * n → Kp decays (i.e. Λ * decay chain), where Λ * n denotes various possible excitations of the Λ, e.g. Λ(1520). For simplicity we often refer to Λ * n as Λ * , unless we label an n-dependent quantity. The weak decay Λ 0 b → Λ * n ψ is described by where H Λ 0 b →Λ * n ψ λ Λ * , λ ψ are resonance (i.e. n) dependent helicity couplings to be determined by a fit to the data. There are 4 different complex values of these couplings to be determined for each Λ * n resonance with spin J Λ * n = 1 2 , and 6 values for higher spins. The couplings are complex parameters; thus each independent coupling contributes 2 free parameters (taken to be real and imaginary parts) to the fit. Since the ψ and Λ * are intermediate particles in the decay chain, the matrix element terms for different values of λ ψ and λ Λ * must be added coherently.
The choice of theẑ b spin quantization is arbitrary. We choose the Λ 0 b momentum in the lab frame to define theẑ direction, giving its spin projection onto this axis the meaning of the Λ 0 b helicity (λ Λ 0 b ). In the Λ 0 b rest frame, this direction is defined by the direction of the boost from the lab frame (Eq. (16)), as depicted in Fig. 16. With this choice, θ Λ 0 b is the Λ 0 b helicity angle and can be calculated as Longitudinal polarization of the Λ 0 b via strong production mechanisms is forbidden due to parity conservation in strong interactions, causing λ Λ 0 b = + 1 2 and − 1 2 to be equally likely. Terms with different λ Λ 0 b values must be added incoherently. The choice ofx b rest frame is also arbitrary. We use the Λ 0 b → Λ * ψ decay plane in the lab frame to define it, which makes the φ Λ * angle zero by definition.
The strong decay Λ * n → Kp is described by a term Since the K − meson is spinless, the resonance-dependent helicity coupling H Λ * n →Kp λp depends only on proton helicity, λ p = ± 1 2 . As strong decays conserve parity, the two helicity couplings are related H where P Λ * n is the parity of Λ * n . Since the overall magnitude and phase of H can be absorbed into a redefinition of the H where the values in parentheses give the real and imaginary parts of the couplings.
The angles φ K and θ Λ * are the azimuthal and polar angles of the kaon in the Λ * rest frame (see Fig. 16). Theẑ  17)). This leads to with both vectors in the Λ * rest frame. As explained in Sec. 4.1, thex direction is defined by the choice of coordinates in the Λ 0 b rest frame discussed above. Following Eq. (19) and (24), we have The azimuthal angle of the K − can now be determined in the Λ * rest frame from (Eq. (11)) The term R Λ * n (m Kp ) describes the Λ * n resonance that appears in the invariant mass distribution of the kaon-proton system, . The symbols p 0 and q 0 denote values of these quantities at the resonance peak (m Kp = M Λ * n 0 ). The orbital angular momentum between the Λ * and ψ particles in the Λ 0 b decay is denoted as L . Similarly, L Λ * n is the orbital angular momentum between the p and K − in the Λ * n decay. For the orbital angular momentum barrier factors, p L B L (p, p 0 , d), we use the Blatt-Weisskopf functions [45], to account for the difficulty in creating the orbital angular momentum L, which depends on the momentum of the decay products p (in the rest frame of the decaying particle) and on the size of the decaying particle given by the constant d. We set d = 3.0 GeV −1 ∼0.6 fm. The relativistic Breit-Wigner amplitude is given by where In the case of the Λ(1405) resonance, which peaks below the K − p threshold, we use a two-component width equivalent to the Flatté parameterization. We add to the above width in the K − p channel, a width for its decay to the dominant Σ + π − channel, Γ(m) = Γ(m) K − p +Γ(m) Σπ , where q in the second term and q 0 in both terms are calculated assuming the decay to Σ + π − . Assuming that both channels are dynamically equally likely and differ only by the phase space factors we set Γ 0 to the total width of Λ(1405) in both terms.
Angular momentum conservation limits L Λ * n to J Λ * n ± 1 2 , which is then uniquely defined by parity conservation in the Λ * n decay, P Λ * n = (−1) L Λ * n +1 . Angular momentum conservation also requires max( The electromagnetic decay ψ → µ + µ − is described by a term where ∆λ µ ≡ λ µ + − λ µ − = ±1, and φ µ , θ ψ are the azimuthal and polar angles of µ + for Λ 0 b (µ − for Λ 0 b decays) in the ψ rest frame (see Fig. 16). There are no helicity couplings in Eq. (35), since they are all equal due to conservation of C and P parities. Therefore, this coupling can be set to unity as its magnitude and phase can be absorbed into the other helicity couplings which are left free in the fit. The calculation of the ψ decay angles is analogous to that of the Λ * decay angles described above (Eqs. (28)-(30)) given by Eq. (29). Collecting terms from the subsequent decays together, the matrix element connecting different helicity states of the initial and the final-state particles for the entire Λ * decay chain can be written as Terms with different helicities of the initial and final-state particles (λ p , ∆λ µ ) must be added incoherently where P Λ 0 b is the Λ 0 b polarization, defined as the difference of probabilities for λ Λ 0 b = +1/2 and −1/2 [46]. For our choice of the quantization axis for Λ 0 b spin, no polarization is expected (P Λ 0 b = 0) from parity conservation in strong interactions which dominate Λ 0 b production at LHCb.

Matrix element for the P + c decay chain
We now turn to the discussion of Λ 0 b → P cj K − , P cj → ψp decays, in which we allow more than one pentaquark state, j = 1, 2, . . . . Superscripts containing the P c decay chain name without curly brackets, e.g. φ Pc , will denote quantities belonging to this decay chain and should not be confused with the superscript "{P c }" denoting the P + c rest frame, e.g. φ {Pc} . With only a few exceptions, we omit the Λ * decay chain label.
The weak decay Λ 0 b → P cj K − is described by the term, where H Λ 0 b →Pc j K λ Pc are resonance (i.e. j) dependent helicity couplings. The helicity of the pentaquark state, λ Pc , can take values of ± 1 2 independently of its spin, J Pc j = 1 2 , 3 2 , . . . . Therefore, there are two independent helicity couplings to be determined for each P cj state. The above mentioned φ Pc , θ Pc Λ 0 b symbols refer to the azimuthal and polar angles of P c in the Λ 0 b rest frame (see Fig. 17). Similar to Eq. (25), the Λ 0 b helicity angle in the P c decay chain can be calculated as, The φ Pc angle cannot be set to zero, since we have already defined thex Λ 0 b rest frame by the φ Λ * = 0 convention. According to Eq. (18) The φ Pc angle can be determined in the Λ 0 b rest frame from The strong decay P cj → ψp is described by a term where φ Pc ψ , θ Pc are the azimuthal and polar angles of the ψ in the P c rest frame (see Fig. 17). They are defined analogously to Eqs.
The azimuthal angle of the ψ can now be determined in the P c rest frame (see Fig. 17) We have labeled the ψ and p helicities, λ Pc ψ and λ Pc p , with the P c superscript to make it clear that the spin quantization axes are different than in the Λ * decay chain. Since the ψ is an intermediate particle, this has no consequences after we sum (coherently) over λ Pc ψ = −1, 0, +1. The proton, however, is a final-state particle. Before the P c terms in the matrix element can be added coherently to the Λ * terms, the λ Pc p states must be rotated to λ p states (defined in the Λ * decay chain). The proton helicity axes are different, since the proton comes from a decay of different particles in the two decay sequences, the Λ * and P c . The quantization axes are along the proton direction in the Λ * and the P c rest frames, thus antiparallel to the particles recoiling against the proton: the K − and ψ, respectively. These directions are preserved when boosting to the proton rest frame (see Fig. 18). Thus, Λ where P Pc j is the parity of the P cj state. This relation reduces the number of inde-the muons come from the ψ decay in both decay chains. This makes the polar angle θ µ and adding them coherently to the Λ * matrix element, via appropriate relation of |λ p |λ µ + |λ µ − to |λ Pc p |λ Pc µ + |λ Pc µ − states as discussed above, leads to the final matrix element squared where we set P Λ 0 b = 0. As a cross-check, fitting the Λ 0 b polarization to the data with the default Λ * and P + c model yields a value consistent with zero, P Λ 0 b = (−2.0 ± 2.3)% (statistical error only).
Assuming approximate CP symmetry, the helicity couplings for Λ 0 b and Λ 0 b can be made equal, but the calculation of the angles requires some care, since parity (P ) conservation does not change polar (i.e. helicity) angles, but does change azimuthal angles. Thus, not only must p µ + be used instead of p µ − for Λ 0 b candidates (with K + andp in the final-state) in Eqs. (36), (37), (55), (57) and (58), but also all azimuthal angles must be reflected before entering the matrix element formula: . It is clear from Eq. (67) that various Λ * n and P c resonances interfere in the differential distributions. By integrating the matrix element squared over the entire phase space the interferences cancel in the integrated rates unless the resonances belong to the same decay chain and have the same quantum numbers. 8

Reduction of the number of helicity couplings
A possible reduction of the helicity couplings can be achieved by relating them to the LS couplings (B L,S ) using Clebsch-Gordan coefficients and then restricting the L values. Here L is the orbital angular momentum in the decay, and S is the total spin of the daughters, S = J B + J C (|J B − J C | ≤ S ≤ J B + J C ). If the energy release in the decay, then higher values of L should be suppressed; this effect is usually called "the angular momentum barrier." Applying this approach to Λ 0 b → ψΛ * n decays, the lowest L value (L min ) corresponds to a single possible value of S, thus reducing the number of couplings to fit, from 4 (J Λ * n = 1 2 ) or 6 (J Λ * n ≥ 3 2 ), to just one B L,S coupling per resonance. Accepting also L min + 1 values, gives three B L,S couplings to fit per resonance.
In Λ 0 b → P cj K − decays, S = J Pc j and L The reduction of couplings to fit for P cj → ψp decays depends on the spin and parity of the P cj state. S can take values of 1 2 and 3 2 . Values of L Pc j must be odd (even) for even (odd) P Pc j . For a J P Pc j = 1 2 + state, only L Pc j = 1 is allowed with the two possible values of S. Therefore, no reduction of couplings is possible. For a J P Pc j = 1 2 − state, L Pc j = 0, 2 are allowed, each corresponding to one S value. Therefore, the number of couplings to fit can be reduced from 2 to 1 when taking L Pc j = 0. Gains can be larger for J Pc j ≥ 3 2 states. Even if no reduction in parameters is achieved, expressing the helicity couplings via corresponding B L,S couplings using Eq. (68) is useful, since it automatically implements the parity constraints (Eq. (52)) by restricting possible L values. Since the overall magnitude of the matrix element does not affect the normalized signal PDF, and because its overall phase also drops out when taking its modulus, we fix the magnitude and phase convention by setting B

Details of fitting techniques
where Φ(m Kp ) is the phase space function equal to p q, where p is the momentum of the Kp system (i.e. Λ * ) in the Λ 0 b rest frame, and q is the momentum of K − in the Λ * rest frame, and I( − → ω ) is the normalization integral.
We use two fit algorithms that were independently coded and that differ in the approach used for background subtraction. The sFit procedure uses the sPlot technique described in Ref. [31] to subtract background from the log-likelihood sum. It has been used in previous LHCb analyses, e.g. measurement of φ s in B 0 s → J/ψ φ decays [48]. The data in the entire 5480 − 5760 MeV range are passed to the fitter. Candidates are assigned "sWeights" (W i ) according to their m J/ψ pK value with the signal and background probabilities determined by the fit to the m J/ψ pK distribution, where s W ≡ i W i / i W i 2 is a constant factor accounting for the effect of the background subtraction on the statistical uncertainty. Candidates in the sideband region have negative sWeights which on average compensate for the background events present in the peak region, where events have positive sWeights. From signal simulations, we see significant variations of Λ 0 b → J/ψ pK − invariant mass resolution as functions of cos θ Λ 0 b and cos θ J/ψ . The background distributions also vary in intensity and shape with these two variables. No strong variation is seen for the other fitting observables. To determine the sWeights, the events are divided into 4 | cos θ J/ψ | × 8 cos θ Λ 0 b bins. A separate fit to the Λ 0 b → J/ψ pK − invariant mass distribution of each bin is performed.
In the sFit approach the total PDF is equal to the signal PDF, as the background is subtracted from the log-likelihood sum using sWeights, The last term does not depend on the fitted parameters − → ω and is therefore dropped. The efficiency still appears in the normalization integral. The integration is done without the need to parameterize the efficiency, by summing the matrix element squared over the simulated events that are generated uniformly in phase space and passed through the detector modeling and the data selection procedure, where w MC j are the weights given to the simulated events to improve the agreement between data and simulations. They include particle identification weights obtained from calibration samples of Λ → pπ − and D 0 → K − π + as functions of momentum and pseudorapidity of the protons and kaons. They also include a weight correcting the overall efficiency for the dependence on the charged track multiplicity of events, determined from the ratio of the background-subtracted data and the signal simulations. Imperfect description of the Λ 0 b production kinematics in our simulation is corrected in a similar way via a weight that depends on the p and p T of the Λ 0 b baryon. The final weights are the ratio of the data and the simulations as a function of proton and kaon momenta.
In the cFit approach, candidates are not weighted (W i = 1). The data that are fitted are restricted to be within a ±2σ mass window around the Λ 0 b mass peak, in the interval 5605.7 < m J/ψ pK < 5635.8 MeV. The background is represented explicitly in the fitted PDF, with the integrated background probability set to β = 5.4% as determined from the fit to the J/ψ K − p mass distribution, P(m Kp , Ω| − → ω ) = (1 − β) P sig (m Kp , Ω| − → ω ) + β P bkg (m Kp , Ω).
The 6-dimensional background parameterization P bkg (m Kp , Ω) is developed using the background sample in which the Λ 0 b candidate invariant mass is more than 5σ away from the peak, specifically within the intervals 5480.0 − 5583.2 MeV and 5658.3 − 5760.0 MeV. We minimize the negative log-likelihood defined as, where N is the number of candidates, and P u bkg (m Kp , Ω) is the unnormalized background density proportional to the density of sideband candidates, with its normalization determined by 9 .
The background term is then efficiency-corrected so it can be added to the efficiencyindependent signal probability expressed by |M| 2 . This way the efficiency parametrization, (m Kp , Ω), influences only the background component which affects only a tiny part of the total PDF, while the efficiency corrections to the signal part enter Eq. (72). The efficiency in the background term is assumed to factorize as (m Kp , Ω) = 1 (m Kp , cos θ * Λ ) 2 (cos θ Λ 0 b |m Kp ) 3 (cos θ J/ψ |m Kp ) 4 (φ K |m Kp ) 5 (φ µ |m Kp ).
(76) The 1 (m Kp , cos θ * Λ ) term is obtained from a 2D histogram of the simulated events weighted by 1/(p q). A bi-cubic interpolation is used to interpolate between bin centers. The other terms are again built from 2D histograms, but with each bin divided by the number of simulated events in the corresponding m Kp slice to remove the leading dependence on this mass, which is already taken into account in the first term. The leading variation of the efficiency is in the 1 (m Kp , cos θ * Λ ) term which is visualized in the normal Dalitz variables (m 2 J/ψ p , m 2 Kp ) in Fig. 19(a) instead of the "rectangular Dalitz plane" variables (m Kp , cos θ * Λ ) used to parameterize this variation.