Testing Hadronic-Model Predictions of Depth of Maximum of Air-Shower Profiles and Ground-Particle Signals using Hybrid Data of the Pierre Auger Observatory

,


I. INTRODUCTION
The dominant contribution to uncertainties in the determination of the mass composition of ultrahigh-energy cosmic rays (UHECR, energy E > 10 18.0 eV) comes from the modeling of extensive air showers.Modern hadronic interaction models used for this purpose are based on extrapolation of interaction parameters like cross sections, multiplicities, elasticities, etc., measured at accelerators at lower beam energies up to ffiffi ffi s p ¼ 13 TeV for proton-proton (pp) collisions at the LHC and pseudorapidities jηj ≲ 5 compared to energies ffiffi ffi s p ≳ 50 TeV and pseudorapidities1 η ≈ 7 to 11 driving the energy flow of the first interactions of UHECR in the atmosphere, where the target is different (mostly oxygen and nitrogen nuclei).Therefore, improvements in the description of the LHC data, implemented in the modern models do not necessarily lead to unambiguous, nearly hadronic model-independent predictions for the mass-sensitive air-shower observables.For instance, at 10 18.7 eV the span in predictions for the mean depth of maximum of air shower profiles, hX max i, between models used by the UHECR community (Epos-LHC [1], QGSJet-II-04 [2], SIBYLL 2.3d [3]) is ∼25 g=cm2 , nearly independently of the primary particle mass and energy.Such a difference can be considered only a lower limit on the systematic uncertainty of the predicted X max scale.This is about one-quarter of the difference between hX max i values of the two astrophysical extremes, protons and iron nuclei.As a consequence, the mass composition of cosmic rays can be referred only with respect to X max scale predicted by a particular model.The largest differences in the predicted hX max i between the models, with a minimal impact on the elongation rate, come from the properties of the first hadronic interaction and production of nucleonsantinucleons in pion-air and kaon-air interactions that are not well-known in the relevant kinematic region (for more detailed discussion, see e.g., Ref. [4]).The general comparison of properties and different treatments of hadronic interactions by the three models used in this work is summarized in Ref. [5].Note that although the models used in this work treat differently the properties of hadronic interactions, the range of air-shower properties predicted by these models does not need to include all the possibilities.The difference in the standard deviation of X max distributions, σðX max Þ, between the models is within ∼5 g=cm 2 , whereas the difference between the X max fluctuations of protons and iron nuclei is ∼40 g=cm 2 .Therefore, the difference of σðX max Þ in model predictions has a smaller effect on the mass composition inferences compared to the difference in predictions of hX max i. Note, that there is no direct correspondence of the model scales of X max and σðX max Þ, i.e., the differences in σðX max Þ are not a mere consequence of the differences in hX max i.
In general, the signal produced by air-shower particles reaching the ground shows much lower sensitivity to the mass composition than in the case of X max .The model differences in predictions of the ground-particle signal, for instance, at 1000 m from the impact point of an air shower at 10 18.7 eV detected at the Pierre Auger Observatory (Auger) [6] are at the level of ∼3 VEM, 2 whereas the difference between protons and iron nuclei is about twice this value.The fluctuations of the ground signal are dominated by the detector resolution, suppressing any significant sensitivity to the model or primary mass [7].

A. Problems in the description of data with hadronic interaction models
The correctness of the model predictions can be tested using data from air-shower experiments.At the Pierre Auger Observatory, for instance, negative variance of the logarithm of primary masses [σ 2 ðln AÞ] and poor description of the measured X max distributions are obtained when simulations with QGSJet-II-04 are used for the interpretation of the measured X max moments in terms of the moments of ln A for E ≳ 10 18.5 eV [8][9][10][11].The problem is that, due to relatively shallow hX max i predictions of QGSJet-II-04, the best possible description of the data is achieved with protonhelium mixes, but for these mixes the modeled X max distributions are broader than the observed ones.These findings are further supported by the observation of the negative correlation between X max and the signal in surface detector (SD) [11,12] in the Auger data near the "ankle" feature (E ≈ 10 18.7 eV) of the UHECR spectrum.That can be achieved only for the mixed primary composition containing particles heavier than helium nuclei, which is a robust statement based on the general phenomenology of air-shower development and, therefore, not sensitive to the properties of a particular hadronic model.
The deficiencies of the models are more evident for the SD observables, where, in some cases, the data are not even bracketed by the Monte Carlo (MC) predictions for protons and iron nuclei.The muon deficit in simulations, known as the "muon puzzle" [5], is the best-known example of this kind.In particular, this was observed by Auger for inclined showers (zenith angles θ ¼ 62°to 80°) dominated by the muon component, for Epos-LHC and QGSJet-II-04 at ∼10 19.0 eV [13], and in direct measurements with underground muon detectors at E ¼ 10 17.3 eV to 10 18.3 eV and θ < 45° [14].At the same time, the fluctuations of the muon signal measured by Auger [15] are consistent with the MC predictions, indicating that the muon deficit might originate from the accumulation of small deviations from the model predictions during the development of a shower, rather than be caused by a strong deviation in the first interaction.The range of predictions for the muon production depth of protons and iron nuclei is outside of the measured values for Epos-LHC above 10 19.3 eV [16].
Less directly, the muon deficit in simulations was observed as a deficit of the total signal at 1000 m from the shower core, Sð1000Þ, in the Auger SD stations [17] for vertical (θ < 60°) showers with energies around 10 19 eV.In this analysis, within the assumption that the electromagnetic (em) component, X max , and, correspondingly, the mass composition inferred from the X max measurements, are predicted correctly by a particular model, the deficit of Sð1000Þ was interpreted as the underestimation of the hadronic signal (dominated by muons) in simulations by ð33 AE 16Þ% for Epos-LHC and by ð61 AE 21Þ% for QGSJet-II-04, with a strong dependence on the energy scale.An indicative summary of all these tests for the three models used in this work is given in Table I.
The use of an incorrect MC X max scale would lead one to a biased inference on the mass composition and, through this, to a biased estimate of the muon deficit, since the muon content in a shower scales with the primary mass ∝ A 1−β , β ≈ 0.9 [18].Therefore, in a more comprehensive approach a modification of the MC scales of both X max and SD signals, going along with the fitting of the primary mass composition accounting for these modifications, should be considered.

B. Progressive testing of hadronic interaction models
In this work, progressive testing of the model predictions is performed.First, we allow for a rescaling of the signal on the ground produced by the hadronic shower component at 1000 m with a factor R had .Then we add a zenith-angle dependence of R had defining two parameters R had ðθ min Þ and R had ðθ max Þ at the two extreme zenith-angle bins.We assume a linear dependence of R had on the distance of X max to the ground in atmospheric depth units to relate R had to different zenith angles.Finally, we consider a shift in the predicted X max distributions (ΔX max ) that is assumed to be independent of the primary mass and energy.In this way, we consider freedom not only in the scale of the simulated hadronic part of the ground signal but also in the simulated X max scale.Consequently, the main differences in X max and Sð1000Þ predictions of the models are reduced, and similar mass composition inferences for the Auger data are obtained.It is remarkable, that a consistent description of the Auger data with all three models can be achieved with the modification of only two scale parameters.
The analysis is performed for the energy region around the ankle (E ≈ 10 18.7 eV) in the energy spectrum where the UHECR mass composition is mixed [9,11,12].Specifically, we find the values of ΔX max , R had ðθÞ, and fractions of four primary particles (protons, and helium, oxygen, iron nuclei) for which the best-fit of the measured two-dimensional distributions of (Sð1000Þ, X max ) is achieved.The remaining differences between the predictions of models with a smaller effect on MC templates like the fluctuations of X max and hadronic signal, and the mass composition dependence of R had ðθÞ and ΔX max are not considered in this work.
In Sec.II, we give a detailed description of the method.In Sec.III, the method is applied to the data of the Pierre Auger Observatory.The results of the data analysis are discussed in Sec.IV followed by a summary of our findings in the Conclusions.

II. METHOD
The method stems from Ref. [19] and it was first introduced in Ref. [20] followed by a slight modification in the approach to the ground-signal rescaling.A preliminary application of the method to the Auger data was presented in Ref. [21].
We perform a binned maximum-likelihood fit of the measured two-dimensional distributions (S, X) with MC templates for four primary species simultaneously in five zenith-angle bins.The observables are Sð1000Þ and X max corrected for the energy evolution using the fluorescence detector (FD) energy E FD , and where B ¼ 1.031 AE 0.004 is the SD energy calibration parameter [22] and the elongation rate of a single primary D ¼ 58 g=cm 2 /decade is taken as the average value over the four primary particles and three models used in this work.The value of the elongation rate varies only within AE2 g=cm 2 around the mean, in accordance with the universality with respect to the primary mass predicted within the phenomenological model in Ref. [23].This is a consequence of energy dependencies of multiplicity, elasticity, and cross section assumed in this phenomenological model.We chose the reference energy E ref ¼ 10 18.7 eV for the analyzed energy range of E FD ¼ 10 18.5 eV to 10 19.0 eV.The signal Sð1000Þ is assumed to be composed of the hadronic (S had ) and electromagnetic (S em ) components.The signal S had is produced by muons, em particles from muon decays and low-energy neutral pions as in [17] according to the four-component shower universality model [24,25].The signal S em is produced by em particles originating from high-energy neutral pions.
The detector simulations and event reconstruction were performed with the Auger Offline Framework [33].In the standard event processing chain, not all effects related to the detector calibration, atmospheric conditions, long-term performances, etc., are included in the simulations and reconstruction.To account for them, for the FD part an additional smearing of X max distributions by ≈9 g=cm 2 is applied [9].In the case of the SD part, the smearing of Sð1000Þ by 9%, corresponding to the maximal expected contribution from realistic operational conditions, was tested without finding any statistically significant effect on the results.The event selection is the same as applied to the data (see Sec. III).After the selection, the MC templates contain ≈15000 showers per primary species and model.
The analysis was carried out by splitting these simulated air-showers in five zenith-angle ranges containing nearly the same number of events, namely ð0°; 33°Þ, ð33°; 39°Þ, ð39°; 45°Þ, ð45°; 51°Þ, ð51°; 60°Þ.Examples of the (S, X) distributions for protons and iron nuclei in the most vertical and most inclined angular ranges are shown in Fig. 1.Such two-dimensional distributions are then normalized and FIG. 1. Examples of two-dimensional distributions of S and X for protons (left) and iron nuclei (right) generated with Epos-LHC for zenith angles between 0°and 33°(top) and 51°and 60°(bottom).The red points indicate the mean values of S. E ¼ 10 18.5 eV to 10 19.0 eV.fitted with the ansatz function Φ described in detail in Appendix A. This function is a convolution of the generalized Gumbel distribution of X and the Gaussian distribution of S with the mean value linearly changing with X, reflecting in this way their correlation indicated in Fig. 1.A set of these trial functions for each model, primary particle, and zenith-angle range is used as MC templates in the following fitting procedure.

B. Fitting procedure
For each model, we search for the most likely combination of the composition mix of the four primary species, the zenith-dependent rescaling parameter of the hadronic signal R had ðθÞ, and the constant shift of the depth of shower maximum ΔX max in all the MC templates.The fitting method is a generalization of the fitting procedure used in Ref. [10] in the case of the X max distribution and applied here to the two-dimensional (X, S) distributions in five zenith-angle ranges simultaneously.
The negative log-likelihood-ratio expression that is minimized for a given model is of the form, with the sums running over the two-dimensional bins j in (X, S) for the five θ-bins k.The corresponding number of showers measured in bins j, k is denoted by n jk and the predicted number of MC showers by C jk .The latter number is obtained using the total number of measured showers N k data as where Φ i;k denotes the template function Φ in θ-bin k for a given model and primary particle i with relative fraction f i .
The modified X prediction is of the form and the rescaled predicted ground signal is where X j;k and S j;k are the center bin values of X and S, respectively, of the original MC distribution (X, S).
The rescaling parameter f k SD of all signals S j;k is calculated as where Sð1000Þ ∝ E B FD is assumed to be composed only of S had ∝ E β FD and S em ∝ E FD , and f had ¼ S had =Sð1000Þ.The parameter β ¼ 0.92 is chosen following Ref.[23].The mean energy factors 2. The total ground signal at 1000 m from the shower core (black) and its hadronic (brown) and em (turquoise) components as a function of the distance from X max to the ground in atmospheric depth units for protons (left) and iron nuclei (right) for different models.The bands contain the statistical uncertainty.E ¼ 10 18.5 eV to 10 19.0 eV, θ < 60°.from all measured showers in the energy range 10 18.5 eV to 10 19.0 eV and θ-bin k. 3 The mean hadronic fraction, see also Fig. 3, in the θ-bin k, f had;k , is calculated using the average hadronic fractions f had;k;i for simulated showers induced by a primary i and weighted over the relative primary fractions f i as The average effect of the X max change on the ground signal is incorporated through the separate effects on the em g em;k and hadronic g had;k signals, see Appendix B for details.We parametrized the evolution of the mean ground signal parts with the distance of X max to the ground in atmospheric depth units, X atm − X max , where X atm ¼ 880 g=cm 2 =cos θ, see the examples in Fig. 2. In this way, the total ground signal is estimated to be modified (via g em;k , g had;k ) at most by about 7% for a change of X max by 50 g=cm 2 .This is in accordance with the functional dependencies in Fig. 2 weighted over the relative contributions of hadronic and em signals (see Fig. 3) and over the primary fractions.
As a consequence of the four-component shower universality approach, the em signal is very similar in all three models, see Fig. 2. The differences in the total signal stem from the size of the hadronic signal at different zenith angles, corresponding to different X atm − X max values.Therefore, the freedom in R had ðθÞ and also ΔX max (see Sec. I) removes the main differences in predictions of S and X for the three models.We assumed a linear dependence of R had ðθÞ on X atm − X max and defined rescaling parameters of the hadronic signal at two extreme zenith angles, R had ðθ min Þ and R had ðθ max Þ, for ∼28°and ∼55°, respectively (see Appendix B for the definition).
We have verified using MC-MC tests, see Appendix C, that the method is performing well and the observed biases were taken as sources of systematic uncertainty of the results.

III. DATA ANALYSIS A. Data selection
The analysis is applied to hybrid events, i.e., events detected with both SD and FD of the Pierre Auger Observatory between 1 January 2004 and 31 December 2018.To select high-quality events, for the FD part we apply the selection criteria used for the X max analysis [9,11], for the SD part the selection is the same as in the measurements of the SD energy spectrum [22], additionally removing events with saturated SD stations.In this way, we ensure an accurate estimation of the observables X max , E FD , θ, and Sð1000Þ used as the inputs to the method.The selection efficiency is similar for all four primary masses and all zenith-angle bins, therefore it does not introduce mass-composition biases.In total, 2239 hybrid events were selected in the FD energy range 10 18.5 eV to 10 19.0 eV (hE FD i ≈ 10 18.7 eV) and zenith angles between 0°and 60°.The data were divided into five zenith-angle bins containing nearly the same (N ¼ 425-500) and sufficiently large number of events, see Fig. 4.
To take into account long-term performance of the FD and SD [34], we applied a time-dependent correction to E FD with a negligible systematic effect on the final results.The signal Sð1000Þ was corrected for seasonal atmospheric effects [35].

B. Results
The minimized values of the log-likelihood expression [see Eq. ( 3)] are summarized in Table II, progressively applying different modifications to the models.At all stages, the fits of the mass composition are performed with (p, He, O, Fe) fractions as the free-fit parameters.First, the model predictions without any modifications are used for fitting the data.Then we perform fits adding freedom in only one of the modifications ΔX max , R had (independent on zenith angle), R had ðθÞ (zenith-angle dependent).Finally, the fits are performed using combinations of (R had , ΔX max ) and (R had ðθÞ, ΔX max ).From the minimum values of the log-likelihood expression for these scenarios, and from the likelihood-ratio test for the nested model, one can see that the major improvements in the data description are achieved due R had and then ΔX max modifications.FIG. 3. The average fraction of hadronic signal at 1000 m from the shower core as a function of the reconstructed zenith angle for different models and primary masses.E ¼ 10 18.5 eV to 10 19.0 eV.
The improvement in the data description due to the introduction of the zenith-angle dependence in R had is statistically significant for QGSJet-II-04, and less significant for SIBYLL 2.3d, with a negligible effect in the case of Epos-LHC.
The measured two-dimensional (S, X) distributions are described acceptably by MC templates modified by R had ðθÞ and ΔX max in all five zenith-angle bins with p-values estimated to be higher than 10% for all three models.These probabilities are obtained from MC-MC tests (see Appendix C) using distributions of values of log-likelihood expression from the fitting of 500 random samples consisting of 2239 simulated showers.Each MC sample had a mass composition and artificially modified X max and S had ðθÞ following the values obtained from the best fits to data.We plot in Fig. 5 projected distributions of X and S (together with plots in the logarithmic scale to stress the consistency also for the tails of distributions) and their zenith-angle dependent correlation using the Gideon-Hollister correlation coefficient [38] for the data and MC templates with the modifications of (R had ðθÞ, ΔX max ).In this way, we demonstrate that with the modified MC templates, we achieve a consistent simultaneous description of X max , Sð1000Þ and their correlation for the data shown in Fig. 4. The models without modifications or only with the zenith-angle independent modification of R had do not describe the data equivalently well (see Appendix D).
The resulting parameters of the data fits with (R had ðθÞ, ΔX max ) modifications are presented in Table III.For all three models, a deeper X max scale is favored (see also left panels in Figs. 6 and 7) that would result in heavier primary mass composition derived from the X max data compared to the inferences with nonmodified predictions of the models [10,39].The modifications ΔX max are such that they reduce the difference in X max scales between the models (see Fig. 10 in Sec.IV) and, as a consequence, similar estimations of the fractions of the primary nuclei can be inferred from the data, see the right panel of Fig. 6.
Due to the shift of the X max predictions deeper in the atmosphere, more shower particles reach the ground producing a few percent larger SD signals compared to the nonmodified models.Therefore, the total increase of S for the modified MC predictions consists of contributions from both ΔX max (∼0 to 7%) and R had ðθÞ (∼10 to 16%) as shown in the left panel of Fig. 8.The two effects from the modification of the X max scale-heavier primary mass composition and a larger number of shower particles reaching the ground-lead to the values of R had ðθÞ (Table III) that are smaller than the values previously found for Epos-LHC and QGSJet-II-04 in Ref. [17].The modified and original S had scales are shown for the three models in the right panel of Fig. 8.

C. Systematic uncertainties
There are four dominant sources of systematic uncertainties in the fitted parameters: (i) The uncertainty in the FD energy scale AE14% [22]; (ii) The uncertainty in the X max measurements   2), for the Auger data (points) and for the best fits of (S, X) distributions with R had ðθÞ and ΔX max modifications in predictions of three models (denoted by † as templates were modified from original predictions).In the top-right panel, the dependence of the Gideon-Hollister coefficient of correlation [38] between S and X on the zenith angle is shown.The χ 2 probabilities characterize the compatibility between measurements and MC predictions in the individual plots.
FIG. 6. Left: Correlations between ΔX max and R had ðθ max ≈ 55°Þ modifications of the model predictions obtained from the data fits.The contours correspond to 1σ, 3σ, and 5σ statistical uncertainties.The gray rectangles are the projections of the total systematic uncertainties.Right: The most likely primary fractions of the four components from the data fits using ΔX max and R had ðθÞ.The height of the gray bands shows the size of projected total systematic uncertainties.
(iv) The biases of the method estimated from the MC-MC tests (see Appendix B for the results of these tests).Since the X max systematic uncertainty is strongly correlated with the modification ΔX max , its effect on the nuclei fractions is nearly cancelled out by the corresponding change of ΔX max .In general, the nuclei fractions, and therefore the inferences on the mass composition, are weakly sensitive to all experimental systematic errors due to the simultaneous fitting of ΔX max and R had in the method.To explore the effect of those systematics, and as a simplifying ansatz, the total systematic uncertainties on the fit parameters are obtained by summing all four contributions in quadrature (see Appendix E for the size of individual contributions).
One can also note, that within systematic errors no significant dependence of R had on the zenith angle was found.The difference between R had ðθ min Þ and R had ðθ max Þ shows a tight correlation with the uncertainty on the energy scale.However, given all the experimental uncertainties in the case of QGSJet-II-04, the measured data prefers within the method rather flatter attenuation of the hadronic signal at 1000 m than predicted by the model, indicating too hard spectra of muons predicted by this model.
The systematic uncertainties on the parameters B, D, β, used in Eqs. ( 1) and (2) for the energy correction of S, X, as well as corrections of the long-term performance and other effects related to the operation of the SD and FD, have a negligible contribution to the systematic uncertainties.We could not identify any significant dependencies of the results on the zenith angle or energy in the studied ranges.

D. Significance of improvement in data description
In Fig. 9, the results of our method for ΔX max and R had ðθÞ applying also all possible combinations of the systematic uncertainties on E FD , X max , and Sð1000Þ are shown with the full points.These points are located approximately in a plane, contour outlined with a dashed line, due to a correlation between the modification parameters through the mass composition describing the data (see the left panels of Figs. 6 and 7, e.g., increase of ΔX max leads to a heavier fitted mass composition and consequently to a decrease of R had ).The plane is tilted with respect to the [R had ðθ min Þ, R had ðθ max Þ] plane.It is a consequence of the effect of ΔX max on the ground signal S at different zenith angles, see Eqs. (B1)-(B5) in Appendix B, and consequently on the fitted R had ðθ min Þ, R had ðθ max Þ.The color of FIG. 7. Left: Correlations between ΔX max and R had ðθ min ≈ 28°Þ modifications of the model predictions obtained from the data fits.Right: Correlation between R had ðθ max ≈ 55°Þ and R had ðθ min ≈ 28°Þ.The contours correspond to 1σ, 3σ, and 5σ statistical uncertainties.The gray rectangles are the projections of the total systematic uncertainties.TABLE III.Modifications of the model predictions and primary fractions in the energy range 10 18.5 eV to 10 19.0 eV with statistical and systematic uncertainties for the best data fits and the p-values obtained using MC-MC tests.the points corresponds to the difference in fitted loglikelihood expressions (Δ ln L) in case of no modifications and in case of the assumed template modifications.The closest approach to the point of no modification is estimated through a dense scan of linear combinations of experimental systematic uncertainties for the lowest values of Δ ln L that are, in some cases, even beyond the range of systematic uncertainties quoted by Auger (see Appendix F for more details).For all three models, the closest approach of Δ ln L (indicated by a line in Fig. 9 connecting point [1; 1; 0 g=cm 2 ] with the plane) is >19 which is still higher than the value estimated using the Wilks' theorem in the likelihood-ratio test for nested model at the level of 5σ (Δ ln L ≈ 16.62).This confirms that the modifications of X max and R had scales are not an artifact of the systematic uncertainties of our measurements but a needed change in FIG. 8. Left: The total rescaling of S (dashed lines) broken down into the contributions from the rescaling of the hadronic signal R had ðθÞ (solid lines) and the change of the predicted X max scale (ΔX max ) (dash-dotted lines) for different models.Right: The dependence of the hadronic signal at 1000 m on the distance of X max to the ground in atmospheric depth units as predicted for proton showers generated using nonmodified models (dashed lines) and accounting for the R had ðθÞ, ΔX max modifications (solid lines).The height of the gray bands shows the size of projected total systematic uncertainties.FIG. 9. Values of the modification parameters (points) for all possible combinations of experimental systematic uncertainties on the energy (AE14%), X max ð þ8 −9 g=cm 2 Þ, and Sð1000Þ (AE5%).The color of the points shows the difference in log-likelihood expressions (Δ ln L) in the case of no modifications and in the case of the assumed template modifications, including the differences higher than 50 (note the slightly different scale between models).The results (see Table III) for no systematic shift of the data are highlighted by stars.Dashed lines outline the contour of the plane from the best-fit to the points.The closest approach to the nonmodified (R had ðθÞ ¼ 1, ΔX max ¼ 0 g=cm 2 ) model predictions using a dense scan of linear combinations of experimental systematic uncertainties is connected with this point by a black line.The animated rotated views are available at [40]. the model descriptions.A correction of the results for the biases seen in the MC-MC tests leads to even larger significance values.

A. Implications for inferences on mass composition
One straightforward consequence of the X max shift deeper in the atmosphere is the solution of the problem with the negative variance of the logarithmic mass σ 2 ðln AÞ derived with QGSJet-II-04 from the measured hX max i and σðX max Þ (as discussed in Sec.I).After application of the corresponding ΔX max shifts, one finds σ 2 ðln AÞ ≈ 0.5 to 2.5 in the energy range 10 18.5 eV to 10 19.0 eV for all models used in this work.These values are consistent with the degree of mixing of the primary composition found in the analysis of the correlation between X max and Sð1000Þ with nonmodified models, since the correlation analysis relies on the general phenomenology of air showers and this way is weakly sensitive to the uncertainties in the description of hadronic interactions [11,12].
Another outcome of the method can be foreseen by taking into account the quasiuniversal behavior of the X max elongation rate for all pure beams and models with values staying within 54 g=cm 2 =decade to 61 g=cm 2 =decade range.Changing the elongation rate within these limits introduces an energy-dependent uncertainty on the MC X max scale of about 4 g=cm 2 =decade at most.Under the assumption that the difference ΔX max between the models and data remains nearly independent of the primary energy, i.e., that there is no new physics that can significantly change the predictions for the X max elongation rate of single primary species, one could speculate that at the highest energies (E ≳ 10 19.5 eV) the Auger X max measurements, see Fig. 10, can be described with a heavy mass composition having a low degree of mixing due to hX max i and σðX max Þ staying between the extrapolated predictions of the modified models for oxygen and iron nuclei.

B. Primary species in the cosmic-ray beam
We checked if the shape of the data distribution of ground signal and X max and its zenith-angle evolution can be fitted better with an artificially reduced range of the masses in the MC templates and thus with different values of R had ðθÞ and ΔX max .Due to the presence of deep events and σðX max Þ values close to the predictions for protons in 10 18.5 eV to 10 19.0 eV range, we kept protons in the MC templates and used (p, He, O) and (p, He) mixes for the data fits.In both cases, the quality of data fits was found to be inferior compared to the fits with (p, He, O, Fe) nuclei (see Table IV).This was confirmed by the MC-MC tests and was observed using the fits to the measured data for all three models that the obtained X max scale decreases by about 5 g=cm 2 to 7 g=cm 2 , 10 g=cm 2 to 17 g=cm 2 , and 30 g=cm 2 to 40 g=cm 2 and the hadronic signal scale by about 2% to 5%, 4% to 9%, and 15% to 20% when the heaviest primary Fe is replaced by Si, O, and He in the fit, respectively.
For the five-component (p, He, O, Si, Fe) fits, the values of R had ðθÞ and ΔX max remain well within statistical errors from the values obtained from the (p, He, O, Fe) fits.Though the fractions of silicon and iron nuclei are strongly anticorrelated in such fits, the silicon fraction remains low, <5%.
FIG. 10.The energy evolution of the mean (left) and the standard deviation (right) of the X max distributions measured by the Pierre Auger Observatory using FD [11] (solid circles) and SD [41] (open squares).The results of this paper for the modified X max scales (left panel) are shown with shaded bands with the heights corresponding to the systematic uncertainties.The original nonmodified model predictions for different primary species are shown with lines for the entire energy range.

C. Modifications of hadronic interactions
Assuming the same modifications of hadronic interaction features as in Ref. [42], the observed increase of the X max scale in MC predictions could be explained by an increase of the elasticity or a decrease of the multiplicity and cross section.A detailed study of the combinations of such modifications at the energy equivalent to this study is ongoing [43].
The elasticity is a very good candidate for a potential source of X max modification because there are no precise data to constrain this in models at high energy.It could be precisely measured only in the region where no detector at the LHC experiments exists.Consequently, the different models have relatively different predictions with large uncertainties.
A reduction of the multiplicity can also increase X max since the available energy is shared between fewer particles leading to higher π 0 energy at the first interaction.But at the same time, the same effect will reduce the energy available for muon production effectively increasing the tension in R had .
The superposition model [44] makes an ad hoc modification of p-p cross section that would explain a single change of X max scale for all primaries rather difficult.The change in the case of iron nuclei would correspond to a change of p-air cross section at the energy 56 times smaller, ∼9 × 10 16 eV, which starts to be in tension with the corresponding LHC measurements of p-p collisions [45].Remarkably, the most recent cross section measurements by the ALFA experiment are lower and more precise [46] than the one used to tune the models, possibly indicating an overestimation of the current cross section of the air-shower simulations.
In the case of the multiplicity and elasticity, we can consider the model of shower development from Ref. [23] also assuming the energy increase of the total multiplicity as N ∝ N 0 E α and a decrease of the elasticity with energy κ ∝ κ 0 E −ω .We see that an ad hoc change of the normalization of multiplicity (N 0 ) or elasticity (κ 0 ) would modify the X max independently of the primary particle and energy, where ξ π c is the critical energy of pions at which their decay and interaction lengths are equal.
The hadronic signal can be increased by increasing the multiplicity or decreasing the ratio of neutral pions according to Ref. [42] but, as previously described, an increase in the multiplicity would decrease X max increasing the tension there.A change in the ratios of different hadrons, in particular with more strange particles, is a more likely possible explanation according to recent LHC data [47,48].
The energy spectrum of muons naturally influences the dependence of S had on X atm − X max .Though the systematic uncertainty of the energy scale limits a significant conclusion deduced from the method [see R had ðθ min Þ − R had ðθ max Þ in Appendix E], the QGSJet-II-04 model seems to predict too hard spectra of muons.For instance, a larger fraction of strange particles like kaons in the shower would lead to harder muon spectra because of the larger critical energy (smaller lifetime) of strange particles.
There are of course other possible modifications of hadronic interactions that could influence the observed differences between the predictions from the models and the measured data like cross sections of low-mass mesons, energy spectra of pions etc., (see e.g., Ref. [5]).

D. Limitations of the method
There are remaining differences between the models and, correspondingly, with the data due to various limitations of the method.Though our approach leads to a reduction of the differences between models in hX max i, the obtained modifications ΔX max do not cancel out these differences completely.In particular, the modified X max scale for Epos-LHC is shallower compared to QGSJet-II-04 and SIBYLL 2.3d (see the left panel of Fig. 10).We found, that a large part of this difference (∼10 g=cm 2 ) can be removed if an additional smearing of X max is applied to Epos-LHC showers to compensate for the smaller X max fluctuations predicted by this model in comparison to the other two (possibly partly due to strong defragmentation of nuclei in Epos-LHC [49], as recently confirmed by newer model EPOS-LHCR [50]).The remaining difference of about 5 g=cm 2 between the models might be due to the statistical errors on ΔX max and differences between the models in the separation of the primary species in hX max i.
Possible dependencies of ΔX max , R had ðθÞ, and fluctuations of X max and Sð1000Þ on the primary mass are out of the scope of this paper.We checked that adding a linear mass dependence of ΔX max into the method did not improve the fit significantly.The assumption of a mass-independent ΔX max used in this paper was mainly motivated by the similar differences in predicted hX max i for different primaries [50].Such a quasiuniversal difference is a consequence of the very similar energy dependencies of interaction features like multiplicity, elasticity, and cross section assumed in the models [51].It means that an offset in these features would lead to an approximately massindependent difference in the X max predictions.An ad hoc modification of the energy dependence of these features, like in [42,43], would lead to mass-dependent X max shift.

V. CONCLUSIONS
In this paper, we tested the predictions of the models QGSJet-II-04, SIBYLL 2.3d, Epos-LHC regarding the depths of maximum of air-shower profiles X max and the signal produced by air-shower particles in water-Cherenkov stations at 1000 m from the shower core, Sð1000Þ, composed of electromagnetic and hadronic (S had ) parts.The test consisted of fitting with MC templates of twodimensional distributions of (Sð1000Þ, X max ) measured at the Pierre Auger Observatory and obtaining the scales of X max and S had ðθÞ predicted by the models, as well as the of four primary nuclei (p, He, O, Fe).We found that for the best description of the data distributions in the energy range 10 18.5 eV to 10 19.0 eV for θ < 60°the MC predictions of X max should be deeper in the atmosphere by about 20 g=cm 2 to 50 g=cm 2 , and the hadronic signal should be increased by about 15% to 25%.These modifications reduce the differences between the models in X max and Sð1000Þ, and as a consequence, lead to smaller uncertainties on the estimated fractions of the primary nuclei.Due to the deeper MC X max scale and, correspondingly, a heavier mass composition inferred from the data compared with nonmodified models, the scaling factors for the hadronic signal are found to be smaller than in previous estimations not considering any modifications to the MC X max scales.The statistical significance of the improvement in the data description using the assumed modifications to the MC templates is above 5σ for all three models even accounting for all possible linear combinations of experimental systematic uncertainties.The difference in R had ðθÞ at the two extreme zenith angles implies an indication that softer spectra of muons generated by QGSJet- II-04 in 1000 m from the shower core would describe the Auger data better.
The specific ways to produce the required changes in the models might consist in combinations of modifications of integral (cross section, multiplicity, elasticity, etc.) or differential (secondary particles energy spectra) characteristics of the hadronic interactions as discussed e.g., in Refs.[5,42,43].Our method addresses only the first-order differences in the mean values of X max and hadronic signal without taking into account their possible dependencies on the primary mass or energy.These dependencies, as well as the investigation of a modification of fluctuations of air-shower observables, need to be studied further to corroborate the scale adjustments found in this paper, supplemented also by future data with increased masssensitivity from AugerPrime [52], the upgrade of the Pierre Auger Observatory.

Appendix A: PARAMETRIZATION OF MC TEMPLATES
The MC templates of the two-dimensional distributions of (S, X) normalized by the total number of simulated showers N MC and weighted to correspond to the measured energy spectrum are fitted using the following function: The X max part is described by the generalized Gumbel distribution [53], where x ¼ ðX − mÞ=s and the ground-signal part is assumed to follow the Gaussian distribution with the mean value linearly dependent on X where y ¼ S − pX − q.The normalization terms are given by where Γ is the gamma function.The six free parameters in each of the MC template fits are m, s, λ of the generalized Gumbel distribution, and p, q, r of the Gaussian part.These fitted parameters for the three models used in this work are listed in Tables V-VII.In Fig. 11, we show the mean and the standard deviation of the pull distribution for the description of each bin of the two-dimensional distribution (S, X), see examples in Fig. 1, by the fits with a function Eq. (A1).The goodness of the description of the MC templates with these fits is also tested with the two-dimensional Kolmogorov-Smirnov test [54] demonstrating very good consistency between the MC templates and MC distributions of (S, X).

Appendix B: PARAMETRIZATION OF ATTENUATION OF GROUND SIGNALS
The electromagnetic and hadronic signals at 1000 m were corrected for the energy evolution as The dependence of these average signals on the distance of X max to the ground in atmospheric depth units, t ¼ X atm ðθÞ − X max , see Fig. 2, was parametrized with the Gaisser-Hillas function [55] allowing its vertical offset, where α ¼ had or em, and normalization scaling ZðxÞ ¼ ðu α − xÞ=v α .t α is the value of X atm ðθÞ − X max where the function reaches its maximum, u α and v α are parameters without a straightforward physics interpretation, and S 0 α , w α are the rescale and offset parameters, respectively.The fitted parameters are listed in Tables VIII and IX for em and hadronic signal, respectively.The factor g em;k reflecting the average change of em signal due to the change of X max scale for a mix of primary species i with relative fractions f i in zenith-angle bin k is calculated as where hti k is the average measured X atm ðθÞ − X max in a zenith-angle bin k: ∼241.2 g=cm 2 , ∼335.5 g=cm 2 , ∼429.4 g=cm 2 , ∼556.1 g=cm 2 , ∼777.6 g=cm 2 , for respective increasing average values of the zenith angle.The factor g had for the hadronic signal is obtained as

þ8− 9 FIG. 4 .
FIG.4.Distributions of ground signal at 1000 m (S) and depth of shower maximum (X), see Eqs. (1) and (2), for the data of the Pierre Auger Observatory in the energy range 10 18.5 eV to 10 19.0 eV in five zenith-angle bins.

whereFIG. 11 . 19 FIG. 12 . 20 FIG. 13 .
FIG. 11.The mean and the standard deviation of the MC distribution of PULL¼ ðN fit − N mc Þ= ffiffiffiffiffiffiffiffi N mc pwhere N fit is the value of the fitted function Eq. (A1) and N mc is the value for each bin of the two-dimensional MC distribution (S, X).The probability of consistency between the parametrized function and MC distribution (red) is tested with the two-dimensional Kolmogorov-Smirnov (K-S) test.E ¼ 10 18.5 eV to 10 19.0 eV.

TABLE I .
Indicative summary of the results of tests of models using Auger data (✓-no tension, ✗-tension).In the case of SIBYLL 2.3d, we also show estimations based on the previous version of the model Sibyll when available in the literature.

TABLE II .
[37]ution of the minimum values of the log-likelihood expression [see Eq. (3)] fitting the data with different modifications of the model predictions.In all cases, except R had ¼ const and ΔX max , the significance of improvement of data description with the R had ðθÞ and ΔX max fit is above 5σ using the likelihood-ratio test applying the Wilks' theorem[36]for nested model[37].In the case of R had ¼ const and ΔX max , the improvement of data description with R had ðθÞ and ΔX max fit is ∼0.1σ, ∼4.4σ, and ∼2.0σ for Epos-LHC, QGSJet-II-04, and SIBYLL 2.3d, respectively.
FIG. 5. Distributions of depth of shower maximum (X, top-left) and ground signal at 1000 m (S, bottom), see Eqs. (1) and (

TABLE IV .
Minimum values of the log-likelihood expression [see Eq. (3)] for different number of primaries assumed to be present in the model predictions.In all cases, the significance of improvement of data description with the R had ðθÞ, ΔX max , and p, He, O, Fe fit using the likelihood-ratio test applying the Wilks' theorem for nested model is above 5σ.

TABLE V .
Parameters of MC templates for air showers generated with model Epos-LHC and initiated by a primary particle i, see Eq. (A1).

TABLE VI .
Same as in Table V, but for model QGSJet-II-04.

TABLE VII .
Same as in Table V, but for model SIBYLL 2.3d.