Radio Measurements of the Depth of Air-Shower Maximum at the Pierre Auger Observatory

The Auger Engineering Radio Array (AERA), part of the Pierre Auger Observatory, is currently the largest array of radio antenna stations deployed for the detection of cosmic rays, spanning an area of $17$ km$^2$ with 153 radio stations. It detects the radio emission of extensive air showers produced by cosmic rays in the $30-80$ MHz band. Here, we report the AERA measurements of the depth of the shower maximum ($X_\text{max}$), a probe for mass composition, at cosmic-ray energies between $10^{17.5}$ to $10^{18.8}$ eV, which show agreement with earlier measurements with the fluorescence technique at the Pierre Auger Observatory. We show advancements in the method for radio $X_\text{max}$ reconstruction by comparison to dedicated sets of CORSIKA/CoREAS air-shower simulations, including steps of reconstruction-bias identification and correction, which is of particular importance for irregular or sparse radio arrays. Using the largest set of radio air-shower measurements to date, we show the radio $X_\text{max}$ resolution as a function of energy, reaching a resolution better than $15$ g cm$^{-2}$ at the highest energies, demonstrating that radio $X_\text{max}$ measurements are competitive with the established high-precision fluorescence technique. In addition, we developed a procedure for performing an extensive data-driven study of systematic uncertainties, including the effects of acceptance bias, reconstruction bias, and the investigation of possible residual biases. These results have been cross-checked with air showers measured independently with both the radio and fluorescence techniques, a setup unique to the Pierre Auger Observatory.

: Università di Catania, Dipartimento di Fisica e Astronomia "Ettore Majorana", Catania, Italy : Università di Milano, Dipartimento di Fisica, Milano, Italy : Università di Napoli "Federico II", Dipartimento di Fisica "Ettore Pancini", Napoli, Italy : Università di Palermo, Dipartimento di Fisica e Chimica "E.Segrè", Palermo, Italy : Università di Roma "Tor Vergata", Dipartimento di Fisica, Roma, Italy : Università Torino, Dipartimento di Fisica, Torino, Italy : Benemérita Universidad Autónoma de Puebla, Puebla, México : Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas del Instituto Politécnico Nacional (UPIITA-IPN), México, D.F., México : Universidad Autónoma de Chiapas, Tuxtla Gutiérrez, Chiapas, México : Universidad Michoacana de San Nicolás de Hidalgo, Morelia, Michoacán, México : Universidad Nacional Autónoma de México, México, D.F., México : Universidad Nacional de San Agustin de Arequipa, Facultad de Ciencias Naturales y Formales, Arequipa, Peru : Institute of Nuclear Physics PAN, Krakow, Poland : University of Lódź, Faculty of High-Energy Astrophysics, Lódź, Poland : Laboratório de Instrumentação e Física Experimental de Partículas -LIP and Instituto Superior Técnico -IST, Universidade de Lisboa -UL, Lisboa, Portugal : "Horia Hulubei" National Institute for Physics and Nuclear Engineering, Bucharest-Magurele, Romania : Institute of Space Science, Bucharest-Magurele, Romania : Center for Astrophysics and Cosmology (CAC), University of Nova Gorica, Nova Gorica, Slovenia : Experimental Particle Physics Department, J. Stefan Institute, Ljubljana, Slovenia : Universidad de Granada and C.A.F.P.E., Granada, Spain : Instituto Galego de Física de Altas Enerxías (IGFAE), Universidade de Santiago de Compostela, Santiago de Compostela, Spain : IMAPP, Radboud University Nijmegen, Nijmegen, The Netherlands : Nationaal Instituut voor Kernfysica en Hoge Energie Fysica (NIKHEF), Science Park, Amsterdam, The Netherlands : Stichting Astronomisch Onderzoek in Nederland (ASTRON), Dwingeloo, The Netherlands : Universiteit van Amsterdam, Faculty of Science, Amsterdam, The Netherlands : Case Western Reserve University, Cleveland, OH, USA : Colorado School of Mines, Golden, CO, USA : Department of Physics and Astronomy, Lehman College, City University of New York, Bronx, NY, USA : Michigan Technological University, Houghton, MI, USA : New York University, New York, NY, USA : University of Chicago, Enrico Fermi Institute, Chicago, IL, USA : University of Delaware, Department of Physics and Astronomy, Bartol Research Institute, Newark, DE, USA : University of Wisconsin-Madison, Department of Physics and WIPAC, Madison, WI, USA : -a : Louisiana State University, Baton Rouge, LA, USA b : Institut universitaire de France (IUF), France c : also at University of Bucharest, Physics Department, Bucharest, Romania d : School of Physics and Astronomy, University of Leeds, Leeds, United Kingdom e : now at Agenzia Spaziale Italiana (ASI).Via del Politecnico 00133, Roma, Italy f : Fermi National Accelerator Laboratory, Fermilab, Batavia, IL, USA g : now at Graduate School of Science, Osaka Metropolitan University, Osaka, Japan h : now at ECAP, Erlangen, Germany i : Max-Planck-Institut für Radioastronomie, Bonn, Germany j : also at Kapteyn Institute, University of Groningen, Groningen, The Netherlands k : Colorado State University, Fort Collins, CO, USA l : Pennsylvania State University, University Park, PA, USA (Dated: November 1, 2023) The Auger Engineering Radio Array (AERA), part of the Pierre Auger Observatory, is currently the largest array of radio antenna stations deployed for the detection of cosmic rays, spanning an area of 17 km 2 with 153 radio stations.It detects the radio emission of extensive air showers produced by cosmic rays in the 30 − 80 MHz band.Here, we report the AERA measurements of the depth of the shower maximum (Xmax), a probe for mass composition, at cosmic-ray energies between 10 17.5 to 10 18.8 eV, which show agreement with earlier measurements with the fluorescence technique at the Pierre Auger Observatory.We show advancements in the method for radio Xmax reconstruction by comparison to dedicated sets of CORSIKA/CoREAS air-shower simulations, including steps of reconstruction-bias identification and correction, which is of particular importance for irregular or sparse radio arrays.Using the largest set of radio air-shower measurements to date, we show the radio Xmax resolution as a function of energy, reaching a resolution better than 15 g cm −2 at the highest energies, demonstrating that radio Xmax measurements are competitive with the established high-precision fluorescence technique.In addition, we developed a procedure for performing an extensive data-driven study of systematic uncertainties, including the effects of acceptance bias, reconstruction bias, and the investigation of possible residual biases.These results have been crosschecked with air showers measured independently with both the radio and fluorescence techniques, a setup unique to the Pierre Auger Observatory.Example of a footprint of the radio emission on the ground for a simulated air shower with an energy of 8.2 • 10 17 eV, a zenith angle of 50.2 • , and a depth of the shower maximum of 749 g cm −2 .The strength of the emission is evaluated at simulated antenna positions (markers) and interpolated in between for visibility (background).The footprint has been projected into the shower plane, i.e., tilted into the plane perpendicular to the shower axis ⃗ v and rotated to project the magnetic field ⃗ B along the x-axis.
The measurement of ultra-high-energy cosmic rays (UHECRs) relies on the detection of the products of extensive air showers that are initiated when cosmic rays impact Earth's atmosphere.The study of these air showers allows one to extract their properties and thereby reconstruct important observables, such as the arrival direction of the cosmic-ray primary, its energy, and its particle type.Knowing the particle type is key to understanding the nature and origin of cosmic rays.This is of particular interest in the energy range between 10 17 to 10 19 eV where the cosmic-ray flux is expected to transition from having Galactic to extragalactic sources (see for example the review [1].In the transition region a change in mean mass of the primaries, their mass composition, FIG. 2. Schematic view of three air showers that started at different heights in the atmosphere, and their radio emission footprints on the ground.It illustrates that the depth of the shower maximum affects the radio emission footprint both in width and general shape.The asymmetry is a consequence of how the geomagnetic and charge excess radio emission mechanisms interfere during the shower development.This figure has been previously published in [15].could help disentangle source contributions. The past decades have seen major improvements to the detection of extensive air showers and the reconstruction of air-shower parameters.Though typically this has up until now been the domain of direct particle detection and the observation of air-Cherenkov or fluorescence light, the last two decades also saw the detection of radio emission from air showers coming to maturity [2][3][4][5][6][7][8].For reviews on the recent progress see e.g.[9,10].This is important as the radio technique has the advantage of a near-100% duty cycle and relatively low-cost hardware, while still performing precision measurements of the electromagnetic part of the shower.In the extensive air shower, changing currents, caused by charged particles moving under the influence of the Earth's magnetic field (geomagnetic emission) and by the ionization of the surrounding atmospheric medium (charge excess emission), lead to electromagnetic radiation, predominantly in the MHz to GHz frequency band.Using an array of radio antennas on the ground, the radio emission footprint can then be measured.An example of a simulated radio footprint on the ground is shown in Fig. 1.The radio footprint shape depends strongly on particle type and can thus be used to probe the cosmic-ray mass composition.A heavier primary nucleus (which acts roughly as a superposition of multiple lower-energy nucleons) will, on average, interact higher up in the atmosphere and hence produce a wider footprint on the ground than a lighter primary particle.This is illustrated in Fig. 2. The particle type itself is not a direct observable, but the atmospheric depth where the shower is maximally developed, the depth of the shower maximum X max , which depends on the particle type, can be related to the shower footprint shape, making X max a probe for the primary particle type.
Several methods have been used over the past years to reconstruct the particle type from radio signals, most of those relying on determining either the slope, width, or full shape of the lateral distribution function (LDF) of the radio footprint to determine X max [11][12][13][14][15].In addition, also other methods using for example the slope of the frequency spectrum [16,17] and shape of the shower wavefront have been attempted [18].Out of all these methods, the highest resolution in X max has been thus far achieved by using the LDF of the radio footprint by fitting of simulated air showers to measured air showers [19,20].
In this work, the simulation-fitting method has been further developed for the Auger Engineering Radio Array (AERA), by accounting for the effects of the sparse (compared to other radio experiments) and irregular array of radio stations.Also, a thorough investigation of systematic uncertainties has been made.We present the details of the X max reconstruction method and quantify the resolution as a function of cosmic-ray energy.Next, we apply the method to the set of air showers measured by AERA to determine the distributions of X max and interpret this in terms of the cosmic-ray mass composition.We then compare the composition to the results of the fluorescence detector (FD) at the Pierre Auger Observatory.Furthermore, we use a subset of air showers, simultaneously measured and independently reconstructed with both AERA and FD to directly evaluate our method and place bounds on the total systematic uncertainty between the two X max detection techniques.This paper will start with a description of the AERA air-shower reconstruction and the selection of showers in Sec.II.Then, the X max reconstruction method will be described in Sec.III.In Sec.IV we make an inventory of systematic uncertainties on the reconstruction of the X max distribution of the selected air showers.The resolution with which X max is reconstructed is then shown in Sec.V. Finally, the resulting X max distribution as measured by AERA will be presented in Sec.VI.  3. Layout of the AERA stations (triangles), marked with whether they are externally-or self-triggered.Also shown is part of the SD array of water-Cherenkov detectors (circles) and the field of view (FOV) of one of the FD telescope sites located near AERA.The FD contains 6 regular telescope bays (light green) and in addition 3 bays (dark green) looking at higher elevations.Scale and orientation of the layout are indicated with markers.
In an accompanying publication [21] these results are discussed in the context of the larger field of other measurement techniques and experiments.

II. AERA DATA RECONSTRUCTION
The Pierre Auger Observatory [22] is located near the town of Malargüe in Argentina.It aims at detecting UHECRs up to the highest energies.The observatory covers an area of 3000 km 2 , making it the largest of its kind in the world.The main components of the observatory are an array of 1660 water-Cherenkov detectors (WCDs), also called the surface detector (SD), and 27 fluorescence telescopes (known as the fluorescence detector, FD) that overlook the SD.Located near one of the FD sites and within the SD grid is also an array of radio detectors (AERA) [23].This radio array consists of 153 autonomous stations, each with two orthogonally placed dipole antennas, that measure the spectrum between 30 and 80 MHz, sampling the signal roughly every 5 ns.The measured voltage signals in the antenna arms are converted to an electric field ⃗ E(t) from which we calculate the integrated signal per unit area, conventionally called the energy fluence u [eV/m 2 ].Part of the measured energy fluence will be from the background noise that will need to be subtracted.One can assume that before the cosmic-ray signal arrives, or long after the cosmicray pulse has passed, that the electric field time trace also represents the noise during the time of the signal.

Number of events
High-quality showers Reconstructed showers Bias-free shower sample FIG. 4. Distributions of shower energy (left), and the azimuth and zenith angles of the shower arrival direction (center and right, respectively) for the pre-selection of 2153 high-quality AERA showers (blue), the showers for which Xmax was reconstructed successfully with our method (gray), and the sample of showers after acceptance and reconstruction cuts are applied (green).The cosmic-ray energy spectrum as measured by Auger SD [24] (gray dashed line) is scaled to the energy distribution of AERA to illustrate the level of completeness of the AERA event set at the higher energies.
The energy fluence can then be calculated as the integral over the time period [t 1 , t N ] containing the signal, minus the contribution of a pure background time interval [t b,1 , t b,M ], where N and M are the respective numbers of samples for the bin size ∆t: (1) where ϵ 0 is the vacuum permittivity and c the speed of light.
The spacing between radio stations varies between 144 m and 750 m (see Fig. 3) and the array spans a total area of 17 km 2 .While the radio stations can be triggered on the radio signals themselves, in this work we make use of just the external trigger provided by the SD such that we also directly have the measurement of shower energy at our disposal.Consequently, the dark red part of the radio array, shown in the upper right part in Fig. 3, is a subset of detectors that operate in a self-triggering mode and consequently are not used in this analysis.
The water-Cherenkov detectors are spaced on a triangular grid of 750 m that overlaps with the AERA station grid.Because the radio station spacing is typically much smaller, the estimation of the shower core position and arrival direction is made with the information of cosmicray signals in the AERA stations.
From the 7 years of AERA measurements (2013/04 -2019/11), we select air-shower candidate events that were triggered by SD, which meet the requirement of a certain quality in terms of clustering of triggered SD stations [39, p. 79].Additionally, we select on the events where at least 5 AERA stations have measured a signal with a signal-to-noise ratio above 10 (the signal is defined here as the square of the maximum of the Hilbert envelope of the electric field and the noise as the square of TABLE I.The number of air-shower events remaining after applying the selection criteria sequentially.η shows the fraction of showers remaining after each of the cuts.The three sections correspond to the three sets of events in Fig. 4.

Quality cut criteria
Events the RMS of the electric field in a time window away from the signal), a lower limit set by the requirements for the X max reconstruction in Sec.III.As part of these criteria, an algorithm was implemented to reject stations from the shower reconstruction in case of hardware failures or excessive radio-frequency interference (RFI) background signals, which is monitored every 100 seconds.We also limit the data set to showers arriving from within 55 • of the zenith.Firstly, because the reconstruction at higher inclinations is currently an active field of study [26,27], and secondly, because sensitivity to X max decreases for higher inclinations since the emission region will be more distant.We require both the SD and AERA arrival direction reconstruction to find angles below 55 • and also require agreement between the arrival direction within 10 • and core position within 400 m.This acts only as a rejection of outlier values due to bad reconstructions; the arrival direction and core position reconstruction are much better than this (about 1 • and 50 m, respectively, for SD and similar or smaller for AERA) [28,29].Furthermore, periods of enhanced atmospheric electric field conditions, such as occur during times of thunderstorms, are removed from the data set.To be conservative, events are also rejected if no electric field information was available (accounting for half of the events that are rejected in this step).This results in a pre-selected set of 2153 showers in the energy range of 10 17 to 10 19 eV, the lower limit being set by the detection threshold above the radio background level and the upper limit being exposure-limited.
In Table I we list these cut criteria and the number of events after each cut.In Fig. 4 we show the distribution of these air showers as a function of the shower energy (left), and the azimuth and zenith angles of the arrival direction (center and right, respectively).Indicated in blue is the pre-selected set of showers as described above.
The gray and green elements in the figure refer to further quality cuts in the reconstruction of X max (gray) and selection of a bias-free sample (green) as will be described in Sec.IV.To illustrate the completeness of the data set, at least at higher energies, the cosmic-ray flux as measured by the Pierre Auger Observatory [24] has been superimposed and re-scaled to the AERA shower distribution.
Note that the radio signal strength depends on the Lorentz force F ∼ ⃗ v × ⃗ B, and thus on the angle between the arrival direction of the shower ⃗ v with respect to the Earth's magnetic field ⃗ B, hence the increased suppression of the detected showers as the azimuth angle approached (approximately) 90 • and the arrival direction becomes more aligned with the magnetic field.
There is a small overlap in the effective field of view of AERA and FD, such that for a subset of 53 showers in the set of selected showers for AERA also an independent high-quality FD shower reconstruction is available.The number is mainly limited by the distance and different energy dependent apertures of AERA and FD, and the FD duty cycle.We will use these 53 showers in Sec.IV for an independent check on the X max reconstruction on an event-by-event basis.

III. RECONSTRUCTION METHOD FOR Xmax
The method to reconstruct the depth of the shower maximum that we use in this work is based on the method developed for LOFAR [19,20] where a set of Monte-Carlo (MC) air-shower simulations is generated based on the basic reconstructed properties of a measured air shower such as cosmic-ray energy and arrival direction.The depth of the shower maximum X max , is affected by shower-to-shower fluctuations and thus similarly varies for each of the simulations.The sensitivity of the radio signals to X max is then used to match the radio signals between measurement and simulations to reconstruct the X max value of the measured air shower.We use the airshower simulation code CORSIKA v7.7100 [30] with radio extension CoREAS [6] and QGSJetII-04 [31] as our high-energy hadronic interaction model.We include several higher-order effects to simulate the individual measured air showers as precisely as possible.We include the Global Data Assimilation System (GDAS) atmospheric model [32,33] for our time and location-dependent air density and refractive index modelling.A time-variable geomagnetic field model [34] is also used because the majority of the emission being driven by the magnetic field, which changed slightly over the several years of AERA data used in this work.We model the simulated stations to lie on the sloped plane of AERA and add several concentric rings of 'virtual' stations such that we can interpolate the energy fluences with higher precision between AERA station positions.This is done because the core position of the shower is only known to the order of 20 m and, when comparing the simulated and measured radio signals, we shift the simulated footprints to correct for the offset caused by this uncertainty.
As input for the shower simulations we use the shower energy from SD.All showers in the data set are triggered by the SD and hence the SD energy measurement is available for each event.For the shower core position and arrival direction, we use the reconstruction from AERA.The stochastic nature of particle interactions in the air shower leads to shower-to-shower fluctuation in X max such that this parameter can be described by a Gumbel distribution [35].We create an ensemble of 27 simulations for each of the 2153 selected air showers: 15 induced by protons and 12 induced by iron nuclei (intermediate-mass particles are not used as it has been shown to not be necessary [36]).We use more proton showers since these cover a larger range in X max .These primaries and quantities are selected to cover the true distribution of X max ), including the tails of the X max distributions, by varying the initial seeds and height of the first interaction of the primary cosmic ray, while keeping all other input parameters identical.In this way, when comparing the simulated and measured radio signals, we can determine the X max value which best describes the measured signals of the air shower.
For each shower we quantify the quality of the match between the measured and the simulated radio signals by defining a chi-squared quantity based on the energy fluence u and the corresponding uncertainty σ u of the radio stations: The simulated energy fluences u simulated are calculated by applying the AERA antenna response to the pure simulated signals and then reconstructing them as if they were actual measured signals [37] (no noise is added to the simulated signals since we would have to remove it again to calculate the energy fluence, as in Eq. 1, needlessly reducing precision.The uncertainty on X max reconstruction due to noise is accounted for later in this section).In the chi-squared measure, we account for the possible systematic uncertainties from the air-shower simulations and the uncertainty on the reconstruction of the shower energy by introducing a scaling parameter S between measured and simulated energy fluences.We also account for the uncertainty on the reconstruction of the shower core position with a core shifting parameter ⃗ r shift .Suitable starting values for the core shift are taken from either an initial fitting procedure [15] or a barycenter calculation.Both free parameters are shared between all simulations for the event under consideration (because a measured shower and its corresponding simulations have just a single core offset and energy scaling between them).
The chi-squared values for each of the shower simulations as a function of the true MC X max values of those shower simulations can be fitted (locally) with a parabola function as is illustrated in Fig. 5 such that the X max value at the minimum of the parabola fit X parabola max acts as an estimator for the X max value of the measured shower.The minimum is found by an iterative procedure where the free parameter space of S and ⃗ r shift is searched for a global minimum in χ 2 .Checks are built into the procedure such that the minimum is in fact a  Second-order bias correction ∆X KDE max,2 and total uncertainty δX KDE max,2 after including the effects of a free core and free energy scaling in the minimization procedure.Once again the same event is used as in Fig. 5 and Fig. 6.
global minimum (using a basin-hopping minimizer [38] and an additional coarse full-parameter space search), and that the parabola fit is well-behaved.We test the validity of this procedure by evaluating this with the reconstruction of each of the simulated showers under realistic ambient noise conditions (from periodic noise mea-surements with our stations), by leaving out that specific simulation and then minimizing χ 2 using the other 26 simulations belonging to that particular air-shower event.
The minimization then provides S and ⃗ r shift parameters for evaluation.We reconstruct S = 1 within an offset of (0.9 ± 0.4)% and a spread of (23.4 ± 0.1)%.The bias is negligibly small and the uncertainty is primarily driven by the propagation of the uncertainty on the radio signal itself.The free parameter for the core shift we determine to have a minor bias of (0.4 ± 0.2) m and the spread is found to be (20.6 ± 0.2) m, which is on the same order as the core position resolution of AERA.Hence, the minimization algorithm used to determine the best-fit between measured (or the simulated ones mimicking real measurements) and simulated showers does not introduce any additional biases in the free parameters.
The resolution and possible bias of the parabola-X max reconstruction procedure is evaluated by reconstructing the X parabola max values of each of the 27 simulated showers, that we have for each measured shower, and comparing these reconstructions to their true Monte Carlo values X MC max .An example of this procedure is shown in Fig. 6.It shows the difference between the X parabola max and X MC max values for each of the simulations (points) as a function of X parabola max .Note that simulations with a bad χ 2 probability for the parabola fit are shown as rejected (crosses, see Fig. 7) and simulations that failed to reconstruct are not shown (the resulting effect on the detector sensitivity to X max is quantified in Sec.IV).The spread along the horizontal axis is not necessarily a constant value for any X parabola max and, in addition, there can be a bias that depends on X parabola max itself.The main reason for this is that the constraining power on X max is determined by the amount and quality of radio signals for a particular air shower and these quantities change with X max .In addition, the parabola fit in the estimation of X parabola max will be more difficult to make when the chi-squared minimum is near the edges of the range of X MC max values.As a consequence, the very low X max values will often be overestimated and the very high values often underestimated.Because of this inherent bias in this estimator, we implement steps to mitigate this.We model the spread and bias of the difference in X parabola max versus X MC max by determining the kernel density estimator (KDE) for the simulated points (colored background in Fig. 6).A KDE is a method to estimate a smooth probability density distribution based on substituting discrete points by smooth functions (Gaussian kernels).We extract from this the mean and 1σ spread at any desired X parabola max value (illustrated with regularly spaced black bars).A shift from zero on the horizontal axis then indicates the bias as a function of X parabola max and the spread of the points provides the uncertainty of the reconstruction.Note that for the spread we have taken into account that the bandwidth of the KDE broadens the spread and we have corrected for this such that the uncertainty on X max that we determine is truly a 1σ error with respect to the spread in MC X max values.
We perform this procedure in two steps to disentangle, in a more stable way, the effects of the intrinsic uncertainties of our X max reconstruction method and the uncertainties that can arise from the uncertainties on the core position and shower energy (the two free parameters in [Eq.( 2)]) that are inherent to just the measured air showers.
In the first step we fix the shower core position and energy scaling parameters to the true Monte-Carlo values such that we can calculate the KDE (Fig. 6) to identify and correct for any bias in the X max estimation caused by the parabola X max estimation itself.This then provides an improved, first-order bias-corrected, estimator for X max In the second step, the X max reconstruction is repeated, but now performed including the 2 free parameters.In this way, we can separately identify and correct for any X max -reconstruction bias originating from the uncertainties on the measured core position and shower energy that were used as input parameters for the COR-SIKA simulations.For this we look at the X KDE max,1 estimator (i.e., after the first KDE-correction step) for each reconstructed simulation, compare this to the true MC values as before, model it again with a KDE, and as before extract bias and uncertainty estimators.The secondorder bias correction is then given by By also applying this correction, our final AERA X max estimator is obtained.The spread in the reconstructed X max values in Fig. 7 provides an estimation of the uncertainty on the X max reconstruction, accounting now for the effects of the full reconstruction procedure as if it was executed on a measured air shower.The spread is extracted from the 1σ region around the bias estimator value in the KDE model (i.e., the region between the 15.87% and 84.13% quantiles).For the remainder of this work the estimators for X max and its uncertainty will be called X max and δ Xmax , respectively (for the latter δ Xmax is used instead of σ Xmax to avoid confusion with the second moment of the X max distribution, σ(X max ), which will be introduced in Sec.V).For both of the steps of the procedure qualitychecks have been built into the procedure to guarantee the bias and uncertainty estimators represent the underlying data correctly.In situations where this was not the case, primarily for showers with lower quality signals, events have been rejected because of having an ill-defined bias and uncertainty (see the 'Valid X max uncertainty and bias estimation' cut in Table I.These quality criteria are described in more detail in [39, p. 163]).
In the end, this procedure provides an end-to-end estimation of X max uncertainty and bias of the method.However, while the bias correction reduces bias, it can't fully correct it.For example, at the edges of the simulated X max range the KDE is sparsely populated and, hence, there the method only has a partial ability to correct for biases.One could mitigate this further by doing more simulations, but here we were computationally constrained to 27 simulations per shower.We account for any remaining bias of the reconstruction as systematic uncertainty in Sec.IV.

IV. ACCEPTANCE CUTS AND SYSTEMATIC UNCERTAINTIES
To interpret the distribution of X max we first implement an acceptance cut such that our set of showers is not biased by selection effects.We first apply a cut in energy at E = 10 17.5 eV, above which the SD trigger we use to read out AERA is fully efficient [24,25].However, not every trigger leads to a high-quality shower in AERA.Hence, we next calculate the detection acceptance for AERA by evaluating the reconstructability of the set of 27 simulated air showers that were created for each measured shower.We implement the condition that the measured shower should have been detected if it had arrived anywhere within the expected range of X max values as predicted by simulations.Specifically, we require, for any shower we select, that 90% of the X max values of a Gumbel distribution for both protons and iron nuclei, given the energy of the shower, would be reconstructable by AERA.Removing the events that do not pass the acceptance cut results in 594 showers.Table I lists these quality cut steps and the final distribution of events can be seen in Fig. 4 (green shaded area).Fig. 8, as example, shows the average acceptance (thick green line) for all selected showers with energies between 10 17.95 to 10 18.10 eV and the average Gumbel distributions for the energies of those showers under the assumption of a composition consisting of just protons (solid red), just iron nuclei (solid blue), and the mixed-mass composition as measured by Auger FD [40] (Gumbel parametrization for QGSJetII-04 [41,42] are used).At these energies AERA is fully efficient up to about 850 g cm −2 , after which the efficiency drops slightly for the tail of the proton Gumbel distribution.For the lowest energies this occurs around 800 g cm −2 (not shown).
Although the acceptance is only reduced for extreme X max values we investigate the systematic uncertainty on the mean and width of our measured X max distribution that this would cause.For this, we calculate the effect of the acceptance curve on the Gumbel distributions (dashed lines in Fig. 8), which are shown to be modified by less than a few g cm −2 compared to the solid lines.The resulting differences in the two moments of the distributions are shown as insets in the figure.This calculation is performed for all energy bins and results in and the systematic effect it has on the mean and width of the Gumbel Xmax distributions (annotated values in g cm −2 ) for a pure proton mass composition, pure iron mass composition, and the mixed-mass composition as measured by Auger FD [40] (solid red, blue, and black lines).The dashed lines are the distributions convolved with the acceptance.The lines are plotted at 50% opacity since they match closely.Vertical lines show the respective means.
a bias of under 4 g cm −2 and 5 g cm −2 on the mean and width, respectively, when assuming the least favourable composition conditions (see blue bars in Fig. 9).The calculation is included in Appendix A 1.
Next, we also evaluate the bias the X max reconstruction of individual showers has on the X max distribution.While Sec.III implemented steps to remove X max bias, this is not guaranteed to be sufficient, especially for the deepest and shallowest showers, as explained in that section.Hence, the overall effect on the selected set of air showers is evaluated by reconstructing X max for the air-shower simulations, for which the true MC X max is known, and calculating the effect the reconstruction would have in the case nature would give us a Gumbel X max distribution for protons, iron nuclei, or a 50:50 mix of the two.For the mean of the X max distribution the proton and iron nuclei cases would represent the two extreme cases, since bias occurs mostly for the deepest and shallowest showers.The width of the distribution would be most affected by a mix of proton and iron, hence we evaluate also the case of a 50:50 mix.Fig. 10 shows, as example, the effect on the distribution for the showers in the energy range of 10 17.95 to 10 18.10 eV.The bias in the width and mean of the distributions is taken as systematic uncertainty, again, to be conservative, under the assumption of the composition with the largest bias.We show the results of this as a function of energy in Fig. 9 (red bars) and show the calculation in Appendix A 2.
We furthermore account for systematic uncertainty of the use of the GDAS atmospheric model [19,43]   choice of hadronic interaction model in the CORSIKA simulation code [19] (these LOFAR result are also valid for AERA due to the similarities of the implementation of GDAS and X max reconstruction methods).An additional systematic uncertainty on the width and mean of the X max distribution at a certain energy arises from the systematic uncertainty in the energy scale [44].These effects, shown in Fig. 9, are all relatively small compared to the uncertainty from the X max reconstruction itself.Finally, we investigate any possible residual bias remaining in the data set.We check the mean of the X max distribution as a function of geometry-sensitive parameters such as the shower core position and the arrival direction, which by themselves should not cause any trends in the mean X max if there is no residual bias.Because the number of showers for some energy bins is rather limited we combine all showers regardless of energy and correct for the trend in energy.We define Y max as the depth of the shower maximum where the elongation rate, the natural increase of the average X max as a function of energy, and change in X max from composition changes with energy has been corrected by subtracting the mean X max of the showers in the respective energy bin ⟨X ∆E max ⟩ and normalized to the all-data mean ⟨X max ⟩: This now normalizes all values to roughly the average energy of the set of AERA showers and any residual trends in Y max with geometry parameters can be investigated on the full set of data.Fig. 11 shows the effect as a function of the cosine of the shower zenith angle θ.We bin the Y max data in equallysized bins and fit a line to the mean values of the bins.The resulting linear trend (solid line) is shown to be compatible with zero slope within the 1σ uncertainty band (shaded region) and hence shows no indication of a systematic bias.Possible trends in the azimuth angle ϕ, the geomagnetic angle α (the angle between shower arrival direction and the geomagnetic field, which determines the strength of the geomagnetic emission), and shower core position are also investigated and show similarly small trends compatible with zero slope.Nonetheless, a possible residual bias within these uncertainties can not be excluded.Hence, for each of these geometry parameters we evaluate the effect these possible trends would have on the shower X max values and calculate the magnitude of these possible residual biases in each of the energy bins and for each of the geometry parameters.This procedure is further described in Appendix B. These parameters are heavily correlated, so their contributions are not added in quadrature, but instead, the extrema are used.This then results in a lower and upper limit on the possible ⟨X max ⟩ systematic bias of between −6.8 and +6.6 g cm −2 , varying slightly depending on energy (see green bars in Fig. 9 and tabulated values in Table III).It should be noted that the constraints on these uncertainties are governed by the statistical uncertainty given by the number of showers we have available in this check.The possible systematic biases are well within the statistical uncertainties of ⟨X max ⟩, so there is no hint that this is a significant bias and thus it should be considered an upper limit on the possible geometry-dependent bias.
It is possible we overestimate our total systematic uncertainty when adding the possible residual bias in quadrature due to correlation with the previously determined uncertainties.Hence, we use the independent X max reconstruction of the fluorescence telescopes, that is available for 53 air showers in our data set, to obtain an additional and independent estimation on the total systematic uncertainty.The FD data has been prepared as in [40] and is shown against the radio reconstruction of X max in Fig. 12. Fig. 13 shows the distribution of the difference between the two reconstructions to be compatible with zero within −3.9 ± 11.2 g cm −2 for the 53 showers with energies predominately between 10 17.5 and 10 18 eV.Taking into account the systematic uncertainty on the FD X max reconstruction itself for these energies (roughly ±10 g cm −2 ) [40] and summing the lower and upper limit in quadrature with the FD uncertainty, results in systematic uncertainty limits of −18.1 to +12.4 g cm −2 on X max for these events, respectively.This estimate for the upper limit of the total systematic uncertainty matches closely to the values for the total systematic uncertainty on ⟨X max ⟩ of Fig. 9 (on average −15.6 and +11.2 g cm −2 , as calculated for the energies of those 53 events).The combination of these two independent estimations of systematic uncertainties provides further support that our systematic uncertainties are well-understood and that all significant effects have been accounted for.Furthermore, the compatibility of the direct eventto-event comparison of the two independent methods hints at the robustness of our understanding of EM cascades in air showers and its implementation in simulations.This is especially important in the context of X max measurements using other aspects of the shower, such as the muonic component [45], which is arguably less well-understood as suggested by the measurements of a significant muon deficit in simulations [46].Our new constraints between the radio and fluorescence X max scales might provide new hints in future studies.

V. RESOLUTION
In Fig. 14 we show the uncertainty on X max as a function of shower energy, as determined with our method, for the final bias-free selection of 594 showers (see Sec. IV).We find that the median resolution in X max shows a clear relation with shower energy, reaching a resolution of better than 15 g cm −2 in the highest energy bin.We parametrize the resolution as inspired by the energy resolution of electromagnetic calorimeters [47], but also functioning as a generic expansion in terms of energy.Here, a = 14.0 ± 6.8 g cm −2 , Resolution of the Xmax reconstruction method, δX max , as a function of energy in units of column density.Shown per energy bin are the median values of the uncertainties on Xmax (circles with uncertainties σ b from bootstrap resampling) for all showers in the bias-free sample (Table I) and a parametrized fit [Eq.( 8)] of the resolution in Xmax (solid line with 1σ-confidence bands).Also shown are the resolutions achieved by the Auger fluorescence telescopes [40].The black hatched region at low energy indicates the cut on energy applied earlier.The extent of each energy bin, including the number of showers per bin, is inset at the bottom of the figure.b = 12.7±2.5 g cm −2 , and c = 11.2±4.7 g cm −2 are fitted free parameters and ⊕ indicates the quadratic sum.The constant term c provides an indication of the resolution that might potentially be obtained for AERA with this method at the highest energies (given this parametrization).The change in resolution of X max is dominated by the uncertainty on the measured radio signals and hence becomes less accurate at lower energy.Comparing our resolution to the resolution achieved by FD, we achieve similar values at the highest energies where the FD reaches 15 g cm −2 [40].Furthermore, our method remains competitive down to lower energies where, for example at E = 10 17.8 eV the FD achieves the same resolution of 25 g cm −2 .The most recent results by the LOFAR radio array, where a similar simulation-fitting method is used to determine X max , report an average resolution of 19 g cm −2 between 10 16.8 and 10 18.3 eV [48].Despite the much denser antenna spacing of LOFAR, AERA achieves similar resolutions considering the respective energy regimes to which the two experiments are sensitive.
We note that up till now it has been common to quote a single resolution value for X max reconstruction methods for radio experiments, mainly because of a limited number of measured showers being available.Here we show the resolution in X max depends strongly on the shower The results are compared to predictions from CONEX air-shower simulations for three hadronic interaction models (lines) for proton (red) and iron (blue) mass compositions [31,[49][50][51] and compared to measurements by the FD (gray) [51].
The statistical uncertainties on the mean and width of the experimental results are plotted as error bars and the systematic uncertainties are shown with caps.
energy, driven primarily by the strength of radio signals measured in the antennas.Hence, the resolution is a function of detector sensitivity, shower energy, and thus heavily depends on the shower selection criteria.As such, any direct comparison of methods is less straightforward if obtained at sufficiently dissimilar detectors.

VI. THE Xmax MOMENTS AND THE DISTRIBUTION OF Xmax
From the X max distribution, for each of our six energy bins, we now also calculate the first two moments of the distribution, the mean ⟨X max ⟩ and the width σ(X max ).To obtain the latter we first subtract in quadrature the width caused by the method uncertainty, such that only the width caused by shower-to-shower fluctuations σ(X max ) remains.The method uncertainty cannot simply be characterized by a single value since the uncertainties on X max for our air showers do not necessarily follow a Gaussian distribution.A bootstrap resampling procedure is applied for this reason and with this we then also calculate the uncertainty on σ(X max ).This procedure is further described in Appendix C. The resulting mean and width of the true X max distribution are shown in Fig. 15, where we also compare this to the results from the FD (gray) and theoretical predictions of three different hadronic interaction models for a mass composition of just protons (red) or just iron nuclei (blue).The systematic uncertainties determined in the previous section are shown with capped markers.Table IV in Appendix D lists the values for the two moments of the distributions for the six energy bins, together with their statistical and systematic uncertainties.
With these AERA results we show good agreement between the Auger radio and fluorescence measurements of ⟨X max ⟩, both pointing towards a (mixed)-light composition of cosmic rays at around E = 10 17.5 eV.Note that the two measurements share the systematic uncertainty on the energy scale, which is constructed from calibration of the SD energy to the FD energy scale [44].Taking this contribution out, reduces both systematic uncertainty bands by about 3 g cm −2 .Secondly, the determination of the systematic uncertainties due to the reconstruction method for AERA X max data depends on the assumed composition.We have conservatively taken the most pessimistic mass composition scenario for our method.If we were to assume a (mixed) light composition, as the FD reports, then the composition-dependent AERA systematic uncertainty from the reconstruction method and the acceptance calculation, combined, would be reduced to only a small contribution (σ low stat.= 0 g cm −2 at all energies, roughly σ up stat.= 5 g cm −2 for all but the highest energy bin, and roughly σ up stat.= 7 g cm −2 for the highest bin)).This happens because the AERA systematic uncertainties originate primarily from systematic uncertainties on the reconstruction of the very deepest or very shallowest X max values, which would not significantly impact the average X max for a (mixed) light composition.In that case, the total systematic uncertainty on ⟨X max ⟩ 500 600 700 800 900 1000  would be just below ±10 g cm −2 for all energies.
The AERA results of the second moment of X max , shown on the right side of Fig 15 also shows compatibility with the FD results, but have limited resolving power because of the smaller number of showers in comparison to the FD measurements.In the highest AERA energy bin, σ(X max ) is somewhat higher than for the FD.Note that this bin contains just 33 showers, so a single extreme event in the tail of the X max distribution could result in the observed upwards fluctuation (consequently, the same effect is seen in the first moment for this energy bin).Hence, this single fluctuation is not considered a particularly significant deviation from the FD values within the shown statistical uncertainties.
In Fig. 16 we show the full distributions of the AERAreconstructed X max values for the bias-free set of 594 showers (Table I), split again into six energy bins.For comparison, we superimpose the X max distributions for the mixed-mass composition as determined by the Auger FD measurements (black) [40] which has been convolved with the acceptance, biases, and uncertainties of the AERA X max values and the uncertainty on the FD composition.In this way, a direct comparisons between the FD and AERA X max distribution can be made.
We quantify the compatibility between the FD and AERA X max distributions by calculating the probability we would draw the AERA event sample from the FD distribution, including the known acceptance, uncertainties, and biases from AERA, the uncertainties from the FD composition measurement, and the systematic uncertainties on X max between the FD and AERA.The FD X max distribution is here represented by Gumbel distributions [41,42] fitted to the FD X max distributions, accounting for detector resolution and acceptance [40].This allows us to easily sample from the distribution and add AERA measurement effects, such that we can compare quantities between FD and AERA on the same level.
For each AERA energy bin we repeatedly (N = 1000) generate FD X max distribution instances from the Gumbel parametrization of the FD composition, evaluated at the energies of the AERA events in that bin.For each instance, we vary the composition within its statistical uncertainties (using many instances of the composition fit to the FD composition [40]).Next, we draw, from each distribution, a set of X max values, with the same number of events as we have for AERA for that energy bin.When drawing values we take into account the AERA acceptance (see e.g., Fig. 8).Each drawn X max value we shift by its measurement uncertainty and its reconstruction bias.Both quantities are obtained from a parameterization of the difference between our reconstructed X max and the true MC X max for all our CORSIKA simulations (N sim = 27 × 594) as a function of shower energy and MC X max .This parametrization contains both the average and spread of the measurement uncertainties and bias, which are then used to draw shifts randomly.With this procedure, we obtain 1000 'mock data sets', drawn from the FD distribution, that include the main detection and reconstruction effects of AERA.These now represent X max data set instances that AERA would have measured assuming the FD composition.We can now start to compare these to the actual AERA X max data set to check the compatibility of the AERA and FD distribution.
To quantify the compatibility we use the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests (see e.g., [52]), similarly to the compatibility tests of the X max distributions of the Auger FD and Telescope Array FD [53].These tests are commonly used to test if a data sample follows a specified distribution.The KS test is particularly suited to test for compatibility in the region around the peak of the distribution, while the AD test also provides sensitivity to the tails of the distribution.Together they are a good measure for the agreement for the overall shape of the distribution.We calculate the KS test statistic D as and the AD test statistic A 2 as where X i are the (ordered) data points in a sample, N is the number of samples, and F is the cumulative distribution function of the distribution being tested.
The distribution that we test our samples against we construct as the Gaussian KDE of the 1000 FD-drawn samples, such that we have a probability density function from which we can obtain the cumulative distribution function F .We then calculate D and A 2 for each of the individual FD-drawn samples.This provides the range of expected AD and KS test statistics given the sampling effects of AERA measurements.The resulting test statistics are shown in Fig. 17 (gray histograms).Next, we calculate the test statistics of the AERA data itself and evaluate where it falls within the test statistic distribution (green lines in the same figure).From this, we obtain the probability p of finding a test statistic value larger than the value for the AERA data, i.e., the chance that a sample taken from the distribution under examination is as compatible as the AERA data.The p values are shown in Fig. 17 and listed in Table II.We take p < 0.05 as the threshold to reject the null hypothesis of compatibility.Before interpreting the probabilities, we also have to account for systematic uncertainties.We calculate the effect on the KS and AD test statistics for the AERA data sample for a general shift of the X max distributions allowed within the systematic uncertainties of ⟨X max ⟩.The systematic uncertainties between AERA and FD consist of the contribution of the FD measurements (roughly ±10 g cm −2 ) [40] and the AERA contributions that were not included in the earlier modelled bias correction when generating F , namely the effects of the model for the atmosphere and the hadronic interaction model used in our CORSIKA simulations (±5.5 g cm −2 in total, see also Fig 9).The upper and lower values obtained for the test statistics and the corresponding p values are shown in Fig. 17 (green bands).
For the KS test, we find that the AERA data is compatible with the Auger FD composition (FD mix) within the uncertainties for all energy bins, but note that for the third energy bin this requires a small shift allowed within systematic uncertainties.The AD test, similarly, finds compatibility for all energy bins, but requires a shift within systematic uncertainties for the first and third energy bins.We investigate if the required shifts agree with each other, i.e., that compatibility also holds for all energy bins at the same time.For this, we calculate the KS and AD test statistics for the AERA data sample for a range of shifts of X max .For the simplest scenario of a constant shift at all energies, the best overall match for all energy bins is obtained with a general shift of ∆X best max = X AERA max − X FD max = −5.5 ± 0.7 g cm −2 which falls well within the systematic uncertainties.The uncertainty of 0.7 g cm −2 here shows the range of a constant shift where all energy bins show compatibility for both KS and AD test (for the p ≥ 0.05 threshold).The p values for this shift are listed in the central columns of Table II and show that for both tests p > 0.05, i.e., compatibility between AERA and FD X max measurements.Furthermore, this shift is in agreement with the shift of −3.9 ± 11.2 g cm −2 obtained for the event-by-event comparison of hybrid events ( Fig 13).
We note that the systematic shift between AERA and FD does not need to be a simple constant, but might depend on energy.Hence, we also calculate the p values for separate shifts per energy bin that lead to the best match between AERA and FD (i.e., highest p).The values for p and corresponding shifts (calculated for KS and AD tests separately) are shown in the column on the right in Table II.We note that the shifts show no significant trend with energy, suggesting that there is no strong dependence on energy in the systematic uncertainties between AERA and FD.Furthermore, the KS and AD tests agree on the obtained shifts, indicating that both the central part and the tails of the X max distribution, respectively, favour such a shift.The uncertainties on the shifts give a good indication that the KS and AD tests are able to constrain the cosmic-ray composition with the AERA measurements.In conclusion, the AERA X max distribution provides further support, beyond just the X max moments, for compatibility with the FD X max measurements and suggests a similar general shift between the AERA and FD distributions as for the event-by-event comparison in Fig 13 (a shift of about −4 g cm −2 ).Furthermore, the compatibility indicates that the uncertainties, biases and acceptance of the AERA measurements are well understood.

VII. CONCLUSIONS
In this work we show the results of the measurement of the distribution of depth of the shower maximum for air showers measured with the Auger Engineering Radio Array at the Pierre Auger Observatory.We have presented the method used to reconstruct the depth of the shower maximum by comparing measured radio signals to signals from dedicated sets of CORSIKA/CoREAS air-shower simulations.We show that the resolution of our method is competitive with established techniques to determine X max .We have selected a set of air showers with minimal selection bias and have quantified any remaining acceptance bias.Furthermore, a detailed study of systematic uncertainties has been conducted accounting for the effects of the reconstruction method, the use of simulation codes, atmospheric models, the energy scale, and possible geometry-dependent residual bias.The total estimated systematic uncertainties on the mean of the X max distribution has been shown to be compatible with an event-by-event comparison of 53 showers measured by both the Auger fluorescence and radio detectors, indicating a good understanding of the systematic uncertainties in the AERA measurements.In addition, this direct comparison sets a limit on the systematic shift between the AERA and FD X max scale of −3.9 ± 11.2 g cm −2 , providing new constraints on our understanding of shower physics.
The calculated moments of the X max distribution show compatibility with the composition as previously measured by the FD.In addition, the compatibility of the overall shape of the X max distributions between AERA and the FD provides further support beyond the two central moments for the mixed-light composition as previously measured by the FD.Discussions on the comparison of the X max moments to other experiments is available in an accompanying publication [21].
To calculate the effect of possible residual bias depending on geometry parameters such as shower zenith angle or core position, we investigate potential trends with X max .The mean X max changes with energy, so we defined Y max as in Eq. ( 7) such that this expected dependence can be removed and any remaining residual biases can be identified for the data set as a whole.
We investigate the dependence of Y max as a function of shower zenith angle θ, geomagnetic angle α, and core position in x and y coordinates (relative to the center of the Auger array).The azimuth angle ϕ has been checked for a possible sinusoidal trend, but shows no variation that is not already explained by zenith or geomagnetic angle dependencies, hence is not shown.For each of these geometry parameters, event selection might be affected by the irregular AERA antenna spacing and various antenna hardware types used in AERA, despite the acceptance cuts that were implemented (see Sec. IV).We split up the Y max data set in regular bins for each of these variables and calculate the mean of Y max in each of these bins.The uncertainty on Y max combines the uncertainties of X max and the energy dependence that was compensated for.The uncertainty on these mean values is determined by bootstrap resampling where we repeatedly sample 75% of the data and look at the variations in the mean.This procedure is done such that both the non-Gaussian distribution of Y max , the uncertainties on Y max , and statistical uncertainties from the limited number of showers can all be accounted for properly.regions indicate where no data is present, either from prior cuts or physical constraints.
None of the trends show a significant deviation from the all-data mean values within their uncertainties and as such we can say that there are no significant residual biases that depend on these geometry parameters.In other words, any possible bias in Y max would need to be independent of geometry, which would be difficult to envision.This is not to say that there is no residual bias; within the fit uncertainties there is the possibility for bias to exist.Hence, next we calculate the expected possible residual bias within the statistical uncertainties of our data set.We take as reference point the values of the geometry parameters G ⊂ (cos θ, α, x, y) where the least bias is expected (i.e., where detector sensitivity is optimal).This will provide us with an estimate for possible bias in possibly less sensitive regimes.Using the linear fits Y (G) we calculate the shift of each shower Y max value under the assumption that the Y max value at the expected G value G exp is the true bias-free value.This expected value for zenith angle is θ exp = 55 • , the highest zenith angle allowed in our data set and also where the footprints are largest and antenna sensitivity is excellent for the AERA antenna types.Hence, the acceptance would arguably be least affected there and any bias in less sensitive regimes we can determine w.r.t.this value.For the geomagnetic angle we choose α exp = 180 • , where the geomagnetic radio emission is at a maximum.For the core position we pick x exp = −26.1 km and y exp = 15.1 km, corresponding approximately to the center of the part of AERA with the densest antenna spacing (see Fig. 3).
This shift in Y max is then given for each individual shower by: The bias for each geometry parameter G is then given by the difference of the mean Y max of the showers in a certain energy bin, with and without the correction to the expected bias-free G: It should be noted that the biases from the geometry parameters are highly correlated quantities and as such cannot be summed in quadrature to get a total uncertainty.Hence, the extrema of the set of possible biases {Bias G } is used to determine an upper and lower limit on the possible residual bias: This then provides an estimation for the possible residual bias in Y max allowed within the statistical uncertainties of the shower data set, for each of the energy bins.From the definition of Y max [Eq.(7)] it follows that these ⟨Y max ⟩ bias values apply also to ⟨X max ⟩.The result of the Bias up (E) and Bias low (E) on ⟨X max ⟩ are shown in Fig. 9.The the possible bias does not seem to be dominated by any single effect and does not change significantly with energy.It indicates that within the statistical uncertainties with which we can constrain any bias we are potentially biasing our X max values between ±7 g cm −2 .Table III lists the contributions of the geometry parameters per energy bin.All values are well within the statistical uncertainty on ⟨X max ⟩ itself and hence there is no hint of any significant residual bias in this analysis.
Cross-checks have been done to test that additional artificially introduced biases (e.g., adding a linear X max (α) dependence) can be recovered with this procedure to a degree that the Bias up (E) and Bias low (E) indeed account for the artificial bias.Additionally, also the effect on the median Y max versus the geometry parameters has been evaluated.Compared to using the mean, the median is less sensitive to the shape of the tail of a distribution (i.e., less sensitive to large outliers) and thus has an increased sensitivity to a more general shift of the distribution.Also here no significant trend was found and the allowed possible biases were similarly small.The true distribution of X max can be estimated from the width of the AERA X max distribution by subtracting the effect of the method resolution.Since the uncertainty of the method is not a perfect Gaussian distribution one can't simply calculate this with σ true = σ 2 measured − σ 2 method , (C1) because there is no simple single value for σ method for a distribution of arbitrary shape.To account for this, a bootstrapping procedure is applied, where the uncertainty of the method is repeatedly randomly sampled and subtracted in quadrature from the total measured width of the AERA X max distribution.Assuming that the spread in the AERA X max distribution due to the method resolution and the true spread are uncorrelated, the distribution of the average true spread (i.e., the intrinsic spread in X max due to shower-to-shower fluctuations) is estimated by σ(X max ) ≡ B N Var (φ 75 (X max )) − φ 75|1 (δ Xmax ) 2 .
(C2) Note that this equation still closely mirrors Eq.C1, but with some extra steps.B N (x) we define as the distribution given by performing N bootstrapping iterations on the argument x, φ 75 (y) is defined as the function that samples 75% of a data series y at random for the bootstrapping of B N , and φ 75|1 (y) the function that selects one value at random from φ 75 (y) and returns δ y , the uncertainty on y.The first term in the square root then represents the width of the AERA X max distribution of the showers and the second term is the X max uncertainty estimation of the method.The number of iterations N for the bootstrapping is set at 10000 to sample the whole distribution sufficiently.
The mean and width of the σ(X max ) distribution B N can now be calculated, but also this distribution is not necessarily a Gaussian distribution.We take the mean and the quantile region equivalent to the probability contained in a 1σ standard deviation of a Gaussian (i.e., the region between the 15.87% and 84.13% quantiles) will be quoted as the mean and (asymmetric) uncertainty on σ(X max ), respectively.

:
INFN, Sezione di Lecce, Lecce, Italy : INFN, Sezione di Milano, Milano, Italy : INFN, Sezione di Napoli, Napoli, Italy : INFN, Sezione di Roma "Tor Vergata", Roma, Italy : INFN, Sezione di Torino, Torino, Italy : Istituto di Astrofisica Spaziale e Fisica Cosmica di Palermo (INAF), Palermo, Italy : Osservatorio Astrofisico di Torino (INAF), Torino, Italy : Politecnico di Milano, Dipartimento di Scienze e Tecnologie Aerospaziali , Milano, Italy : Università del Salento, Dipartimento di Matematica e Fisica "E.De Giorgi", Lecce, Italy : Università dell'Aquila, Dipartimento di Scienze Fisiche e Chimiche, L'Aquila, Italy FIG. 1.Example of a footprint of the radio emission on the ground for a simulated air shower with an energy of 8.2 • 10 17 eV, a zenith angle of 50.2 • , and a depth of the shower maximum of 749 g cm −2 .The strength of the emission is evaluated at simulated antenna positions (markers) and interpolated in between for visibility (background).The footprint has been projected into the shower plane, i.e., tilted into the plane perpendicular to the shower axis ⃗ v and rotated to project the magnetic field ⃗ B along the x-axis.

FIG. 5 .
FIG. 5.Parabola fit (dashed black line) to the reduced chi-squared values between a measured shower and each of the simulated showers for this event (blue and red markers) [see Eq. (2)] as a function of the true MC Xmax values for each simulation.The minimum of the parabola (green line) is an estimator for the Xmax value of the measured shower.This measured shower has an energy of 8.2 • 10 17 eV, a zenith angle of 50.2 • , and reconstructed Xmax = 763 ± 19 g cm −2 .It has been chosen as a representative shower falling in the middle of the AERA energy range (Fig.4, left), being close to the most common zenith angle (Fig.4, right), and having a typical Xmax resolution (Fig.14).

FIG. 6 .
FIG. 6.The parabola Xmax values reconstructed for the set of simulations of a single measured air shower (same as Fig 5), as a function of the deviation to the true MC Xmax values (dots).A kernel density estimation (background color) is made to estimate the probability density function of the difference at each X parabola max FIG. 7.Second-order bias correction ∆X KDE max,2 and total uncertainty δX KDE max,2 after including the effects of a free core and free energy scaling in the minimization procedure.Once again the same event is used as in Fig.5and Fig.6.

4 FIG. 8 .
FIG.8.Calculated acceptance for measured AERA showers in the energy bin from 10 17.95 to 10 18.10 eV (thick green line) and the systematic effect it has on the mean and width of the Gumbel Xmax distributions (annotated values in g cm −2 ) for a pure proton mass composition, pure iron mass composition, and the mixed-mass composition as measured by Auger FD[40] (solid red, blue, and black lines).The dashed lines are the distributions convolved with the acceptance.The lines are plotted at 50% opacity since they match closely.Vertical lines show the respective means.

FIG. 9 .FIG. 10 .
FIG. 9. (Left): Overview of upper and lower values of the systematic uncertainties on the mean of the Xmax distribution (⟨Xmax⟩).The individual contributions to the total uncertainty are plotted as bars centered in each of the energy bins.The total uncertainty (black lines) is the quadratic sum of the individual contributions.The average energy in each energy bin is shown as black circles.(Right): Overview of systematic uncertainties on the true spread of Xmax (σ(Xmax)).

FIG. 11 .
FIG. 11.Relation between Ymax [Eq.(7)] and the cosine of the zenith angle.The mean of Ymax is shown in equally-spaced bins, or merged bins if containing less than 40 showers (black squares).The number of events per bin is quoted next to each bin.The solid-line error bars show the uncertainties on the means, determined with bootstrap resampling.The dashedline bars indicate the extent of each bin.Also shown are the mean of the entire data set (dashed line with 1σ-confidence band) and a linear fit to the mean values (solid line with 1σconfidence band).

2 X AERA max + δ 2 X
FIG. 13. Results of the comparison of Xmax for 53 showers measured with both FD and AERA.Plotted is the weighted KDE (black dashed curve) of the event-to-event differences in Xmax (sum of 53 Gaussian distributions with the individual differences as means and combined AERA and FD uncertainty δ = (δ 2 X AERA max + δ 2 X FD max ) 0.5 as widths).The black markers show the spread on the KDE, evaluated at intervals of 40 g cm −2 , obtained by repeatedly taking N = 53 samples from the KDE.The calculated weighted mean µ b and width σ b of the differences are annotated in the figure (uncertainties are calculated by bootstrap resampling where we repeatedly sample 75% of events).For comparison, the Gaussian distribution corresponding to µ b and σ b is plotted as solid green curve.Note that the combined resolution of FD and AERA (53.3 ± 5.7 g cm −2 , as calculated from the Xmax uncertainties of the 53 events) can account for the spread of the difference.
FIG. 14.Resolution of the Xmax reconstruction method, δX max , as a function of energy in units of column density.Shown per energy bin are the median values of the uncertainties on Xmax (circles with uncertainties σ b from bootstrap resampling) for all showers in the bias-free sample (TableI) and a parametrized fit [Eq.(8)] of the resolution in Xmax (solid line with 1σ-confidence bands).Also shown are the resolutions achieved by the Auger fluorescence telescopes[40].The black hatched region at low energy indicates the cut on energy applied earlier.The extent of each energy bin, including the number of showers per bin, is inset at the bottom of the figure.
FIG.16.Distribution of Xmax measured with AERA (green) for six energy bins.These distributions still include the effects of detector resolution, bias, and acceptance.The distributions are compared to the mixed-mass composition as measured by Auger FD[40] (black), which has been convolved with the detector effects of AERA to allow for direct comparison to the AERA distribution (see main text).Energy ranges and number of showers are annotated in the figures.

Fig. 18 showsFIG. 18 .
FIG. 18. Relation between Ymax [Eq.(7)] and the cosine of the zenith angle (top left), geomagnetic angle (top right), core position x coordinate (bottom left), and core position y coordinate (bottom right).Both core coordinates are relative to the center of the Auger array.The mean of Ymax is shown in equally-spaced bins, or merged bins if containing less than 40 showers (black squares).The number of events per bin is quoted next to each bin.The solid-line error bars show the uncertainties on the means, determined with bootstrap resampling.The dashed-line bars indicate the extent of each bin.Also shown are the mean of the entire data set (dashed line with 1σ-confidence band) and a linear fit to the mean values (solid line with 1σ-confidence band).
Appendix C: Calculation of the Second Moment of the Xmax Distribution Appendix D: Tabulated Xmax Moments FIG. 17. Compatibility test of the AERA data and the composition as measured by the Auger FD. Results are shown for six energy bins (values shown in each panel).The top six panels show the distribution of the Kolmogorov-Smirnov (KS) test statistic (gray histograms) for 1000 Xmax samples generated from the FD composition as would be measured by AERA (i.e., including the effects of the FD composition uncertainties and AERA reconstruction bias, resolution, and acceptance).The green line in each panel shows the KS test statistic for the AERA data sample (with systematic uncertainty band).The probabilities for compatibility are quoted in the legend and listed in TableII.The bottom six panels show the same procedure for the Anderson-Darling (AD) test.

TABLE II .
Probabilities for the AERA Xmax distribution to be drawn from the FD composition, evaluated with the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests, as described in Sec.VI).Probabilities are quoted per energy bin, listing the values for three scenarios: no shift ∆Xmax between the AERA and FD Xmax distributions (left), the best-matching overall constant shift of ∆X best max = X AERA max −X FD max = −5.5 g cm −2 (center), and best-matching shift for each energy bin and test statistic separately (right).The corresponding shifts are also listed including the range where the test statistic results in p ≥ 0.05.
Table IV lists the values of the two central moments of the X max distribution and their uncertainties for six energy bins.