Density of GeV muons in air showers measured with IceTop

We present a measurement of the density of GeV muons in near-vertical air showers using three years of data recorded by the IceTop array at the South Pole. Depending on the shower size, the muon densities have been measured at lateral distances between 200 m and 1000 m. From these lateral distributions, we derive the muon densities as functions of energy at reference distances of 600 m and 800 m for primary energies between 2.5 PeV and 40 PeV and between 9 PeV and 120 PeV, respectively. The muon densities are determined using, as a baseline, the hadronic interaction model Sibyll 2.1 together with various composition models. The measurements are consistent with the predicted muon densities within these baseline interaction and composition models. The measured muon densities have also been compared to simulations using the post-LHC models EPOS-LHC and QGSJet-II.04. The result of this comparison is that the post-LHC models together with any given composition model yield higher muon densities than observed. This is in contrast to the observations above 1 EeV where all model simulations yield for any mass composition lower muon densities than the measured ones. The post-LHC models in general feature higher muon densities so that the agreement with experimental data at the highest energies is improved but the muon densities are not correct in the energy range between 2.5 PeV and about 100 PeV.

We present a measurement of the density of GeV muons in near-vertical air showers using three years of data recorded by the IceTop array at the South Pole. Depending on the shower size, the muon densities have been measured at lateral distances between 200 m and 1000 m. From these lateral distributions, we derive the muon densities as functions of energy at reference distances of 600 m and 800 m for primary energies between 2.5 PeV and 40 PeV and between 9 PeV and 120 PeV, respectively. The muon densities are determined using, as a baseline, the hadronic interaction model Sibyll 2.1 together with various composition models. The measurements are consistent with the predicted muon densities within these baseline interaction and composition models. The measured muon densities have also been compared to simulations using the post-LHC models EPOS-LHC and QGSJet-II.04. The result of this comparison is that the post-LHC models together with any given composition model yield higher muon densities than observed. This is in contrast to the observations above 1 EeV where all model simulations yield for any mass composition lower muon densities than the measured ones. The post-LHC models in general feature higher muon densities so that the agreement with experimental data at the highest energies is improved but the muon densities are not correct in the energy range between 2.5 PeV and about 100 PeV.

I. INTRODUCTION
Cosmic rays with energies above 1 PeV enter the Earth's atmosphere where they produce extensive air showers that can be measured with large detectors at the ground. The properties of the initial cosmic ray, such as energy and mass, are inferred indirectly from the particles measured at the ground and their interpretation strongly relies on Monte Carlo simulations of the shower development and thus on theoretical models. Although the energy spectrum of cosmic rays has been measured with high precision over many orders of magnitude, the sources of cosmic rays are still unknown, their acceleration mechanisms and mass composition are uncertain, and several features observed in the energy spectrum are not well understood [1]. One of the main challenges in understanding cosmic-ray induced extended air showers is the accurate description of hadronic interactions over several decades in center-of-mass energy, from a few tens of GeV to more than 10 9 GeV. The relevant interactions are in the forward fragmentation region, with most of the energy beamed into pseudo-rapidity ranges that are difficult to study in colliders since they lie very close to the incident beam [2,3]. Their cross-sections cannot be computed from quantum chromodynamics, because the * also at Università di Padova, I-35131 Padova, Italy † now at Dept. of Physics, TU Dortmund University, D-44221 Dortmund, Germany ‡ also at Earthquake Research Institute, University of Tokyo, Bunkyo, Tokyo 113-0032, Japan § Contact: analysis@icecube.wisc.edu strong coupling for these interactions is too large for a perturbative solution and the interactions are too complex for current lattice calculations. Instead, they are calculated using phenomenological models tuned to a variety of data sets from collider and fixed-target experiments, and are extrapolated to the ranges of phase space relevant for air showers.
Several hadronic interaction models are available. The most recent versions are colloquially called post-LHC, since they take high-energy data from the LHC into account. These are Sibyll 2.3 [4,5], EPOS-LHC [6], and QGSJet-II.04 [7]. Sibyll 2.1 [8] is a pre-LHC model that is still popular for simulations of air showers. Air shower experiments can help improve the hadronic interaction models by testing them. Tests can be performed by making various complementary measurements of the muons in air showers, covering different ranges in primary energy, primary arrival direction, muon energy, muon lateral separation, and stage of the shower development.
Correctly accounting for muon production in air showers is of vital importance to the study of cosmic rays. The muon content of an air shower, together with a measure of its electromagnetic component, can be used to simultaneously estimate the energy and mass of the primary cosmic ray. Since cosmic ray arrival directions do not point back to their sources, the cosmic-ray spectrum and mass composition are key observables for testing astrophysical models. If the energy spectrum were accurately determined for different primary cosmic ray species, several competing astrophysical models for the origin and propagation of cosmic rays could be disfavored or excluded [1]. However, our incomplete knowledge of highenergy hadronic interactions causes differences between simulated and measured air showers. As the analysis of indirect measurements relies on simulations, the uncertainty in hadronic interactions weakens otherwise powerful techniques for inferring the mass of cosmic rays.
A cosmic-ray air shower consists of a chain of hadronic interactions whose main characteristics can be understood using the simple Heitler-Matthews model [9,10]. In this simple picture, if the primary particle is a proton the number of muons scales sub-linearly with the energy: An air shower initiated by a nucleus with A nucleons is approximated by a superposition of proton showers, each with energy E/A. In this case the number of muons is In this model, an iron primary with 56 nucleons produces about 50 % more muons than a proton shower with the same energy. The proportionality factor in Eq. 1 depends on a complex interplay of many factors, and varies greatly depending on the hadronic interaction model used to simulate the interactions [11]. Among the relevant factors are the amount of energy channeled into the electromagnetic cascade at each stage, the inelasticity of the interactions, their multiplicity and the energy/angular distributions of the secondary mesons. The earliest measurements of the muon component of air showers date to the 1960s [12][13][14]. At the time, Greisen proposed a parameterization for the lateral distribution of muons in air showers with energies in the 10 6 -10 8 GeV range to describe the measurements performed by the MIT air shower program. The number of muons in a wide range of primary energies was subsequently measured with the Akeno extensive air-shower experiment [15,16]. Recently, the KASCADE-Grande collaboration published the energy spectra of elemental groups of cosmic rays [17] and the lateral distributions of muons (E µ > 800 MeV) [18] at radial distances of 100 m to 610 m in the primary energy range between 10 PeV and 200 PeV for zenith angles θ < 18°. Neither of these measurements show a discrepancy between the abundance of muons in real showers compared to simulated showers using QGSJet II as hadronic model. However, surface detector arrays like Akeno and KASCADE-Grande rely on simulations for inferring both the shower energy and the muon number. An over-or under-abundance of muons in simulated air showers affects both variables, and therefore surface arrays have not been able to test for a discrepancy in the number of muons at a given energy.
Using air fluorescence techniques [19,20], it is possible to simultaneously measure the fluorescence profile of air showers, which is used to determine the energy and mass of the primary, and the signals produced in an array of detectors on the ground, including the contributions from muons. Using these techniques, a discrepancy between the simulated and measured number of muons in air showers between 100 PeV and 1 EeV was first reported by the HiRes-MIA collaboration [21]. The density of muons (E µ > 850 MeV) at 600 m from the shower axis was found to be larger than the density of muons in a sample of iron primaries simulated using the QGSJet and Sibyll hadronic models [22,23], while the fluorescence measurements showed a composition much lighter than pure iron. The Pierre Auger Observatory measured the muon abundance (E µ > 300 MeV) in air showers above 4 EeV in two complementary analyses using inclined (θ > 65°) and vertical (θ < 65°) air showers, respectively, which probe the muon flux at a different atmospheric mass overburden. Both measurements reported a deficit of muons in simulations for all hadronic interaction models [24,25]. More recently, this deficit has also been confirmed in vertical air showers between 200 PeV and 2 EeV (θ < 45 • ) by an analysis using dedicated underground muon detectors at the Pierre Auger Observatory [26]. The measurement of the cosmic ray flux around 1 EeV and higher, based on muon densities several meters from the shower core (E µ > 2 GeV), by the NEVOD-DECOR experiment can also be interpreted as evidence for a muon deficit in simulations [27,28].
Other results are mixed. The measurement of muons (E µ > 1 GeV) by the Yakutsk array at primary energies of above 20 EeV and zenith angles up to 45°appears to support the hypothesis of a deficit of muons in simulations [29]. However, measurements of muons (E µ > 10 GeV) by the EAS-MSU surface detector at radial distances up to 200 m from air showers with energies between 100 and 500 PeV (θ < 30°) do not appear to yield a discrepancy [30]. Comparisons of measurements of the muon flux at energies between 1 TeV and 10 TeV with simulations show a deficit or excess, depending on the model [31,32]. To further complicate the matter, the KASCADE-Grande collaboration has measured the attenuation of the muon number (E µ > 230 MeV) in air showers between 10 PeV and 1 EeV as a function of the zenith angle and thus in dependence of the mass overburden [33]. This study found a larger attenuation length of muons when compared to several hadronic interaction models which indicates that the simulated muon energy spectra are steeper than in experimental data.
The IceCube Neutrino Observatory [34] is uniquely suited to probe the muon content of air showers. The deep portion of the detector has been used to study the spectrum of atmospheric muons [35] and the lateral distribution of high-energy muons in air showers [36,37], because high-energy muons are able to penetrate the ice shield above the detector. Low-energy muons are detected with IceTop, the surface component of Ice-Cube [38]. IceTop has been used to measure the energy spectrum of cosmic rays in the energy range from 250 TeV up to about 1 EeV [39][40][41]. The high altitude of IceTop, at 2.8 km above sea level and a depth of around 690 g/cm 2 , places it close to the shower maximum. The proximity to the shower maximum allows for an energy resolution better than 0.1 in log 10 (E). Hybrid measure-ments, which used both detector components, have been used to infer the mass composition of cosmic rays [41,42].
This article reports on the density of muons measured with IceTop above about 100 MeV, mostly in the several GeV region, in the following referred to as GeV muons. IceTop does not have dedicated muon counters, but can identify muon hits far away from the shower axis by exploiting the difference in detector response of water-Cherenkov detectors to muons and to electrons, positrons and gammas. Muons form a distinctive peak in the Ice-Top signal distributions, which can be used to determine the density of muons in air showers. We first estimate the muon density without relying on Monte Carlo simulations of air showers. A correction of less than 10% is then applied which is estimated using Monte Carlo simulations. This correction takes into account the finite resolution of the experiment, attenuation in the snow, and detection and selection efficiencies. By comparing our result with the expectations according to simulations, we constrain the current hadronic interaction models.
The experimental setup is described in Section II, the event selection in Section III, and the analysis in Section IV. The statistical model for signals produced by muons is described in Section IV B. Two kinds of backgrounds are considered: signals without muons are described in Section IV C and accidental coincidences in Section IV E. The effect of a growing snow cover on top of the IceTop detectors is discussed in Section IV D. The correction factors obtained from simulations of the detector response are presented in Section IV F. The final result is presented in Section V.

II. ICETOP
IceTop is an air shower array of 81 stations deployed in a triangular grid with a typical separation of 125 m [38]. It was completed in December 2010. The detector covers an area of approximately one square kilometer and is located above the IceCube detector at the geographic South Pole. Each station consists of two Cherenkov detectors separated by ten meters. The Cherenkov detectors are cylindrical tanks with an inner radius of 0.91 m. The detectors contain two Digital Optical Modules (DOMs) and are filled with clear ice up to 0.9 m depth. The 40 cm between the ice surface and the lid are filled with expanded perlite (amorphous volcanic glass, expanded to low density with grain sizes of the order of 1 mm) for thermal insulation and light protection. Each DOM combines a 10 inch photo-multiplier tube (PMT) with electronics for signal processing and readout [43,44].
A discriminator trigger occurs when the voltage in one of the DOMs in a tank exceeds a predefined threshold and the capture of the PMT waveform and digitization are launched. Stations have two readout modes. A Hard Local Coincidence (HLC) occurs when both tanks in a station have a discriminator trigger within a time window of 1 µs. If there is a discriminator trigger in only one tank, it is called a Soft Local Coincidence (SLC). The anode pulse is sampled in time bins, which are then digitized and baseline subtracted. In SLC mode, each launched DOM is read out but only coarse information, i.e. time stamp and total charge of a DOM, is transmitted. In HLC mode, in addition to the SLC data, also the full waveform is transmitted. For HLCs, the total charge of a DOM is also calculated from the waveform offline, after a better estimate of the baseline becomes available. In the case of SLCs, this is done by the DOM firmware, using the best estimate of the baseline at the time. For the current analysis the total charge together with a time stamp constitutes the tank's signal, S. The calibrated charge is expressed in units of Vertical Equivalent Muon (VEM), which is the average charge produced by a vertically through-going muon.
The shower direction, the intersection point of the shower axis with IceTop (the shower core), and the shower size are estimated by fitting the measured signals with a Lateral Distribution Function (LDF) and the times of the signals with a phenomenological model of the shower front [38,45]. The LDF and the shower front model are functions of the lateral distance, the distance of closest approach between the shower axis and the tanks. The LDF includes an attenuation factor due to the snow cover on top of each tank. To estimate the energy of the primary cosmic ray, we use the relationship between the shower size S 125 , defined as the signal at a lateral distance of 125 m, and the true primary energy, as obtained from simulations [40]. This relation was derived using air shower simulations assuming a specific composition model commonly referred to as H3a. This is a 4-component model based on the 5-component model by Gaisser [46], with the nitrogen and aluminum components merged into a single component with oxygen as the representative element. For cos θ > 0.95, where θ is the zenith angle of the arrival direction, the relation is: log 10 (E) = 0.938 log 10 (S 125 ) + 6.018.
(3) To improve the general quality of reconstructions and to stay within the simulated zenith range, the following cuts were applied to the simulated and the experimental data. These cuts are the same as in previous IceTop analyses [40,41]: • Events must trigger at least five stations and the reconstruction fit must succeed.
• Reconstructed cores must be within the geometric boundary of the array.
• The station with the largest signal must not be at the edge of the detector.
• There must be at least one station with signal greater than 6 VEM.
After cuts, the energy resolution, defined as the standard deviation of the distribution of log 10 (E reco /E true ), is better than 0.1 at 1 PeV and around 0.05 at 100 PeV. For further analysis, only events with reconstructed energy E reco ≥ 2.5 PeV are considered, an energy above which IceTop reaches a detection efficiency close to 100% for all cosmic ray masses, from hydrogen up to iron [41]. This analysis is further restricted to events with zenith angles θ < 18°, in contrast with previous IceTop analyses [40,41,45]. This is because vertical muons form a distinctive peak around 1 VEM in the signal distribution, a feature that is used in this analysis to determine the muon density in air showers.
Another difference with previous IceTop analyses is the addition of SLCs. In previous publications, only HLC signals were used. The SLC signals are not used to reconstruct the energy and arrival direction of the air showers, but they are essential for the measurement of the muon density, since they occur at large lateral distances, where the muon component dominates over the electromagnetic component and there is a high probability that only one tank of a station is hit. For HLC signals used in the muon determination, only the information also available for SLC signals is used (i.e. total charge and time stamp).
While precipitation at the South Pole only amounts to about 2 cm per year, there is significant snow accumulation due to wind-driven drift. The snow height on top of the detectors increases on average by 20 cm every year. The snow cover over each tank is measured twice a year and is accounted for during the air shower reconstruction process (see Section IV D).

B. Simulated datasets
The simulation process starts with the production of air showers using Corsika [47]. The resulting air showers are used as input to a detailed simulation of the detector response. As in previous IceTop analyses [40,41], at the input to the detector response, each Corsika shower is resampled 100 times to increase statistics. Shower cores are uniformly distributed over areas larger than the detector area, with an energy-dependent resampling radius. Resampling radii are chosen at the largest distance possible for the shower to trigger the array. The detector response is simulated using custom software which simulates the entire hardware and data acquisition chain. The interactions of particles with the IceTop tanks are simulated using the Geant4 package [48], which handles the treatment of particles starting well above the detector, including their interactions in the snow. The simulated shower energies are distributed according to an E −1 differential spectrum between 0.1 and 100 PeV, and their zenith angles are distributed uniformly in sin 2 θ between 0 and 65°. This angular distribution accounts for the projection of the detector area on a plane perpendicular to the air shower direction. Showers induced by four primary types (H, He, O, Fe) are simulated.
This particular analysis is sensitive to accidental coincidences. These are signals uncorrelated with the event, but coincident in time. They are produced by lowerenergy showers. This was not an issue in previous analyses because they used only HLC signals, and HLC signals are rarely produced by lower-energy showers. For this reason, we enhanced the standard simulation chain to include a background consisting of accidental coincidences as well as a potential fluctuations in the VEM calibration values over time. These are described in Section III B.
Several simulated datasets are combined in this study in order to reproduce the experimental conditions over multiple years. The accumulation of snow on the Ice-Top stations increases the energy threshold of the trigger. To account for this effect, we simulate the detector in three epochs using different values of snow coverage in each epoch. For convenience, the epochs are chosen to correspond to each campaign year. The amount of snow specified for each epoch corresponds to in-situ measurements made roughly half way through each campaign, in October 2010, November 2011 and October 2012 respectively. The simulations use moderately different software versions and two models for the South Pole atmosphere for an atmospheric mass overburden of 692.9 g/cm 2 (680 hPa).
The IC79.2010 epoch simulated datasets, each with 60,000 showers per primary, are the same ones used in a previous publication [40]. They were produced with Corsika v69900, using atmosphere 12, which is based on the July 1, 1997 South Pole atmosphere. The datasets in IC86.2011 and IC86.2012 epochs, each with 20,000 showers per primary, were produced with Corsika v73700 using an atmospheric profile for April, obtained from data of the atmospheric conditions between 2007 and 2011 [49]. In all cases, the hadronic model is Sibyll 2.1 [8] for interactions with energies greater than 80 GeV and Fluka 2011.2c [50,51] below 80 GeV. The differences of the atmospheric models and software versions used for different epochs are negligible in this analysis.
The true muon densities at ground are determined directly from the Corsika simulations, without simulating the detector. These densities are compared to the measured muon densities in order to estimate and correct various effects.
The main dataset based on Sibyll 2.1 is used to develop the general analysis pipeline. However, in order to study and compare predictions from different hadronic models, smaller datasets were produced with Corsika v73700 according to the IC86.2012 epoch, using the post-LHC hadronic interaction models EPOS-LHC [6] and QGSJet-II.04 [7]. These datasets are used to study systematic shifts in this analysis due to different hadronic models and to compare the final results to their predictions. Simulated datasets based on the Sibyll 2.3 hadronic model [4,5] were not available at the time of this analysis and will not be considered in this work.

Simulating accidental coincidence background
Accidental coincidences are simulated at the end of the standard simulation chain, at which point the signals in each DOM have already been calculated in the absence of low-energy background. Accidental coincidence signals are then added to the event at a fixed rate of 1470 Hz. The accidental rate of 1470 Hz and its signal probability distribution function is determined from experimental data using an off-time window as described in Section IV E. If an accidental signal is generated within 427 ns of an existing signal, the two are replaced by a single signal equal to their sum. The analysis of measured events shows that the smallest possible time interval between two successive signals from the same DOM is about 1.65 µs. We implement this dead time in the simulation by sweeping over the time-ordered signals of each DOM, after the background has been added. If two signals are found within an interval of 1.65 µs, the second signal is discarded.
The last step in the simulation of accidental background is to recalculate the local coincidences. A signal could be added to a tank in a station where the neighboring tank already had a signal within the HLC coincidence window of 1 µs. In this case, the background signal and the existing SLC are reclassified as HLC. Conversely, due to the dead time simulation, it can happen that one HLC signal is discarded after an accidental background signal is added, causing the remaining HLC signal, in the neighboring tank, to be reclassified as SLC. Because of this, we sweep over all tanks after the background simulation and re-assign HLC and SLC trigger bits according to the HLC criterion of having a station neighbor within 1 µs.

Simulating VEM fluctuations between calibration runs
Calibration of tank signals is performed in regular biweekly intervals, and the position of the peak representing one VEM can fluctuate within 3% over time, smearing the features in the signal histogram. The SLC signals are not calibrated with the same accuracy as the HLC signals, because SLCs are only calibrated online by the DOM firmware, while HLCs are calibrated offline using more sophisticated algorithms. This results in different smearing for SLC and HLC signal histograms. To approximately include both effects in the simulations, we apply a Gaussian smearing to simulated signals. The smearing width is 0.05 VEM for HLC and 0.1 VEM for SLC. If the resulting smeared signal is negative, the random sampling is repeated until a positive signal is produced. The sampling widths are chosen to match the width of signal distributions in data.

A. Method
The basis of this analysis is the different response of the detector to electrons and muons. Muons produce a characteristic signal that can be identified at large lateral distances, as shown in Figs tanks hit by more than one muon. The main step in the analysis, which will be described in greater detail below, is to estimate the number of muons by measuring this muon-induced population. The two populations are clearly seen in Fig. 2, which histograms the signals at selected fixed lateral distances. The number of muons is determined using a loglikelihood method to fit the signal distributions of all events at fixed energy, zenith, and lateral distance, using a multi-component model as illustrated in Fig. 2 tributions is described in the following subsections. The model has a total of eight parameters: six in the muon response model described in Section IV B, and two in the electromagnetic (EM) model presented in Section IV C. This empirically-driven fit determines the muon signal parameters from the data independent of air shower simulations. It finds the mean number of muons per event per radial and energy bin, N µ , which is then divided by the cross-sectional area of all IceTop tanks within that bin projected onto a plane perpendicular to the zenith angle direction to yield the muon density. This assumes that the direction of motion of the muons coincides with the direction of the reconstructed air shower. The resulting raw reconstructed muon densities,ρ µ , are shown in Fig. 3. These distributions are fit to interpolate the raw muon densities at radial distances of 600 m and 800 m. With simulated datasets, a small deviation from the true muon number is observed. This deviation is accounted for by multiplying a correction factor to the raw densities to determine the final muon densities, ρ µ , at lateral distances of 600 m and 800 m, as described in Section IV F.

B. Detector response to muons
When a single particle enters the tank, the number of photoelectrons at the PMT is proportional to the particle's track length. The track length of a muon crossing the tank is determined entirely by geometry. For a uniform beam of muons entering a tank, the statistical distribution, g(l), of track lengths, l, can be calculated analytically [52]. Example distributions, for two arrival directions, are shown in Fig. 4a. In the case of vertical muons, the distribution is a Dirac delta function, because all muons traverse the tank from top to bottom. By definition, the signal recorded in this case is 1 VEM. For muons arriving at an angle θ with respect to the vertical, the distribution is a sum of two terms: a Dirac delta function centered at l(θ = 0) sec(θ) produced by all muons that enter the tank through the top and exit through the bottom, and a continuous distribution that corresponds to muons that do not traverse the tank from top to bottom. The latter are called corner clipping muons. Given the track length of a muon, the signal probability distribution is given by an exponentially modified Gaussian kernel function, K(S|l), such as the one shown in Fig. 4b [53]. The kernel function is specified by three parameters: a width σ, an exponent λ and its mean, µ. This function takes care of effects such as the photon sampling and PMT collection efficiency. The signal distribution for a given number of muons per tank, N µ tank , is found by adding the contributions of an integer number of muons, weighted by the Poisson probability. The signal distribution for an integer number of muons is given by the signal distribution of a single particle convolved multiple times with itself. The resulting distribution is multiplied by a function that models the reduced efficiency for detecting low signals, caused by the discriminator trigger in each detector. This function takes the form of a Gaussian cumulative density function in the logarithm of the signal, S, with threshold S thr and width σ thr . All this can be summarized in the following equations: where g(l | θ) and K(S | l, µ, σ, λ) are the track-length and tank-response functions shown in Figs. 4a and 4b, respectively. This model has six parameters: three in Eq. 4 (µ, σ, λ), and three in Eq. 6 ( N µ tank , S thr , σ thr ). At large distances, the electromagnetic contribution is expected to be small compared to the muon signal in the same tank and can thus be neglected. The result of this semi-analytical model is displayed in Fig. 4c. These figures show the signal distribution obtained using Geant4 [48] together with the prediction from the model. The two panels (top and bottom) of Fig. 4c correspond to two sets of parameters that differ only in their zenith angle (0°and 48.2°).

C. Electromagnetic signal distribution model
The probability distribution for signals in air showers includes the model described in Section IV B, which describes signals produced by muons, and a model for signals produced by gamma-rays, electrons and positrons in the electromagnetic (EM) component of the air shower. Hadrons can also produce a signal in the tanks, but they are so few in comparison to electrons, muons and gammarays, that their contribution is negligible.
Most electrons and positrons lose all their energy inside the tank. Their track lengths, and hence their signals, are typically smaller than those of muons. As a result, the distribution of signals from the EM component roughly mimics their energy distribution, with a mean signal that corresponds to a few tens of centimeters of track length inside the tank, while the distribution of signals from muons is determined by the geometry of the tank. We assume that the energy distribution of the EM component (and therefore their signal distribution) approximately follows a power law. This is true at large lateral distances, where the mean expected EM signal is much smaller than the threshold, S thr , leaving only the tail of the EM signal distribution above threshold. With this assumption, the EM signal distribution for tanks with no muons is given by where we included the same efficiency function that appears in Eq. 6. In Eq. 7 we have added the possibility of the c log (s) term to allow for small deviations from a power law. We thus have two different models for the distribution of EM signal, depending on whether or not we set the parameter c to zero. These two models will be referred to as EM1 and EM2, respectively. The analysis is repeated using both EM models in order to estimate the systematic uncertainty arising from the choice of model.

D. The effect of snow
The snow on top of each tank absorbs some of the energy carried by secondary particles, sometimes stopping them before they enter the tank. This raises the threshold energy above which the detection, filter, and selection are fully efficient, and attenuates the recorded signals. While the attenuation effect is included in the reconstruction process, the loss in trigger efficiency can not be corrected. This loss could affect the muon density measurement, since it depends on the air shower age, and thereby on the mass of the primary cosmic ray. Lower efficiency leads to a systematic decrease in the measured muon density.
The mechanical structures of the IceTop tanks and snow accumulation can directly affect the measurement of the muon density by reducing the number of muons that enter the tank if their energy is low enough. A muon that enters the tank and produces a signal smaller than about 0.7 VEM will not contribute to the measured muon density, as it would not contribute to the characteristic peak around 1 VEM in Fig. 2.
The results of detailed simulations of the signals produced by muons crossing vertically through IceTop detectors using the Geant4 package, which include the modeling of any detector effects, are shown in Fig. 5a. The (kinetic) energy threshold to produce a signal larger than 0.7 VEM for muons vertically entering a tank without snow is about 210 MeV. However, assuming realistic snow depths on top of the tanks of about 2 meters during the end of the data taking period, the energy threshold for muons in this analysis increases to about 420 MeV.
Typical muon density spectra from Corsika simulations, at 600 m and 800 m from the shower axis, for different primary cosmic ray energies, are shown in Fig. 5b. The fractions of muons with energies below 210 MeV and 420 MeV are about 2% and 10%, respectively. We therefore expect a systematic shift of this order in the reconstructed muon density. The exact magnitude of the shift depends on the primary energy, the arrival direction, and lateral distance because the muon spectrum depends on these variables. The different lines correspond to different ranges of S125. The mean energy for each S125 range is also shown.
A further complication is that the snow is spread unevenly over the array. Between May 2010 and May 2011, the tanks were covered by as little as a few centimeters and as much as 1.5 m of snow. Between May 2011 and May 2012 the snow coverage ranged from 0.4 m to 2 m of snow. The attenuation in the snow is one of several systematic effects that are accounted for by a correction factor that will be introduced in Section IV F. As a final verification, the yearly variation in the muon density is used as an estimate for the uncertainty due to snow in the final results (Section V).

E. Accidental coincidence background measurement
The rate of single-DOM discriminator triggers is 1470 Hz per tank. Most of these triggers are produced by low-energy showers which are independent of the main air shower event. They constitute a background in the present analysis. If we consider all triggers within a time window around an air shower event, there will certainly be some of these accidental signals, which will be distributed according to a different probability distribution than the signals from the shower, potentially biasing the measured muon density.
To minimize the effect of accidental signals, we select the signals based on their time difference with respect to the reconstructed shower front. The time window must be as small as possible to discard accidental signals, but large enough to guarantee that all air shower signals are selected. This selection is implemented assuming a plane shower front. Therefore, the time window must be large enough to account for the curvature of the shower front, and guarantee that air shower signals at large lateral distances are selected. Figure 6a shows histograms of the time difference using two different signal selection cuts. The time difference distribution exhibits a constant floor and a peak near zero, in coincidence with the reconstructed shower front. Based on this distribution, we define signal and background windows as follows: • signal window (in blue): −4.0 µs < dt < 0.5 µs • background window (in pink): 2 µs < dt < 9.5 µs The expected accidental background rate per signal bin in the signal window per tank is: where n pulses is the number of observed hits in each signal bin in the background window, n tanks is the number of detectors in the lateral distance bin considered, independent of whether there was a signal or not, ∆t B is the size of the background time window, ∆t S is the size of the signal time window, andn B is the number of expected hits in each signal bin per tank in the signal window. The relative contribution of the accidental coincidence background, in relation to all signals collected, is substantial at large lateral distances or for low-energy showers. This becomes important at the lowest energies around 2.5 PeV where showers are small and produce only few signals. Figure 6b is based on the background data and shows the fraction of the signals in the signal window expected to originate from background, as a function of lateral distance from the shower axis for several intervals in log 10 (S 125 ). The equivalent shower energy for each bin of S 125 is given in the legend. For reference, the lateral distances of 600 m and 800 m are indicated. For example, in the signal window of a 1.2 PeV shower the background amounts to 30% at 600 m lateral distance. If the background is not properly modeled, this would produce an upward bias of 50%. At 10 PeV, this fraction decreases to approximately 5%.
The advantage of measuring the accidental background in the same data sample used for the muon measurement is that it accounts for systematic effects under the same conditions. The background present in the signal window is estimated by measuring it in the background window. Both windows are equally affected by effects which can introduce systematic uncertainties, as discussed in Section V. Some of these effects can be seen in Fig. 7. The background rate decreases as snow accumulates on the detectors, and therefore the background rate decreases over the years. The most notable effect is the dependence on lateral distance. This dependence has its origin in the way snow is unevenly distributed over the array. The detectors are buried under different amounts of snow. Given a core location contained within the array and an arrival direction of the air shower, the effective amount of snow on the tanks is determined for in each radial bin.

F. Monte-Carlo correction
The accuracy of the muon density reconstruction depends on various assumptions used in this analysis. These include the assumption that the muons travel in the same direction as the air shower and the value of the mean zenith angle used in the signal model. There is also the effect of the absorption in the snow, as previously discussed, and selection bias occurs if the trigger or selection efficiency is not perfectly 100%. Other effects can arise from the finite resolution in the reconstructed parameters: energy, direction and core location.  The finite resolution in the reconstruction of the primary energy affects the muon density and becomes important in the presence of steep spectra. For example, both the finite energy resolution and a systematic shift in energy can cause migrations in the energy bins. These effects produce an apparent shift in the muon density relative to the density at the true energy and thus need to be accounted for. A systematic shift in the reconstructed zenith angle would slightly alter the signal response model. Any shift in core position or arrival direction will cause migrations in the radial bins, which can increase the contributions from the electromagnetic and muon components of the air showers, since they are steeply falling functions of the lateral distance. To estimate the sizes of the various effects, the analysis is performed on simulated datasets and the results are compared to the true muon density determined directly from the output of Corsika.
The raw reconstructed muon densities,ρ µ , at lateral distances of 600 m and 800 m, obtained from simulations based on Sibyll 2.1, are shown in Fig. 8 with respect to the true muon density in proton showers, ρ µ,p , as a function of the reconstructed air shower energy. The dotted lines in Fig. 8 indicate the results obtained using the EM1 model for the electromagnetic part, and the dashed lines represent the results obtained using the EM2 model. The markers are placed at the midpoint between the two.
In the following, we use the difference between the EM1 and EM2 result as an estimate of the systematic uncertainty arising from the assumed functional form for the electromagnetic signal distribution (see also Section V). The reconstructed muon distributions are corrected based on simulations which are evaluated in the same way as the experimental data.   Fig. 9. The grey band shows the resulting uncertainty which is accounted for in the final systematic uncertainties.
The correction is determined by dividing the reconstructed muon density obtained from simulations by the true muon density. The resulting ratios, using simulations based on the hadronic interaction models Sibyll 2.1, EPOS-LHC, and QGSJet-II.04, are shown separately in Fig. 9. The inverse of this ratio is used as a multiplicative factor to adjust the result in reconstructed experimental data. However, the correction factor clearly depends on the mass composition of the sample, since iron primaries require a larger correction. The actual composition is unknown, so the correction factor applied to the data will be the average of the proton and iron factors, with a systematic uncertainty of half of the difference. The actual composition is better known, but we are using this conservative approach for the uncertainty estimate. A linear fit to this average yields the correction factors for each hadronic model, depicted as black lines in Fig. 9. There are three contributions to the uncertainty in the correction factor: the electromagnetic signal model used (EM1/EM2), the assumed mass composition, and the statistical uncertainty in the fit. All these uncertainties appear as systematic uncertainties in the muon density.
The remaining uncertainty due to bin migration effects has been studied by comparing the corrected muon densities versus true and reconstructed observables in simulations. This has been done assuming three recent composition models in air shower simulations, commonly known as H3a [46], GST [54], and GSF [55]. The remaining uncertainty has been estimated to be at the sub-percent level and can be neglected considering the uncertainties due to the electromagnetic model, the mass composition assumption, and the statistical uncertainty of the fit. The hadronic interaction model can affect the reconstructed muon density in three ways. The first way is to influence the functional form of the electromagnetic signal distribution. The second way is to alter the fraction  11. Comparison between corrected muon density and true muon density obtained from simulated air showers using Sibyll 2.1. The logarithm of ρµ has been scaled such that the true value for proton is at zero and the true value for iron is at 1, as shown in Eq. 9. The oxygen and helium datasets are independent from the datasets used to develop the method. Their corresponding values correctly interpolate within the region of interest.
of muons below threshold. The third way is to produce muons with an angular distribution that differs significantly between the models. To account for these model dependencies of the correction factors, the muon densities are determined for each hadronic interaction model separately, as shown in Fig. 9.
In addition to the model-dependent results, the muon densities are also determined using an average correction factor, which is shown in Fig. 10. This average correction is obtained from the three ratios of the muon densities shown in Fig. 9 and a linear fit to this average yields the average correction factors, depicted as a black line.
The correction factor is calculated using only proton and iron primaries and the intermediate nuclei, helium and oxygen, can be used for validation purposes. It is expected that the muon density scales with the mass number, such that the logarithm of muon densities for helium and oxygen primaries should lie in between those of proton and iron. In Fig. 11, the logarithm of the mean muon densities, obtained from simulations using Sibyll 2.1, are scaled such that the proton value is at zero and the iron value is at one on the vertical axis for each model, which is often referred to as the z-value in the context of muon measurements. As expected, the logarithm of the reconstructed average muon densities for intermediate masses correctly interpolates linearly with log(A) between proton and iron.
V. RESULTS Figure 12 shows the mean muon density at 600 m from the shower axis for air showers with energies between 2.5 PeV and 40 PeV, and the mean muon density 800 m from the shower axis for air showers with energies between 9 PeV and 120 PeV. To produce these modeldependent results, the individual corrections shown in Fig. 9 have been used. Also shown are predictions of the mean muon density in proton and iron showers (solid lines), simulated with Corsika using the Sibyll 2.1, EPOS-LHC, and QGSJet-II.04 hadronic interaction models. In addition, the resulting modelindependent muon density, using the average correction from Fig. 10, is shown in Fig. 13. The brackets represent the systematic uncertainties, while statistical uncertainties are not visible in these figures. The systematic uncertainties are larger than the statistical uncertainties over the entire energy range. The distributions qualitatively agree with the naïve expectation that the mean mass of the primary cosmic rays becomes larger as the primary energy increases [1]. and 800 m (white squares) lateral distance after applying the corrections from Fig. 9, for the hadronic interaction models Sibyll 2.1, EPOS-LHC, and QGSJet-II.04. Error bars indicate the statistical uncertainty, brackets the systematic uncertainty. Shown for comparison are the corresponding simulated densities for proton and iron (red and blue lines). Tables of these data are available in a separate public data release [56].
The systematic uncertainty included in these distributions arises from four main sources which are depicted in Fig. 14a individually. The uncertainty in the energy determination causes a systematic uncertainty in the muon density because of the correlation between shower energy and muon number. While the effect of bin migrations due to the finite resolution of the reconstructions is covered by the Monte-Carlo correction described in Section IV F, a systematic shift of the absolute energy scale can also cause an apparent discrepancy in muon density and needs to be considered. The associated uncertainty Tables of these data are available in a separate public data release [56].
in the muon density has been determined based on the uncertainties of the cosmic ray flux reported by IceTop in Ref. [40]. The individual flux uncertainties are added in quadrature and the corresponding energy uncertainty is calculated assuming a power law with spectral index 2.7 for the cosmic ray flux. Assuming the uncertainty in muon density is directly proportional to the uncertainty in the energy scale 1 , the resulting uncertainty in muon density has been determined to be approximately 7.5%, as shown in Fig. 14a. The other three sources were already discussed, namely the functional form in the electromagnetic signal distribution described in Section IV C (orange lines in Fig. 2), the mass composition assumed to derive the correction shown in Fig. 9, and the statistical uncertainty when fitting the correction, both discussed in Section IV F. The effect of the EM model assumption enters in two different places, when reconstructing the muon density in data, and when reconstructing the muon density in simulations in order to derive the correction factor. However, these two effects are correlated. The difference for the two EM models in the reconstructed muon density are the same size in simulations and in experimental data.
The snow is corrected for in a time-dependent way by dividing the data sample in subsamples corresponding to campaign years. The mean muon density in each subsample is compared to the main result. The difference between the muon density from yearly samples and the main result is shown in Fig. 14b  of 600 m we find differences between -2% and +4% with respect to the average muon densities. At 800 m, no systematic effect is detectable, because the statistical scatter is much larger than any inter-year effect. Thus, the results for the muon densities are consistent with being independent of the detector configuration.
The detection efficiencies for proton and iron showers can be different, in particular near the threshold at low energies around 2.5 PeV. This selection bias would in turn introduce a systematic shift of the measured muon densities: a selection that prefers protons, for example, lowers the muon density at a given true primary energy. This effect, however, would be visible as a timedependent shift of the muon densities in the threshold region because the snow correction applied during reconstruction accounts for the accumulation effect but not for a loss in efficiency. Since no significant time-dependent trend in the muon densities from the subsamples for each year is observed the potential bias due to mass-dependent detection efficiencies can be neglected.
As the Monte-Carlo correction depends on the underlying hadronic interaction model, the uncertainty due to the mass composition assumed to derive the correction, as well as the statistical uncertainty when fitting the correction, need to be considered for each model separately. The corresponding uncertainties for the average correction factor shown in Fig. 10, which are used to derive the result in Fig. 13, are shown in Fig. 14a (Correctionstat./prim.). The corresponding uncertainties due to the individual correction depicted in Fig. 9, which are used to derive the results in Fig. 12, are shown in Fig. 15a. Also shown again are the uncertainties for the average correction (black lines). For the average correction factor, we also derive a systematic uncertainty due to the hadronic model assumption. Figure 15b shows the variation of the muon density derived from the model-dependent correction with respect to the average result from Fig. 13. In the post-LHC models EPOS-LHC and QGSJet-II.04, the muon density is larger than in Sibyll 2.1 and at a lateral distance of 600 m we find differences up to about 10%. The largest difference between the average result and the individual models is accounted for as a systematic uncertainty, also shown in Fig. 14a (Correction -hadr.). All uncertainties are added in quadrature and are shown as brackets in the main results. A comparison between model predictions and experimental data is displayed in Fig. 16, showing the ratio of reconstructed muon density to expected muon density in simulated proton showers. The consistency of the hadronic interaction models can be examined by checking if the data are bracketed by the expectations given by a cosmic ray flux with 100% proton (red line) and 100% iron (blue lines). In this sense QGSJet-II.04 and Sibyll 2.1 perform fairly well, since the corresponding proton and iron lines bracket the data. For QGSJet-II.04, the data between 2.5 PeV and about 10 PeV is very close to the expectation for a 100% proton flux. The EPOS-LHC lines do not bracket the data at the lowest energies, requiring an average composition slightly lighter than proton below approximately 6 PeV. The cosmic ray composition has been previously measured in this energy range [40][41][42] and various cosmic ray flux models consistent with existing experimental data are available [46,54,55]. The comparison between the measured muon densities and predictions based on different cosmic ray flux models is displayed in Fig. 17  arithms of the muon densities are again scaled such that the proton value is at zero and the iron value is at one on the vertical axis, the same as in Fig. 11 and as described in Eq. 9 (z-value). The expected z-values according to three recent composition models commonly known as H3a [46], GST [54], and GSF [55] are shown as lines. While for H3a and GST only the best fit curve is provided, for the GSF model also the entire covariance matrix of the fit is available. This allows the calculation of the uncertainty in the expected muon density, which is shown as a grey band in Fig. 17. While the shape of the muon density distributions agrees fairly well within uncertainties, the main difference between data and model expectations is the normalization of the muon densities. The mean model expectation according to Sibyll 2.1, assuming the GSF flux model and its uncertainty band (which covers H3a and GST predictions), is close to the data over the entire energy range. However, the post-LHC models EPOS-LHC and QGSJet-II.04 yield higher muon densities that are in tension with the measurement, assuming any realistic cosmic ray flux model.

VI. CONCLUSIONS
We have presented measurements of the muon density in air showers at lateral distances of 600 m and 800 m for cosmic ray energies from 2.5 PeV to 40 PeV and 9 PeV to 120 PeV, respectively, which are shown in Fig. 13. We have compared these results with expectations assuming a realistic primary composition and using the hadronic interaction models Sibyll 2.1 (pre-LHC), EPOS-LHC, and QGSJet-II.04 (both post-LHC).
For low-energy muons (∼ 1 GeV) from air showers with energies above about 1 EeV (HiRes-MIA, Pierre Auger Observatory, NEVOD-DECOR), the measured muon densities are always higher than the predictions from simulations and could not be accounted for even by LHC-tuning the models. In contrast, for the muon densities reported in this work, the post-LHC models predict too large average muon densities over the entire energy range up to about 100 PeV, assuming realistic cosmic ray flux models consistent with existing experimental data. The large muon density in the models correspondingly yields very light mass compositions when the experimental data are interpreted using these models. At around 100 PeV, the muon density is consistent with that of a mixed composition. This is in agreement with the measurement done with the EAS-MSU surface detector which is consistent with a 57% proton fraction in a naïve pro-ton+iron model interpreted with QGSJet-II.04 [30].
Within the pre-LHC baseline model Sibyll 2.1, the expectations based on realistic flux models are close to the measurement of low-energy muon densities reported here. This observation is consistent with recent measurements of high-energy muons in IceCube's deep ice detector (E µ 300 GeV) in the primary energy region between 2.5 PeV and 1 EeV [41,57,58]. The post-LHC models, however, yield lower muon densities for any given mass for the high-energy muons measured in IceCube, in contrast to the comparison for low-energy muon densities. This suggests that the muon densities in post-LHC models are not correct for primary energies below approximately 100 PeV.
However, a systematic shift in the reconstructed primary energy can also cause an apparent discrepancy in  and 800 m (white squares) lateral distance after applying the corrections from Fig. 9, normalized to the muon density obtained from proton simulations. Error bars indicate the statistical uncertainty, brackets the systematic uncertainty. The points are horizontally displaced slightly for better visibility. Tables of these data are available in a separate public data release [56]. muon density. A 20% shift in the energy scale, for example, translates to an apparent shift of about 0.5 in the z-value, defined in Eq. 9 and shown in Fig. 17. For this reason, a detailed comparison of results from multiple observatories across all energies is needed to understand the production of GeV muons in air showers [59,60]. This comparison is beyond the scope of this work, since it needs to account for the systematic effects of each detector, including zenith angle range, lateral distance, and systematic uncertainty in the absolute energy scale [61]. The method of measuring the density of muons in air showers as described here can be extended to a larger phase space. The current version focuses on mostly ver- Measured muon densities at 600 m (solid circles) and 800 m (white squares) compared to predictions from different hadronic models. These figures show the muon density scaled such that log ρµ for proton is 0 and for iron is 1. The lines display the mean muon density according to the three cosmic ray flux models H3a [46], GST [54], and GSF [55]. The grey bands show the uncertainity of the GSF model. The differences between models and data indicate a discrepancy in the simulated muon densities in post-LHC models. The points are horizontally displaced slightly for better visibility. Tables of these data are available in a separate public data release [56]. tical events and by extending it to a larger zenith angle range we could compare with the measurements of air shower attenuation done with KASCADE-Grande [33]. As the KASCADE-Grande experiment has measured the muon fluxes at sea level while IceTop is at an atmospheric depth of about 690 g/cm 2 , this measurement would allow a dedicated comparison of the muon attenuation in dependence of the mass overburden. In addition, by studying a larger range of lateral distances, instead of restricting it to the interpolated values at 600 and 800 m, it would be possible to study deviations from the muon lateral distribution function given by simulations. This can shed light on systematic differences between detectors that measure muons at different lateral distances. Systematic studies of the low-energy interaction model and the transition to the high-energy regime will provide information about muon production at low energies.
This work creates more opportunities for further studies. The signal probability distribution models used to estimate the mean muon density in this work can be used on an event-by-event basis, as a likelihood function, to estimate the muon content in individual air showers. In addition, further studies of the mean muon density and its correlation with the muon bundle multiplicity in the deep portion of IceCube can provide unique information and strong constraints on hadronic interaction models. These coincident measurements would relate to the energy spectrum of the muons in air showers since the deep part of IceCube has an energy threshold of several hundred GeV, as opposed to IceTop's threshold of a few hundred MeV. These analysis improvements and future measurements will be crucial to further understand the production of GeV muons in air showers in the primary energy range from 2.5 PeV to 100 PeV and higher.