Measurement of the azimuthal anisotropy of charged-particle production in Xe + Xe collisions at √ s NN = 5 . 44 TeV with the ATLAS detector The ATLAS Collaboration

This paper describes the measurements of flow harmonics v2–v6 in 3 μb−1 of Xe+Xe collisions at √ sNN = 5.44 TeV performed using the ATLAS detector at the LHC. Measurements of the centrality, multiplicity and pT dependence of the vn obtained using two-particle correlations and the scalar product technique are presented. The measurements are also performed using a template-fit procedure, which was developed to remove non-flow correlations in small collision systems. This non-flow removal is shown to have a significant influence on the measured vn at high pT, especially in peripheral events. Comparisons of the measured vn with measurements in Pb+Pb collisions and p+Pb collisions at √ sNN = 5.02 TeV are also presented. The vn values in Xe+Xe collisions are observed to be larger than those in Pb+Pb collisions for n = 2, 3 and 4 in the most central events. However, with decreasing centrality or increasing harmonic order n, the vn values in Xe+Xe collisions become smaller than those in Pb+Pb collisions. The vn in Xe+Xe and Pb+Pb collisions are also compared as a function of the mean number of participating nucleons, 〈Npart〉, and the measured charged-particle multiplicity in the detector. The v3 values in Xe+Xe and Pb+Pb collisions are observed to be similar at the same 〈Npart〉 or multiplicity, but the other harmonics are significantly different. Comparisons of the vn measurements with theoretical calculations are also made.


Introduction
Heavy-ion collisions, such as those at the Relativistic Heavy Ion Collider (RHIC) [1][2][3][4] and at the Large Hadron Collider (LHC) [5][6][7][8][9][10], produce a novel state of matter, with deconfined quarks and gluons, commonly called quark-gluon plasma (QGP). The QGP produced in such collisions expands anisotropically due to spatial anisotropies in the initial geometry, which produce asymmetric pressure gradients between the medium and the outside vacuum. The anisotropic expansion leads to large azimuthal anisotropies in the final distributions of the produced particles [11]. The single-particle azimuthal yields of particles produced in heavy-ion collisions are typically characterized as a Fourier series [12] : where φ is the azimuthal angle of the particle momentum, and the v n and Φ n are the magnitude and phase of the n th -order anisotropy. The v n are referred to as flow harmonics, while the Φ n are referred to as event-plane angles. The v n are functions of the transverse momentum (p T ), pseudorapidity 1 (η), event-multiplicity and particle species, and fluctuate from event to event.
Measurements of the v n and their comparisons with calculations based on relativistic hydrodynamics have shown that the QGP produced in heavy-ion collisions behaves like a nearly perfect fluid, characterized by a very low ratio of shear-viscosity to entropy density, η/s, close to the conjectured lower limit of /4πk B [13]. Significant experimental progress has been made in recent years in precision measurements of the v n [5][6][7][8]14, 15], event-by-event fluctuations in the v n [16 -18] and the correlations between the magnitudes [19, 20] and phases [21] of different-order anisotropies. These high-precision measurements have led to significant improvement in constraining the value of η/s [22,23]. However, due to large theoretical uncertainties in the understanding of the initial stages of heavy-ion collisions, the precise value of η/s or its exact temperature dependence still remains unknown. A possible data-driven way of further constraining the value of η/s, and at the same time improving the understanding of the initial entropy production in heavy-ion collisions, is to compare the v n measured across collision systems of different sizes [24]. This paper presents v n measurements in Xe+Xe collisions at a center-of-mass energy per nucleon pair of √ s NN = 5.44 TeV by ATLAS using an integrated luminosity of 3 µb −1 . The flow measurements are performed with the two-particle correlation (2PC) and scalar product (SP) methods. The measurements are also performed using a template-fit procedure [25,26] developed by ATLAS to measure correlations in small systems -such as pp and p+Pb. The template-fit procedure removes 'non-flow' correlations that arise from back-to-back jet pairs (dijets) and particle decays, which typically bias the v n measurements in low-multiplicity events, especially at high p T .
The Xe nucleus is small compared to the Pb nucleus -the specific isotopes used at the LHC are 129 Xe 54 and 208 Pb 82 . Thus Xe+Xe collisions are expected to have larger event-by-event fluctuations in the initial geometry compared to Pb+Pb collisions [24]. On the other hand, a smaller system implies larger viscous effects in the hydrodynamic expansion of the produced QGP fireball [24,27,28]. Thus the v n measurements in Xe+Xe collisions, and their comparison with those in Pb+Pb collisions allow the interplay of these two effects to be studied.

Dataset, event and track selections
The Xe+Xe data used in this paper were collected in October 2017. A set of minimum-bias events was selected by a pair of complementary and mutually exclusive triggers. The first trigger required the total transverse energy deposited in the calorimeters at L1 (E L1 T ) to be larger than 4 GeV, without any additional requirement at the HLT. The second trigger required that E L1 T be less than 4 GeV with the additional requirement of a reconstructed track with p T > 0.2 GeV at the HLT. Together these two triggers selected all events with either E L1 T > 4 GeV or a reconstructed track at the HLT. In the offline analysis, the z-coordinate of the primary vertex [33] is required to be within 10 cm of the nominal interaction point. Events containing more than one inelastic interaction (pileup events) were removed by exploiting the correlation between the transverse energy measured in the FCal (ΣE FCal T ) and the number of tracks associated with the primary vertex. In a typical pileup event the track multiplicity associated with the primary vertex belongs to a single Xe+Xe collision, while the energy deposited in calorimeters contains contributions from multiple collisions. Therefore, events with small values of measured multiplicity and large ΣE FCal T , which differ markedly from those of the majority of Xe+Xe collisions, are removed from the analysis [17]. The fraction of pileup events is estimated to be ∼0.1%. As in previous ATLAS heavy-ion analyses, the events are classified into centrality percentiles based on the total transverse energy deposited in the FCal in the event [5,14]. The Glauber model [34] is used to obtain a correspondence between the ΣE FCal T distribution and the sampling fraction of the total inelastic Xe+Xe cross-section, allowing the setting of the centrality percentiles [5,14]. The Glauber model is also used to obtain the mapping from the observed ΣE FCal T to the primary properties, such as the mean number of nucleons participating in the nuclear collision, N part , for each centrality interval. Figure 1 shows the distribution of ΣE FCal T in data and thresholds for the selection of several centrality intervals. For the centrality dependence study, this analysis is restricted to the 0-80% most central collisions where the triggers are fully efficient.  Charged-particle tracks are reconstructed from the signals in the ID. A reconstruction procedure developed for tracking in dense environments in pp collisions, and optimized for heavy-ion collisions, is used for this purpose [35]. In the analysis, the set of reconstructed tracks is filtered using several selection criteria. For the nominal selection, the tracks are required to have p T > 0.5 GeV, |η| < 2.5, at least two pixel hits, with the additional requirement of a hit in the IBL when one is expected, at least eight SCT hits, and no missing hits in the pixel or SCT. A hit is expected if the extrapolated track crosses an active region of a silicon-sensor module (pixel or SCT) that has not been disabled, and a hit is said to be 'missing' when it is expected but not found. Further, the transverse (d 0 ) and longitudinal (z 0 sin(θ)) impact parameters of the track relative to the primary vertex are required to be less than 1 mm. The track-fit quality parameter χ 2 /ndof is required to be less than 6. For cross-checks and for estimating systematic uncertainties a looser and a tighter set of requirements are used. For the looser selection, referred to as 'loose', the requirements on the number of pixel and SCT hits are reduced to one and six, respectively, and the requirement on d 0 and z 0 sin θ are relaxed to 1.5 mm. These looser requirements on the tracks are also used to study the multiplicity dependence of the v n as they enhance the per-event multiplicity. For the tighter selection, referred to as 'tight', the topological requirements on the reconstructed track are not altered, but the transverse and longitudinal impact parameters of the track are restricted to be less than 0.5 mm.
In order to study the performance of the ATLAS detector, a sample of 1M minimum-bias Xe+Xe Monte Carlo (MC) events was generated using HIJING version 1.38b [36]. Since the HIJING event-generator does not have any intrinsic mechanism to generate flow, the latter is added after the generation using an 'afterburner' procedure [37], which slightly shifts the φ positions of generated particles to mimic flow. The generated sample was passed through a full simulation of the ATLAS detector [38] using Geant4 [39], and the MC events were reconstructed by the same algorithms as the data. The reconstructed particles in the MC events are used to calculate the reconstruction efficiency -the fraction of the generated charged particles that are successfully reconstructed -as a function of p T and η, and denoted by (p T , η) below. At mid-rapidity (|η| < 1) and for events in the 0-5% centrality interval, the reconstruction efficiency is ∼60% at low p T and increases to ∼73% at higher p T . For |η| > 1 the efficiency decreases to about 40-60% depending on the p T and centrality. The reconstruction efficiency depends weakly on the centrality for low-p T tracks, for which it is smaller in the most central events by about 3% as compared to peripheral events. For tracks with p T > 1 GeV the dependence on centrality is less than 1%. The MC simulation is also used to calculate the fraction of fake tracks -the fraction of the reconstructed particles that do not correspond to any generated particle -also as a function of p T and η, and denoted by f (p T , η) below. The fake rate is less than 1% across the p T and centrality range used in this analysis. Additionally, systematic differences seen between the v n evaluated using the reconstructed particles in the MC events -using the same techniques as in the data analysis -and the v n implemented for the generated particles, are used as multiplicative correction factors in the data analysis. The size of this correction is discussed in Section 5.
The Xe+Xe measurements in this paper are compared with similar measurements in Pb+Pb and p+Pb collisions at √ s NN = 5.02 TeV. Further, for the template fits, data from pp collisions at √ s = 5.02 TeV are used [26] (see Section 4.2). The Pb+Pb dataset, triggers and track selections used for the comparisons are identical to those used in Ref. [6]. The pp and p+Pb datasets, triggers and track selections are identical to those used in Ref. [26].
where S and B represent the 'same-event' and 'mixed-event' pair distributions, respectively [14]. The same-event distribution includes both the physical correlations and the correlations arising from detector acceptance effects. On the other hand, the mixed-event distribution reflects only the effects of detector inefficiencies and nonuniformity, but contains no physical correlations. Detector acceptance effects largely cancel out in the S /B ratio [40]. To ensure that the acceptance effects in the B distribution closely match those in the S distribution, the B distribution is constructed from particles from two different events that have similar centrality (or multiplicity) and z-vertex. When constructing S and B, in order to account for the inefficiency in track reconstruction and for misreconstructed tracks, pairs are weighted by their fake-track rates and the inverse product of their reconstruction efficiencies (see Section 3): To investigate the ∆φ-dependence of the long-range (|∆η| > 2) correlation in more detail, the S and B distributions are integrated over 2 < ∆η < 5 and a one-dimensional correlation function C(∆φ) is constructed as follows: The |∆η| > 2 requirement is imposed to reject the short-range correlations and focus on the long-range features of the correlation functions. In a similar fashion to the single-particle distribution (Eq. (1)), the C(∆φ) can be expanded as a Fourier series [14]: In the absence of any non-flow correlations, it can be shown that the Fourier coefficients of the C(∆φ) factorize into the product of single-particle anisotropies as [40]: The factorization of v n,n given by Eq. (3) is expected to break down at high p T and in low-multiplicity events, where the v n,n measurements are biased by non-flow correlations [6,14]. The factorization is also expected to break down when the η separation between the particles is small, and short-range correlations dominate [14]. However, the |∆η| > 2 requirement removes most of such short-range correlations. In the phase-space region where Eq. (3) holds, the v n (p b T ) can be evaluated from the measured v n,n as: where the relation v n,n (p a T , p a T ) = v 2 n (p a T ) is used in the denominator. For most of the 2PC results in this analysis the v n (p b T ) are evaluated using Eq. (4) with 0.5 GeV < p a T < 5 GeV. The upper limit on p a T is chosen to exclude high-p T particles, which originate predominantly from jets.

Template fits
One drawback of the 2PC method is that in peripheral events with low multiplicity or for 2PCs involving particles at high p T , the measured v n can be biased by correlations arising from back-to-back dijets that are not rejected by the |∆η| > 2 requirement. This issue is much more severe in smaller collision systems such as p+Pb and especially in pp collisions, in which the 2PC even at large |∆η| is completely dominated by the back-to-back dijet correlations. To address this issue, a template-fitting procedure was developed to measure the v n in pp and p+Pb collisions [25,26,43]. The template-fit method assumes that: 1. The shape of the dijet contribution to the 2PC does not change from low-to high-multiplicity events, only its relative contribution to the 2PC changes.
2. The 2PC for low-multiplicity events is dominated by the dijet contribution.
With the above assumptions, the correlation C(∆φ) in higher multiplicity events is then described by a template fit, C templ (∆φ), consisting of two components: a scale factor, F, times the correlation measured in low-multiplicity events, C periph (∆φ), which accounts for the dijet-correlation, and a genuine long-range harmonic modulation, C ridge (∆φ): where the coefficient F and the v n,n are fit parameters adjusted to reproduce the C(∆φ). The coefficient G is not a free parameter but is fixed by the requirement that the integrals of the C templ (∆φ) and C(∆φ) are equal. There are two variations of the template-fit procedure, depending on how the C periph (∆φ), used in Eq. (5), is obtained. In the first case, the entire correlation function in the low-multiplicity interval is used as C periph (∆φ). In the second case, only the modulated part of the correlation function is used as C periph (∆φ) and the unmodulated 'pedestal' is removed. The removal of the pedestal is done using the 'zero yield at minimum' (ZYAM) procedure [25], which subtracts a pedestal from the C periph (∆φ) such that the value of C periph (∆φ) is zero at its minimum. The ZYAM-based method includes the additional assumption that only the part of the low-multiplicity 2PC that is modulated in ∆φ arises from 'non-flow' correlations. In general, the standard and ZYAM-based template-fit measurements in pp and p+Pb collisions give significantly different v n values at very low multiplicities. But with increasing multiplicities, the difference between the v n values obtained from the two methods decreases [25,26]. Since the Pb+Pb and Xe+Xe multiplicities are significantly larger than in pp and p+Pb it is expected that the standard and the ZYAM-based template measurements yield similar v n values, except at very low multiplicities. The difference between the v n values obtained with/without the ZYAM procedure then gives an estimate of the bias that the non-flow subtraction technique induces in the measured v n .
In this paper, the C periph (∆φ) used in the template fits is constructed using pp events at √ s = 5.02 TeV that have fewer than 20 reconstructed tracks passing the selection requirements listed in Section 3. Prior measurements using the template-fit method in pp and p+Pb collisions used the low-multiplicity events from the same collision system to generate the C periph (∆φ). In the present analysis the choice of pp reference is based on the following reasoning. The template-fit method works better when the C periph (∆φ) is dominated by non-flow correlations. In heavy-ion collisions there are significant flow correlations present at the lowest multiplicities. Thus using the C periph (∆φ) constructed from pp collisions at similar collision energy, which has smaller flow-like correlations, is a better alternative. Because of 'jet quenching' effects present in heavy-ion collisions [44][45][46][47], it is possible that the assumption made in the template-fit procedure, that the shape of the dijet correlation in ∆φ does not change from low-to high-multiplicity events, may not be valid. The effects of jet quenching may bias the v n measured using the template-fit method.

Scalar product
While the 2PC method relies on correlations between particle pairs using only information from the ID to measure v n , the SP [48,49] measurement relies on correlations between the flow from energy deposits measured in the FCal and from tracks in the ID. Thus it allows measurements of the v n with a larger gap in η to strongly suppresses short-range correlations. In fact, the larger η gap not only suppresses short-range correlations, it also suppresses correlations from back-to-back dijets, as most of the dijets are at mid-rapidity.
The SP measurement is based on the construction of flow-vectors [48,49] from tracks in the ID, and towers -segmentations of the calorimeter of approximate granularity 0.1 × 0.1 in η and φ -in the FCal. The flow-vectors are defined as: 2 where n is the harmonic order. For the construction of k k k n from ID tracks, which is labeled as q q q n in the following text, φ j is the azimuthal angle of the track, and the weight w j , which corrects for tracking performance, is equal to ( The sum runs over a set of particles in a single event, usually restricted to a region of the η-p T space. For the estimate of k k k n from the FCal, denoted by Q Q Q n , the sum runs over the calorimeter towers, with φ j being the azimuthal position of the tower, and the weight w j being the measured E T in the tower. Using Eq. (1), it can be shown that the flow-vectors q q q n are given by v n (p T , η)e inΦ n . However, due to statistical fluctuations arising from the finite number of tracks used in measuring the q q q n event-by-event [16], the measured q q q n fluctuate around the true v n (p T , η)e inΦ n , and can be written as: q q q n = v n e inΦ n + q q q fluc n , where q q q fluc n is a complex number representing event-by-event statistical fluctuations. Similarly the Q Q Q n can be written as: where V n is used to denote the integrated v n of particles in the calorimeter acceptance together with the E T response of the calorimeter folded in, and Q Q Q fluc n represents statistical noise. Due to the random orientation of the collision geometry from event to event, the flow-vector averaged over many events should be equal to zero. Additionally, the distributions for the real and imaginary parts of the flow-vector should have identical widths. However, due to nonuniform detector response in φ, these conditions may not be satisfied. To correct for the nonuniform detector response, a procedure described in Ref. [6] is applied, which ensures that the distributions of the real and imaginary parts of the flow-vectors are centered at zero and have the same widths.
In the estimation of v n from the SP method, four flow-vectors are involved: Q Q Q P n measured in the FCal at positive η, Q Q Q N n measured in the FCal at negative η, and the corresponding flow-vectors for charged particles measured in the ID, denoted by q q q P n for positive η and q q q N n for negative η. The v n for η < 0 from the SP method is then defined as while for η > 0 the numerator is replaced by q q q P n Q Q Q N * n . The '*' denotes complex conjugate and the angular brackets indicate an average over all events. Equation (8) can be understood by substituting expressions from Eq. (6) and Eq. (7) for the flow-vectors, and noting that all terms involving q q q fluc n and Q Q Q fluc n drop out when averaged over many events. This gives: where the superscripts N and P indicate whether the quantities involved correspond to η < 0 or η > 0, respectively. Equation (9) is similar to Eq. (4) for the 2PC measurements. The final v n from the SP method is obtained by averaging the results obtained for η > 0 and η < 0 in Eq. (8). While Eq. (8) explicitly uses the real part of the flow-vector product to obtain the v n , the imaginary part of the flow-vector product should be statistically consistent with zero (cf. Eq. (9)). Any statistically significant nonzero value for the imaginary part of the flow-vector product in Eq. (8) is indicative of detector response effects that are not corrected for in the measurement, and is typically used as a systematic uncertainty of the measured v n .
The 2PC and SP methods are very closely related; both methods nominally measure the v 2 n , where the average is taken over all particles and events in the chosen p T , η and centrality range. The 2PC method uses correlation between pairs of tracks while the SP method utilizes correlations between tracks and energy deposition in the calorimeters. One advantage of the SP method in ATLAS is that the larger rapidity gap between the FCal and the ID, which is greater than 3.2 units, leads to larger suppression of non-flow correlations as compared to the 2PC method, where the minimum gap between the reference and associated tracks is 2 units. However, the SP method (in ATLAS) has some disadvantages: for example, the larger η gap biases the v n measurements if longitudinal flow decorrelations are present [50][51][52]. The flow decorrelations increase with decreasing system-size [50][51][52], and would affect flow measurements for Xe+Xe more than for Pb+Pb. Additionally, for very low multiplicities, it is very difficult to obtain reliable flow-vectors using the calorimeters. This is the reason why for small systems (pp and p+Pb) the flow measurements are typically performed using the 2PC or template-fit methods [25,26,43,53,54].

Systematic uncertainties
The systematic uncertainties of the measured v n are evaluated by varying several aspects of the analysis. Most of the uncertainties are common to the SP and 2PC methods and are discussed together. Since the template-fit measurements start with the 2PCs, all systematic uncertainties related to the 2PC method also affect the measurements based on the template fit. The template-fit measurements have an additional uncertainty related to the choice of the peripheral reference, which is discussed below. The SP method is used to measure harmonics v 2 -v 6 , while the 2PC and template-fit methods are used to measure harmonics v 2 -v 5 . This difference is due to the fact that the 2PC and template-fit measurements have large systematic uncertainties attributed to the pair acceptance and peripheral reference choice for v 6 , which make a significant measurement not feasible. The uncertainties for two representative centrality and p T ranges are summarized in Table 1. The following sources of uncertainties are considered: • Track selection: The tracking selection cuts control the relative contribution of genuine charged particles and fake tracks entering the analysis. The stability of the results with respect to the track selection is evaluated by varying the requirements imposed on the reconstructed tracks, and including the resulting variation in the v n as a systematic uncertainty. The two sets of variations termed loose and tight in Section 3 are used for this purpose. At low p T (0.5-0.8 GeV) the variation in the v n obtained from this procedure is most significant in the most central events, typically 5%, as the fake-track rate is largest in this region of phase space. For higher p T and less central events, changing the set of tracks used in the analysis has less influence on the measurement.
• Tracking efficiency: As mentioned in Section 4, the tracks are weighted by (1 − f )/ when calculating the v n to account for the impact of the tracking efficiency. Uncertainties in the efficiency, resulting from e.g. uncertainty of the amount of material in the detector, are propagated into the measured v n . This uncertainty is evaluated by varying the efficiency up and down within its uncertainties (∼ ±3%) in a p T -dependent manner and re-evaluating the v n . This contribution to the overall uncertainty is very small and amounts to less than 1% on average for the p T -integrated v n , and is negligible for the p T -differential v n . This is because the change of efficiency largely cancels out in the differential v n (p T ) measurement, and for v n integrated over p T , the low-p T particles dominate the measurement. The uncertainty does not change significantly with centrality or with the harmonic order.
• Centrality determination: The centrality definitions used to classify the events into centrality percentiles have a ∼1% uncertainty associated with them. This uncertainty arises from uncertainties in the fraction of the inelastic Xe+Xe cross-section accepted by the triggers used in this analysis. The impact of the uncertainty from the centrality definition on the v n is evaluated by varying the centrality interval definitions by 1%, re-evaluating the v n and including the variation in the v n as a systematic uncertainty. The impact on all harmonics over the 0-50% centrality range is found to be within 1%. For more-peripheral events, this number varies between 1% and 5% depending on the p T , centrality and harmonic order n.
• MC closure: The MC closure test consists of comparing the v true n obtained directly from the MC generated particles, and the v reco n obtained by applying the same analysis procedures to the MC sample as to the data. The analysis of MC events is done to evaluate the contributions of effects not corrected for in the data analysis. Systematic differences seen between the v n of the generated particles and reconstructed particles are used to correct the v n measured in the data and, conservatively, also included as a systematic uncertainty. This uncertainty is at the level of a few percent over the 0.5-0.8 GeV p T range and the 0-20% centrality range, and reaches 5% for p T ∼ 0.5 GeV in the 0-5% centrality interval. It is negligible elsewhere.
• Event-mixing: As explained in Section 4.1, the 2PC analysis uses the event-mixing technique to estimate and correct for the detector acceptance effects. Potential systematic uncertainties in the v n due to the residual pair-acceptance effects, which are not corrected for by the mixed events, are evaluated by varying the multiplicity and z-vertex matching criteria used to make the mixed-event distributions, following procedures from Ref. [14]. The resulting uncertainty in the v 2 -v 5 is 1-3% for most of the centrality and p T ranges considered in this paper. However, at high-p T and for the harmonic orders n ≥ 4, this uncertainty can become at large as 10%. This uncertainty only contributes to the v n values measured by the 2PC and template-fit methods.
• Choice of peripheral reference: The template-fit procedure uses pp events at √ s = 5.02 TeV with less than 20 reconstructed tracks to build C periph (∆φ). The choice of 20 tracks is partially motivated by the fact that the mean multiplicity of minimum-bias pp events is close to 20 tracks. To test the stability of the v n with respect to our choice of the peripheral reference, the analysis is repeated with an alternative C periph (∆φ) constructed from pp events with 0-20, 10-20 and 10-30 reconstructed tracks and the change in the template-v n values is included as a systematic uncertainty. This uncertainty is within ∼4% over the 0-50% centrality range and for p T < 4 GeV, but increases considerably and can become as large as 30% for more-peripheral events or at higher p T . This uncertainty only contributes to the v n values measured by the template-fit method.
• η asymmetry: Due to the symmetry of the Xe+Xe collision system the event-averaged v n (η) and v n (−η) are expected to be equal. Any difference between the event-averaged v n at ±η arises from residual detector nonuniformity. The difference between the v n values measured in opposite hemispheres is treated as a systematic uncertainty quantifying imperfect detector performance. This uncertainty only contributes to the v n values measured with the SP method. The value of this uncertainty depends on the harmonic order. It is less than 1% for v 2 and increases to ∼5% for v 6 . For the 2PC method, the residual nonuniformity is estimated by variations in the event-mixing procedure.
• Residual sine term: The imaginary part of the flow-vector product in Eq. (8) should be consistent with zero. Any systematic deviation of the imaginary part from zero indicates the presence of residual detector response effects. The ratio of the imaginary part of the flow-vector product to its real part (Eq. (8)) is therefore included as a contribution to the systematic uncertainty. The contribution from this source is ≤ 2% in most of the phase space, while for the higher harmonics (n = 5) it can reach 10%. This uncertainty is only relevant for the v n values measured by the SP method.
6 Results 6.1 Two-particle correlations and template fits Figure 2 shows the two-dimensional 2PCs in ∆η-∆φ for several centrality intervals for 2 GeV < p a,b T < 3 GeV, where p a T and p b T label the p T ranges of the reference and associated particles used in the correlation. In all cases a peak is seen in the correlation at (∆η, ∆φ) ∼ (0, 0). This peak arises from short-range correlations such as jet-fragmentation, resonance decays or Hanbury Brown and Twiss (HBT) correlations [55]. The long-range (large ∆η) correlations are the result of the global anisotropy of the event and are the focus of the study in this analysis. Thus, all further 2PC results are discussed with the ∆η > 2 requirement. Figure 3 shows corresponding one-dimentional 2PCs along ∆φ for 2 GeV < p a,b T < 3 GeV and for several centrality intervals. The line is a Fourier fit to the correlation (Eq. (2)) that includes harmonics up to n = 5. The y-axis ranges for the different panels are kept identical so that the modulation in the correlation across the different centralities can be easily compared. It is seen that the modulation in the correlation is smallest in the most central events (left panel), increases towards mid-central events and then decreases slightly. This roughly follows the centrality dependence of most v n,n , especially the v 2,2 . Table 1: The relative contributions (in percent) to the systematic uncertainty of v n in two selected bins of centrality. The contributions are expressed as a percentage of the measured v n and are rounded up to two significant digits. Items 1-4 are common to all methods. Item 5 is specific to the 2PC and template-fit methods, item 6 is specific to the template-fit method, and items 7-8 are specific to the SP method only. The 2PC and template-fit methods are used to measure harmonics v 2 -v 5 , while the SP method is used to measure harmonics v 2 -v 6 .

60-70%
Figure 2: Two-particle correlations in ∆η-∆φ for 2 GeV < p a,b T < 3 GeV. The correlations are shown for the 0-5% (left), 30-40% (center) and 60-70% (right) centrality intervals. The distributions are truncated to suppress the peak at ∆η = ∆φ = 0 to show the long-range correlation in greater detail. The ∆η axis is also truncated at ∆η = ±4.6 to avoid large statistical fluctuations present at the edge of the ∆η acceptance. Figure 3: Two-particle correlations in ∆φ for 2 < |∆η| < 5 and 2 GeV < p a,b T < 3 GeV. The correlations are shown for the 0-5% (left), 30-40% (center) and 60-70% (right) centrality intervals. The lines represent Fourier fits to the correlation functions (Eq. (2)) that include harmonics up to n = 5. Figure 4: Two-particle correlations in ∆φ for 2 < |∆η| < 5 and 2 GeV < p a,b T < 3 GeV. The correlations are shown for the 0-5% (left), 30-40% (center) and 60-70% (right) centrality intervals. Also shown is a Fourier fit to the correlation (Eq. (2)) that includes harmonics n up to 5 (black line). The colored lines show the contribution of the v 2,2 (red), v 3,3 (blue), v 4,4 (magenta) and v 5,5 (orange), and can also be identified by the number of peaks in ∆φ. The dotted line indicates the v 1,1 . Each panel represents a different centrality interval. Figure 4 shows the same correlation functions but with a reduced y-axis range to make it easier to observe the features of the correlation. The Fourier fit is indicated by the thick black line and the contributions of the individual v n,n are also shown. In the most central 0-5% collisions (left panel of Figure 4) the v 2,2 -v 4,4 are of comparable magnitude, but in other centralities, where the average collision geometry is elliptic, the v 2,2 is significantly larger than the other v n,n (n ≥ 3). In central events, the away-side peak (at ∆φ = π) is also much broader due to the comparable magnitudes of the higher-order harmonics and v 2,2 , while in mid-central events (30-40% centrality in Figure 4) the near (∆φ = 0) and away-side peaks are more symmetric as the v 2,2 dominates. In central and mid-central events, the near-side peak is larger than the away-side peak. However, beyond 60% centrality (right panel of Figure 4) the away-side peak becomes larger, indicating the presence of a large negative v 1,1 component. This negative v 1,1 component in the peripheral 2PCs largely arises from dijets and its contribution to the correlation function increases in peripheral events [14,26]. While the near-side jet contribution is rejected by the |∆η| > 2 requirement, the away-side jet is not localized in |∆η| and cannot be rejected entirely. The presence of the away-side jet produces a large negative v 1,1 and also affects the other harmonics. Over this centrality range, the v n,n are strongly biased by dijets, especially at higher p T . The presence of the jets also results in the breakdown of the factorization relation (Eq. (4)). This is discussed in more detail in Section 6.2. Figure 5 shows examples of template fits to the measured 2PCs (Eq. (5)), that includes harmonics n up to 5. The figure shows the template fits without a ZYAM subtraction performed on the pp reference. However, in the plots presented in Figure 5, the template fits look identical with and without ZYAM as explained in the following. The ZYAM method sets the minimum of the C periph (∆φ) term to zero by removing a pedestal equal to C periph (0), since the minimum of C periph (∆φ) occurs at ∆φ = 0. The template fits then transfer this pedestal to the G term of the C ridge (∆φ). However, the components of the template fit are shifted vertically by the pedestal of the other term, i.e., the C ridge (∆φ) is plotted as 'C ridge (∆φ) + FC periph (0)' and the C periph (∆φ) is plotted as 'FC periph (∆φ) + G', where G and FC periph (0) are the pedestals of the two components. This shift in the plotting undoes the shift in the pedestals of the two components. In Figure 5 the fits are shown for a central (0-5%), a mid-central (30-40%) and a peripheral (60-70%) centrality interval. The relative difference between C templ (∆φ) and C ridge (∆φ) is largest for the peripheral centrality interval and decreases for mid-central events. This is indicative of larger contributions from non-flow correlations in peripheral collisions than in mid-central collisions.    6.2 Factorization of u n for 2PC and template-fits The exact values of p T at which the breakdown occurs depend on the centrality. The breakdown in factorization occurs mostly because of non-flow effects such as jets. In mid-central events the v 2 is largest, thus the bias from non-flow effects is relatively weak and the factorization holds until much higher p T values than in other centrality ranges. In peripheral events, with the decreased multiplicity, the bias from non-flow effects becomes larger; this, coupled with the decreased v 2 , results in larger or earlier onset (in p T ) of factorization breakdown. In general, the factorization holds better for v 3,3 and v 4,4 than for v 2,2 . This is because the factorization breakdown occurs mostly because of the back-to-back dijets, not rejected by the ∆η cut, which have a larger impact on the v 2,2 than the n ≥ 3 harmonics [14, 26]. These observations are similar to those made in previous v n measurements in Pb+Pb collisions at √ s NN = 5.02 TeV [6]. However, quantitatively, the breakdown in the Xe+Xe case is somewhat larger than in Pb+Pb collisions at the same centrality, due to smaller multiplicities. On the other hand, for the template-fit measurements the factorization works much better for higher p T and for more-peripheral events, indicating a significant reduction of the bias from dijet correlations.  For all remaining plots involving the 2PC or template-fit measurements, the default p a T interval used for the v n measurement is 0.5 GeV < p a T < 5 GeV. Over this p T interval the factorization holds quite well for the 2PC method for most of the centrality ranges. Figure 7 shows the p T dependence of the v n obtained using the 2PC and SP methods. Each panel shows the measurements for a different centrality interval. In general the 2PC and SP results are quite comparable, although small differences can be seen in the most central events. However, in peripheral events (50-60% and 60-70% centrality classes) and for p T > 4 GeV the v 2 from the 2PC method gives systematically higher values. This increase arises due to bias from dijets, and is further investigated below.   v 5 the results are truncated for centralities more peripheral than the 40-50% interval, beyond which the statistical errors are typically too large to study the p T dependence. For all the methods of measurement, the following trends are observed: the v n values increase at low p T and reach a maximum between 2 GeV and 4 GeV, and then decrease. For nearly all centralities, the v n follow the trend v 2 v 3 > v 4 > v 5 . This hierarchy is violated in the most central 0-5% collisions, where the v 3 and v 4 values at high-enough p T become larger than the v 2 values. These trends are the same as those observed in prior v n measurements in Pb+Pb collisions [6,14]. For the 2PC measurements, it is observed that at high p T (the last two p T intervals shown), the v 2 again increases with p T . This feature can be explained as a bias from dijets, which dominate the 2PC at high p T especially in peripheral events. The back-to-back nature of the dijet correlations enhances (suppresses) the measured values of the v n for even-order (odd-order) harmonics. This trend is to some extent also seen in the SP measurements shown previously in Figure 7. This bias is considerably reduced in the measurements based on the template-fit method. It is interesting to note that the template-fit measurements with or without ZYAM give similar results for the v n . As mentioned earlier, the differences between the v n values obtained with/without the ZYAM procedure can give an estimate of possible biases in the non-flow subtraction. Thus, similar values obtained from the two template-fit methods, which make different assumptions about the C periph (∆φ) (Eq. (5)), give more confidence in the robustness of the non-flow subtraction. Figure 9 shows the centrality dependence of the v n obtained from the 2PC and template-fit methods. The different panels represent different p T intervals. Results from the template-fit method with/without a ZYAM procedure on the reference and from the 2PC method are shown. For the 2PC measurements and for p T < 3 GeV -shown in the first three panels -the v n values increase from central to mid-central events, reach a maximum between 40% and 60% centrality, and then decrease. The variation with centrality is largest for the v 2 owing to the large change in the second-order eccentricity [56] from central to mid-central collisions. These features are consistent with those observed in other heavy-ion collisions [1][2][3][4][5][6][7][8]14]. The effects of the non-flow bias seen in Figure 8 can also be observed more clearly in Figure 9, where for the 4-6 GeV p T interval the 2PC v 2 values increase over the 60-80% centrality range. The template-fit v 2 on the other hand does not show this increase. In general, for centrality intervals more peripheral than 60% and p T > 3 GeV, the template-fit v n have smaller values for the even-order harmonics and larger values for the odd-order harmonics, compared to the 2PC results. This is consistent with the removal of bias from dijets, which enhances (suppresses) even-order (odd-order) harmonics. 6.5 Scaling behavior of u n

Scaling of u n (p T ) across different centralities
In a recent ATLAS paper [6] on v n measurements in Pb+Pb collisions, it was observed that the v n (p T ) at different centralities in Pb+Pb collisions had a very similar p T dependence. In fact, after scaling the p T and v n axes by multiplicative factors that depend only on centrality, the v n (p T ) across different centralities were consistent within ∼5% over the 0.5-5 GeV p T range. In this section, a similar study is presented for the v n measured in Xe+Xe collisions.  The left panels of Figure 10 show the v n (p T ) for several centrality intervals and for n = 2 and 3 obtained via the 2PC method. The v n (p T ) are then scaled along the x-and y-axes to match their shapes across the different centrality intervals. For this matching, the 0-60% centrality interval is chosen as the reference and the v n (p T ) for the individual centralities are scaled to best match the v n (p T ) in the 0-60% centrality interval over the 0.5-5 GeVp T range. The fitting is done over the 0.5-5 GeV p T range, with the χ 2 minimization taking into account the combined statistical and systematic uncertainties, and treating the scale factors along the p T and v n axes as the fit parameters. The right panels of Figure 10 show the corresponding scaled v n . The scaled v n match within ±4% across most of the shown p T range. Figure 11 shows the p T and v n scale factors obtained for v 2 and v 3 as a function of centrality. The p T scale factors increase from central to peripheral events over the 10-60% centrality range. Over this interval the values of the scale factors for the two harmonics match within ±8%. However, in the most central events (0-10%) some significant deviation is observed between the two scale factors, where the scale factor for the v 2 increases for more-central events, while that for the v 3 decreases. This deviation may be due to larger jet-bias effects in the v 2 as compared to v 3 , but requires additional investigation to properly understand its origin. The v n scale factors are quite different between the two harmonics, and show a much larger variation for v 2 compared to v 3 . This scale factor is mostly affected by the changing collision geometry; the ellipticity of the collision geometry changes considerably with the collision centrality leading to significantly larger variations in the scale factor for v 2 compared to that for v 3 . Finally, Figure 12 compares the p T scale factors obtained in Xe+Xe collisions with those obtained in Pb+Pb collisions from Ref. [6], as a function of centrality. The scale factors follow a very similar trend as a function of centrality. In fact, for v 3 the values of the p T scale factor are typically consistent within ∼2%. The comparison of this scaling and the corresponding scale factors for Xe+Xe and Pb+Pb collisions should be able to provide additional constraints on theoretical models.

Scaling of u n (p T ) with the harmonic order n
Another scaling previously observed in the v n measurements in Au+Au and Pb+Pb collisions is that the v n as a function of p T qualitatively follow the power-law relation v n /(v m ) n/m = constant [14, [57][58][59]. This scaling is demonstrated in Figure 13 for the v n in Xe+Xe collisions, for n = 3-6 and m = 2, using the v n values obtained with the SP method. It is observed that at least in the 0.6-5 GeV p T range the ratios do not depend on the transverse momentum. The scaling often holds to considerably higher p T values, depending on the harmonic order and the centrality interval. The mean values of v n /v n/2 2 are large in the most central events and become much smaller, even less than one, in the peripheral events. There is also a clear ordering of the ratios over the 0-30% centrality range, with v 6 /v 6/2 2 the largest and v 3 /v 3/2 2 the smallest. The lack of p T dependence of harmonics scaled by v 2 suggests testing for scaling with v 3 . In Figure 14 Figure 15, where the mean values of v n /v n/2 2 and v n /v n/3 3 are shown as a function of centrality. The harmonics scaled by elliptic flow have a considerable variation when going from central to peripheral events. This is because the elliptic flow has considerable centrality dependence, while the other harmonics vary slowly with centrality. Thus the other harmonics scaled by the elliptic flow also demonstrate considerable centrality dependence. In contrast, scaling by v 3 gives a v n /v n/3 3 value of about 1.4 for all centralities and harmonics of all orders larger than n = 3. On the basis of the latter scaling, after measuring v 3 it is possible to also predict v n for n = 4 up to at least n = 6. As mentioned in Ref. [59], the origin of this scaling dependence is not well understood. Further insight from theoretical calculations into its origin would be useful.    [24]. Figure 17 shows similar comparisons for the SP method. In the 0-5% central events, the Xe+Xe values for v 2 are ∼35% larger than the Pb+Pb values. This is seen most clearly from the ratio plots shown in Figure 16. For less central collisions, the ratio decreases and becomes smaller than one by the 10-15% centrality interval. For more-peripheral events, the ratio keeps decreasing, but at a lower rate, and becomes roughly 0.9 by the 50-60% centrality interval. For v 3 the Xe+Xe values are larger than the Pb+Pb values over the 0-30% centrality interval, become comparable to the Pb+Pb values over the 30-40% centrality interval, and become smaller than the Pb+Pb values for more-peripheral events. In this case the ratio in the 0-5% most central events is smaller than for v 2 , and decreases almost linearly over the 0-70% centrality range. For v 4 the Xe+Xe values are only marginally larger in the top 0-5% central events. The ratio for v 4 becomes comparable to, or less than one by the 5-10% centrality and continues to decrease for more-peripheral events. For v 5 the Xe+Xe values are smaller throughout. For v 2 -v 4 , the measured ratios are quite consistent with the theoretical predictions from Ref. [24], and consistent with measurements from the ALICE and CMS Collaborations [9,10]. The observed trends can be explained as follows. Since Xe+Xe is a smaller system than Pb+Pb, the impact of fluctuations is considerably more important. The fluctuations increase the initial eccentricities of the collision geometry and this effect contributes to enhancing the v n in Xe+Xe compared to Pb+Pb at the same centrality. Additionally the Xe nucleus may have a slight prolate deformation [60], which would lead to larger v 2 in most central Xe+Xe events, compared to Pb+Pb. However, because Xe+Xe is a smaller collision system, the viscous effects, which suppress the v n , are larger and play a bigger role with increasing harmonic order and for less central events. In the most central events, the effect of the increased fluctuations wins for v 2 . But with increasing harmonic order and/or for less central collisions, eventually, the viscous effects lower the v n values in Xe+Xe compared to the v n values in Pb+Pb. Over the centrality and p T range shown in Figure 16, the template-fit measurements give results that are similar to those of the 2PC method, and are not shown separately, for brevity. Figures 18 and 19 show the same results but as a function of N part . In general, at the same N part the Xe+Xe and Pb+Pb v n are rather different. One exception is the v 3 , whose values are comparable between Xe+Xe and Pb+Pb for N part < 200. The difference for the v 2 arises because of the very different shapes of the average collision geometry between Xe+Xe and Pb+Pb at the same N part . At the same N part , the Pb+Pb events correspond to collisions with significantly larger elliptic deformation. On the other hand, the v 3 is driven purely by event-by-event fluctuations in the initial geometry, and these fluctuations are expected to depend mostly on N part , leading to similar v 3 values at the same N part . The higher-order harmonics for n > 3 are driven by event-by-event fluctuations, as well as nonlinear response to lower-order eccentricities [19,21]. Since the second-order eccentricity for the Xe+Xe and Pb+Pb collisions is considerably different at the same N part , the nonlinear response to the second-order eccentricity, which contributes to the v 4 and v 5 , is different. The v 4 and v 5 are thus not expected to be similar at the same N part . The difference in the v 3 values at the highest Xe+Xe N part values is intriguing and may be related to centrality fluctuations discussed in Refs. [61,62]. Further input from theoretical calculations to understand this feature would be useful. Figure 20 shows a similar comparison but as a function of p T for the 0-5% centrality interval. The plots show that the ratio remains roughly constant over the 0.5-2 GeV p T range and then decreases for   all harmonics. This is indicative of increasing viscous effects and/or breakdown of the hydrodynamic description beyond ∼2 GeV, which lead to the lowering of the ratio. The left panel of Figure 21 shows the comparison of the v n in Xe+Xe and Pb+Pb as a function of the event multiplicity, N rec ch , defined as the number of reconstructed tracks passing the loose track selection requirements listed in Section 3. Results are shown for the template-fit method. The right panel of Figure 21 shows similar results including measurements in p+Pb but over a limited range. For the Pb+Pb and p+Pb template-fit measurements, the C periph (∆φ) (Eq. (5)) is constructed using pp events within the N rec ch < 20 multiplicity range, similar to the Xe+Xe case. The trends follow what was observed in the N part dependence: the v 3 values are quite similar between the two systems at the same multiplicity over a large multiplicity range (up to ∼800), but the values for the other harmonics are different. The p+Pb v n values are smaller that those for Xe+Xe throughout the overlapping N rec ch range. The N rec ch dependence of the v n in p+Pb collisions is weaker than in Xe+Xe and Pb+Pb collisions. It is interesting to note that the differences between the values (as a fraction) are smallest for the v 3 , possibly indicating that at similar multiplicities the geometry fluctuations in p+Pb collisions are similar to those in heavy-ion collisions.

Summary
This paper presents ATLAS measurements of azimuthal anisotropy of charged particles in Xe+Xe collisions at √ s NN = 5.44 TeV produced by the LHC. The measurements are performed using an integrated luminosity of 3 µb −1 . The azimuthal anisotropies, quantified by the flow harmonics v n , are measured using the SP, 2PC and template-fit methods for n = 2-6. The measurements are performed over wide transverse momentum and centrality ranges. All harmonics show a similar p T dependence, first increasing with p T up to a maximum around 3-4 GeV and then decreasing for higher p T . Significant values of the second-order harmonic v 2 persist up to 20 GeV. The elliptic flow signal (v 2 ) is strongly dependent on the event centrality and it is largest in mid-central events (30-50%). The higher-order harmonics show a weaker centrality dependence, which is consistent with an anisotropy associated with fluctuations in the initial geometry.
Measurements of v n in peripheral heavy-ion collisions via the 2PC method are known to be biased by non-flow correlations. In particular, the even-order (odd-order) v n show a rapid increase (decrease) in value in going from mid-central to peripheral events. This bias increases with increasing p T . For peripheral collisions (60-80% centrality), the v n,n obtained by the template-fit method are shown to be less biased by dijets and obey factorization better than the 2PC-v n,n . The template-fit v n are found to be significantly different from the 2PC-v n at high p T and in peripheral events. However, for events over the 0-60% centrality range and p T 5 GeV the template fits and the 2PC measurements typically yield consistent results for the v n .
Two types of scaling behavior observed in prior v n measurements in Pb+Pb collisions are observed to hold in the Xe+Xe collisions as well. The first scaling is that the v n for fixed n have the same shape as a function of p T across different centralities, up to an overall normalization and scaling along the p T axis. The second scaling is that the ratio v n /v n/m m for two harmonics of order m and n is found to be independent of p T in a given centrality interval. Neither scaling is understood quantitatively.
The measurements are also compared with prior v n measurements in Pb+Pb and p+Pb collisions. Compared to Pb+Pb collisions the Xe+Xe v 2 is larger in the most central collisions. This can be attributed to larger fluctuations in Xe+Xe and a possible deformation in the Xe nucleus, which leads to larger eccentricity in Xe+Xe than in Pb+Pb. For mid-central and peripheral events, the v 2 in Xe+Xe becomes smaller than the v 2 in Pb+Pb, indicating the increased role of viscous effects. This trend with centrality is also observed for the higher-order harmonics. The ratio of the Xe+Xe v n to the Pb+Pb v n , as a function of centrality, is found to be qualitatively consistent with theoretical predictions. As a function of N part (or multiplicity), the v 2 , v 4 and v 5 in Xe+Xe and Pb+Pb are observed to be very different except at small N part (multiplicity). On the other hand, the v 3 values in the two systems are consistent as a function of N part except for N part > 200. The Xe+Xe v n measurements together with the cross-system comparisons offer a unique opportunity to study the interplay of fluctuations in the collision geometry and the role of viscous effects in heavy-ion collisions, and should help to significantly improve the current understanding of the dynamics of heavy-ion collisions. STFC, United Kingdom; DOE and NSF, United States of America. In addition, individual groups and members have received support from BCKDF, CANARIE, CRC and Compute Canada, Canada; COST, ERC, ERDF, Horizon 2020, and Marie Skłodowska-Curie Actions, European Union; Investissements d' Avenir Labex and Idex, ANR, France; DFG and AvH Foundation, Germany; Herakleitos, Thales and Aristeia programmes co-financed by EU-ESF and the Greek NSRF, Greece; BSF-NSF and GIF, Israel; CERCA Programme Generalitat de Catalunya, Spain; The Royal Society and Leverhulme Trust, United Kingdom.
The crucial computing support from all WLCG partners is acknowledged gratefully, in particular from CERN, the ATLAS Tier-1 facilities at TRIUMF (Canada) [12] S. Voloshin  The ATLAS Collaboration