Scale and Scheme Variations in Unitarized NLO Merging

Precision background predictions with well-defined uncertainty estimates are important for interpreting collider-physics measurements and for planning future high-energy collider experiments. It is especially important to estimate the perturbative uncertainties in predictions of inclusive measurements of jet observables, that are designed to be largely insensitive to non-perturbative effects such as the structure of beam-remnants, multi-parton scattering or hadronization. In this study, we discuss possible pit-falls in defining the perturbative uncertainty of unitarized next-to-leading order multi-jet merged predictions, using the PYTHIA event generator as our vehicle. For this purpose, we consider different choices of unitarized NLO merging schemes as well as consistent variations of renormalization scales in different parts of the calculation. Such a combined discussion allows to rank the contribution of scale variations to the error budget in comparison to other contributions due to algorithmic choices that are often assumed fixed. The scale uncertainty bands of different merging schemes largely overlap, but differences between the"central"predictions in different schemes can remain comparable to scale uncertainties even for very well-separated jets, or be larger than scale uncertainties in transition regions between calculations of different jet multiplicity. The availability of these variations within PYTHIA will enable more systematic studies of perturbative uncertainties in precision background calculations in the future.


I. INTRODUCTION
Precision predictions of the final states of high-energy scattering signal or background processes are crucial for the continued success of high-energy collider physics. This includes e.g. exploiting the potential of indirect searches for physics beyond the Standard Model (SM) at the Large Hadron Collider, or precision SM measurements at future lepton colliders. The more detailed signal and background final states can be predicted, the larger the set of conceivable measurements. General-Purpose Event generators (GPEGs) [1] produce a detailed description of final states at the level of individual particles, and thus provide controlled pseudo-data (in the form of simulated scattering events) that can be used to develop new analysis strategies. At the same time, GPEGs aim to predict moderately exclusive final states with as high precision as possible, such that precision SM predictions can be juxtaposed with data to allow setting exclusion limits on parameters in Beyond-the-SM theories.
The precision of GPEG simulations is typically difficult to quantify, since the calculations are based on a mix of perturbative calculations (to determine the distribution of the highest-energy transfer scattering, and its radiative cascade) and phenomenological models, which are necessary to describe the scattering final state in detail (e.g. to incorporate the remnants of colliding beams after scattering, and to ensure conservation of momentum, electric, and QCD color charges). Some measurements are constructed to be as insensitive as possible to phenomenological beam-remnant, multi-particle scattering or hadronization models, such that the uncertainties due to perturbative approximations dominate the overall error budget. Measurements in this category are very inclusive measurements of inelastic scattering processes that do not, at lowest order, include QCD couplings, or moderately inclusive measurements constructed with the help of infrared safe observables.
The goal of this work is to discuss, define and assess the contribution of uncertainties due to perturbative approximations to the error budget of predictions to multi-jet final states at colliders, using precise next-to-leading (NLO) multi-jet merged predictions within the Pythia event generator [2,3] as a case study. Similar case studies within individual leading-order or NLO matched predictions have been considered in [4,5], while [6] focussed on the technical validation of variations within a specific NLO merging scheme. We extend these discussions by considering the combination of several NLO calculations (a particular scheme of combining NLO calculations will define what we call an NLO merging scheme), and the uncertainty due to choices in the combination procedure, i.e. the merging scheme. In particular, we focus on the interplay of uncertainties from defining exclusive cross sections at higher orders, and scale variations within unitarized merging schemes, thus addressing the questions: What is the impact of choices that are not constrained by requirements from retaining shower-and/or fixed-order accuracy on the prediction and uncertainty of the overall calculation? Do some merging schemes exhibit spuriously sized scale uncertainties? O n = (inclusive rate for n partons)[Φ n ] − (inclusive rate for n + 1 partons) [Φ n+1 ] O n [Φ n ] + (inclusive rate for n + 1 partons) [ where O n denotes a fully differential measurement of all momenta in the state Φ n ]. In reality, this complete removal is only achieved for very specific observables (e.g. jet observables defined by using the merging scale definition as separation criterion, and the inverse of the parton shower kinematics as recombination scheme), while residual highermultiplicty sensitivity remains in observables very different to the fixed-order regularization cut. Nevertheless, the same cancellation should in principle also apply to the uncertainty on the high-parton-multiplicity configurations. The latter requires very careful handling of all components of the calculation.
A sensible inclusive prediction should also be complemented with an accurate description of exclusive cross sections that are sensitive to exactly n (and only n) jets or partons 2 . Unitarization introduces higher-order components that depend on higher-parton-multiplicities into exclusive cross sections, by means of the explicit subtraction, i.e. through the second term in eq. 1. These differ from the naive parton-shower result by subleading terms in the parton-shower evolution variable. At NLO, similar subleading terms can appear by introducing all-order corrections to (hard) virtual diagrams, i.e. through the first term in eq. 1. The interplay of these terms is beyond both NLO fixed-order accuracy as well as shower accuracy. However, it is of the same order as variations that are used to gauge NLO fixed-order uncertainties.
It is prudent to require a merged calculation to recover the fixed-order result as well as the uncertainty of the latter in certain regions of phase-space. However, it is not a priori clear how these regions should be defined, nor that it is obvious that any merging scheme (taken here to be defined by different choices of reweighting the NLO corrections) fulfills this requirement. The aim of the current pilot study is to initiate the discussion of these points. To be able to discuss subtle changes in the NLO merged event generator predictions by using a (large) fixed set of statistically produced events, we focus on "reweightable variations" here, such that the impact of purely statistical fluctuations can be minimized. Possible reweightable perturbative variations are a) Variations of the renormalization scale, correlated between fixed-order and parton-shower components; b) Variations of the (all-order) reweighting of higher-order fixed-order terms.
Many other variations are of course possible in an NLO merged calculation. However, these might not be reweightable 3 and thus require prohibitively large event samples to minimize statistical effects, or do not have a well-defined perturbative expansion 4 . Hence, the current study is limited to a consistent definition and assessment of the variations a) and b) above. Other variations (such as factorization scale changes) will not invalidate the findings below, and instead might serve to put more stringent constraints on the allowed form of NLO merging schemes.

III. THEORY AND IMPLEMENTATION
In this section, we define several variants of unitarized NLO merging strategies that have well-motivated, yet different, higher-order structure. For this, we will start from a very general form of a unitarized merging prescription, of which the UNLOPS prescriptions [10] and [12] are sub-sets. This general starting point allows to define several classes of unitarized NLO merging schemes, and thus suggests a large associated uncertainty. However, most unitarized NLO merging schemes will have a spurious behavior, e.g. if their all-order behavior does not recover the all-order logarithmic structure of QCD. Thus, we will discuss conditions that sensible new unitarized NLO merging schemes need to fulfill, e.g. how to define schemes in which scale uncertainties do not deteriorate the accuracy of the overall prediction. The result of this is that we allow a new source of uncertainty -the scheme uncertainty -and determine sensible scheme variations that may constitute a reasonable assessment of this uncertainty.
To begin, it is useful to examine the construction of exclusive jet rates in the merged calculation, with the exclusive 1-jet rate being a sufficiently complicated example. In a general unitarized calculation, this rate is given by where the factors w X are a-priori weights that have to be chosen to preserve certain accuracy criteria, B i [Φ i ] denote the inclusive tree-level calculation of the i-jet rate, B NLO 1 [Φ 1 ] all NLO corrections (including all virtual and real corrections) to the 1-jet rate, and where we have collected pieces stemming from the unitarization of higher-multiplicity contributions with lowest order α 3 s into the symbol S 3 . We assume that all factors w X have a well-defined expansion This expansion immediately guarantees that the lowest-order terms in the expansion of eq. 2 is correctly given by the tree-level result Similarly, the next term in the expansion is correctly given by the exclusive NLO cross section provided that {w NLO } 1 = {w I } 1 and that the real corrections in B NLO 1 and the unitarization term tms B 2 are mapped in an identical way to the Φ 1 phase space points. Non-unitarized merging schemes arrive at the exclusive NLO cross section in a somewhat different manner [13].
Note that to achieve the correct behavior for any choice of the renormalization scale requires {w NLO } 1 = {w I } 1 for each scale value, and thus implies that the expansion of either weight can be defined with reference to a common scale, and that If this condition is not guaranteed, then the all-order accuracy of the prediction, as defined by reference to the parton shower, is compromised. Changes due to scale variations of weights applied to Born-level contributions enter at the same order (O(α 3 s )) as a non-trivial reweighting of higher-order (virtual) corrections. The interplay between these corrections is the main interest of this article. In order to avoid over-generalizations of conclusions about uncertainties drawn from scale variations, we will analyse the O(α 3 s ) expansion of eq. 2 and set up conditions for the reweighting of higher-order corrections. Any reweighting must not, of course, introduce spurious enhacements at any order, since this would exaggerate the "scheme variation". With this in mind, different reweighting strategies can be used together with scale variations to determine more robust uncertainties.
The O(α 3 s ) expansion of eq. 2 reads [10] where we have explicitly included the relevant S 3 terms and assumed that {w NLO } 1 = {w I } 1 also applies to the reweighting of two-parton corrections.
It is reasonable to expect that eq. 6 reproduces the correct coefficient of the largest contribution for the observable O 1 [Φ 1 ]. If the observable measures the merging scale value of states in Φ 1 , we expect a behavior O 1 ∝ α 3 s ln 6 (t ms ). Since {w LO } 2 B 1 ∝ α 3 s ln 6 (t ms ), this amounts to constraints on and/or cancellations between the different weights in eq. 6. The most straight-forward way to enforce such cancellations is to set w NLO = w I for any scale value (we have already seen that this holds for the lowest and next-to-lowest order), and then constrain the concrete form of w NLO from the remaining terms. If we assume that {w LO } 1 B 1 provides a sensible approximation of the leading parts of B NLO 1 , it is tempting to identify As long as {w} 1 scales as α s ln 2 (t ms ) or less, and while the difference in brackets is subleading, this combination will not lead to undesirable enhancements. Nevertheless, while {w} 1 = 0 and B NLO 1 = {w LO } 1 B 1 , this contribution is a new, process-dependent, source of scale uncertainties beyond the NLO and parton shower approximations. The impact of this term on the overall uncertainty is thus best estimated by considering explicit test cases, as e.g. done in the next section. It could be argued that having w ≡ 1 brings the merging prescription closer to the traditional separation into all-order W -terms and fixed-order Y −terms in analytic resummation [14,15], which typically includes hard virtual corrections into the all-order W -term [16] and assumes that the fixed-order Y -term not only remains constant, but vanishes in the limit when Φ 1 and Φ 2 become indistinguishable [17]. It is however not directly obvious that the calculation in [16] translates to the context of a fully differential event generator employing IR/UV regularization prescriptions different from [16]. An assessment of the numerical effect of different treatments is thus interesting in its own right, also beyond the context of its interplay with renormalization scale variations in NLO merging schemes.
To summarize, the above considerations lead to a simple guideline how higher-order corrections may be reweighted without compromising the quality of the calculation: All terms in the expansion of the leading-order reweighting w LO , and all NLO corrections, should be reweighted by the same (potentially dynamical) weight. Enhancements appearing in the expansion of this weight should be in agreement with standard QCD expectations. Below, we describe different unitarized NLO merging strategies that meet these criteria. We then assess the uncertainties on predictions that result from consistent renormalization scale variations in matrix element generation, merging and parton shower, as well as the "merging scheme" uncertainty. We limit the discussion to predictions that are NLO correct up to the first additional jet with respect to the reference process, and LO correct for the second and third jet. The generalization to higher multiplicity is straight forward, but omitted here in favor of readability.

A. UNLOPS
We start from the UNLOPS multi-leg NLO merging scheme described in detail in [10]. The expectation value of an arbitrary jet observable O in UNLOPS is given by (8) with k = 1. In the above equation, B andB denote fully differential cross sections at leading and next-to-leading order in the strong coupling, in the following also called matrix element samples. These are interfaced using the LHEF format. {X} αs denotes the O(α s ) contribution to the term X. Π n (k) is short for and describes the no-emission probability between the two evolution scales, taking into account all allowed i → j splittings of all legs in an n parton state, as described by the kernels P ji (z). This no-emission probability can be calculated numerically by trial parton showering [18]. In Pythia 8 [3], where the evolution of partons by emissions and the simulation of secondary multiparton interactions (MPI) is described by a common, interleaved, evolution sequence, the no-emission probabilities generated by trial showering can also accommodate no-MPI probabilities of the relevant transverse momentum scales [19]. This allows a smooth combination of input matrix element samples with the MPI machinery. In case of hadronic initial states, the necessary PDF ratios are included as additional weights. As argued above, we do not vary the factorization scale, since it is not obvious how to achieve a consistent reweightable variation in this case. The subscript on O n denotes the final state multiplicity of the states handed to subsequent event generation steps. Thus, in some cases, e.g. to create counter-events for unitarization, the state used to evaluate the matrix element calculation is replaced by a lower-multiplicty state determined by reclustering according to a reconstructed parton shower history. Finally, the K-factor is only applied to configurations that could have been produced by parton shower emissions. These regions are defined by being able to construct at least one parton shower history of emissions that are ordered in a decreasing sequence of evolution scales. This choice is consistent with the MC@NLO matching scheme, where hard real-emission configurations not reachable by showering are described with tree-level matrix elements. Note that in this UNLOPS scheme, we set w S = w N LO = w I ≡ 1, i.e. treat all higher-order fixed-order corrections as "hard corrections" that do not contribute to the all-order result. This relatively conservative approach has the advantage that scale variations due to "hard" virtual corrections do not introduce all-order uncertainties. It has the disadvantage that any "soft" virtual correction terms in B N LO 1 that are not correctly reproduced by the parton shower are not leveraged to define a more realistic all-order uncertainty.
We may estimate the theoretical uncertainties of predictions obtained by the UNLOPS NLO merging procedure by variations of the renormalization scale µ R . For consistency, these variations should not be limited to the seed cross sections, but should also include renormalization scale variations of the parton shower. For this reason, we employ the scale variations that have been implemented in the Pythia 8 parton shower [20]. We extend this procedure to ensure consistent simultaneous variations in the calculation of merging weights, and a consistent set-up between matrix-element and parton-shower contributions. Every renormalization scale in the matrix element samples B n ,B n is varied, as is every explicit occurrence of µ R in the above formula. Furthermore, the same variation is applied to each argument of p ⊥,n in the strong coupling α s in the parton shower. The variation can be produced by reweighting using the variation factor k in eq. (8) Note that in this particular NLO merging prescription, the NLO correctionsB n are not reweighted. Below, we will consider variants in which NLO events are weighted in different a manner.

B. UNLOPS-P
Alternative unitarized merging schemes that remain NLO correct and do not degrade the all-order behavior can be obtained by suitably changing how all-order weights are applied to higher-order fixed-order contributions. In eqs. 6-7, we have argued that choosing a common reweighting for the NLO corrections B N LO 1 and the O(α s ) expansion of the parton shower is one (simple) way to comply with all accuracy constraints.
As first alternative to UNLOPS, we will define the UNLOPS-P scheme (where the "P" is intended to signify the extended use of no-emission probabilities). This alternative unitary merging scheme is inspired by the treatment of higher-multiplicity NLO corrections in the UN 2 LOPS NNLO matching prescription [21], and amounts to applying a Sudakov weight factor (consisting of PDF ratios and no-emission probabilities) to the higher order terms. In [21], it was argued that reweighting the remnant bracket in eq. 7 with a Sudakov factor can be interpreted as dressing an IRsubtracted hard state with the effects of soft and collinear radiation. In UNLOPS, the IR-subtracted NLO correction is instead not dressed with higher-order effects. The use of Sudakov factors could be regarded more physical. It has the added benefit that the impact NLO corrections to 1-jet states in soft/collinear regions is reduced, thus leading to a gain in numerical stability for small merging scale values. In the UNLOPS-PC scheme below, we will reassess and refine the interpretation as "dressing with the effects of radiation".
The expectation value of an arbitrary jet observable O in UNLOPS-P is given by The double logarithmic Sudakov factor is dominant in the soft/collinear region, suppressing the rise of the O(α s ) corrections. These features should be noticeable both in 1-jet inclusive observables, but also, by virtue of unitarization, in exclusive 0-jet observables.

C. UNLOPS-PC
Another alternative to UNLOPS is the UNLOPS-PC scheme defined in the following (where the "C" is intended to signify the extended use of running-coupling factors). This scheme is motivated by clarifying the argument of [21]: that reweighting the remnant bracket in eq. 7 can be interpreted as dressing a IR-subtracted hard state with the effects of soft and collinear radiation. In UN 2 LOPS and UNLOPS-P, it was assumed that the latter effects can be approximated through the application of Sudakov factors. Sudakov factors primarily encapsulate the dressing of parton propagators with self-energy corrections. However, a systematic treatment of leading-logarithmic dressing also includes the effect of vertex corrections, and of running-coupling effects [22] to obtain an approximation of the correct ladder diagrams. It thus stands to reason that a more physical notion of "dressing with the effects of radiation" should include both Sudakov-and running-coupling reweighting. This constitutes the UNLOPS-PC scheme, in which the expectation value of an arbitrary jet observable O is given by This scheme treats leading order and next-to-leading order contributions on equal footing. The inclusion of no-emission probabilities regulates the contribution of radiative events in soft/collinear regions of phase space. The inclusion of the strong coupling ratio produces an opposing effect, increasing the impact of NLO corrections at lower splitting scales. Due to the exponential form of the no-emission probability, the Sudakov suppression will naturally overcome the coupling ratio effect at lower scales. Nevertheless, away from the collinear limit, the single logarithmic evolution of the strong coupling ratio is not negligible compared to the Sudakov double logarithm.

D. Comparison of +1j contributions
Before continuing, it is useful to reiterate the differences between the UNLOPS variants, in order to gain some intuition about the impact on phenomenology. The differences mostly pertain to the treatment of the +1 jet contribution. For clarity, we split the Born+virtual+real contributionB i into its LO component B i and a pure NLO correction B NLO i , and label the original UNLOPS prescription as UNLOPS-1, since a unit weight is applied to NLO contributions. With the notation the +1 jet components of the merging schemes read UNLOPS-1 UNLOPS-P UNLOPS-PC Thus, the main difference between the variants lies in the factor multiplying the term in square brackets. As argued above, it is important to apply weights to this combined term, since, if a logarithmically enhanced weight is applied to only the NLO term, or only the product of Born-term and the first-order expanded weight, then a leading-logarithmic term will be introduced on higher orders in α s , thus spoiling the LL accuracy of the merging prescription.
For well-separated hard emissions, the UNLOPS-1 and UNLOPS-P schemes should agree, since the Sudakov factor is approaching unity. This does not necessarily extend to the UNLOPS-PC scheme, for which α s (p ⊥,1 )/α s (µ R ) = 1 is possible if p ⊥ → µ R for increasingly hard emissions. This is e.g. the case in e + e − → jets processes in Pythia, where the emission p ⊥ in final state radiation is bounded by p ⊥ < m e + e − /2, and µ R is typically set to m e + e − , such that the ratio is strictly larger than one. The ratio may also be smaller than one if parton shower emissions or reconstructed histories are possible at higher p ⊥ values than the µ R , as is e.g. the case in Drell-Yan events with a jet p ⊥ > µ R = M Z .
The term in brackets can, depending on kinematics, have either sign. Thus, it is not immediately obvious if changing from one UNLOPS variant to another will uniquely lead to either enhancement or depletion. However, we expect the UNLOPS-P prediction to be closest to a leading-order (UMEPS) result, since the Sudakov weight is smaller than unity. Due to the application of the Sudakov factor in UNLOPS-P, the fraction of negative cross section in the event generation is expected to be reduced. In UNLOPS-PC, the Sudakov factor and the coupling ratio weight have opposite effects on the fraction of negative cross section, so that the net effect is not obvious. For the specific example of e + e − → jets at the center of mass energy √ s = M Z , with up to one additional jet at NLO, and a merging cut at min ij p Lund ⊥,ij = 5 GeV, we find that the fractions of negative contributions to the cross section are given by Thus, the amount of negative contributions differs only very mildly between the schemes.

IV. APPLICATION AND RESULTS
This section intends to assess the impact of renormalization scale and merging scheme variations using a small selection of illustrative example observables. We have implemented the different variants of unitary NLO merging in Pythia 8, relying on matrix element input from MadGraph5 aMC@NLO [23]. Furthermore, we implemented the renormalization scale variation, taking into account variations of fixed-order and parton shower origin, as well as in the weights applied in the merging procedure. The implementation will be made available in a future release of Pythia 8.3.
For reasons of consistency between the parton-shower subtraction terms employed in aMC@NLO and the event generator, we use a non-default configuration of the parton shower for the first emission. This includes a global recoil scheme, where the recoil of a final-state emission is shared among all final state partons in the event. Furthermore, matrix element corrections to the parton shower are removed, and we do not allow for an α s running in the first parton shower emission, since we use fixed renormalization scale choices in aMC@NLO when generating matrix elements. We terminate the evolution after the first emission, and store the resulting events as Les-Houches event files. Using these settings ensures that the parton shower contribution of the first emission on Born configurations correctly cancels the parton-shower subtraction terms used in the generation of matrix elements, leading to a consistent NLO event sample. This event sample is then used as input for subsequent NLO merging, which proceeds using a default Pythia shower setup. For consistency with the scale variations performed in the matrix element generation, which are based on the α s running provided by the employed PDF packages, we use a second order running coupling with reference value of α s (M Z ) = 0.118. This allows us to consistently vary renormalization scales within matrix element generation, merging weight evaluation and parton shower emissions.
To generate events, we employ the minimal value of all possible shower splitting scales p Lund ⊥,ij as a merging scale 5 to regularize fixed-order inputs and act as separator between the hard emission phase space described by the fixed order matrix element, and the soft region described by the parton shower.
To assess uncertainties, we investigate the effect of variations on the modelling of the e + e − → jets and pp → W +jets processes. For both processes, we include up to one additional jet at NLO and the second and third jet emission at LO fixed order accuracy. The plots in this section are generated using the Rivet toolkit [24]. In order to suppress large fluctuations in the variation bands at low scales, we do not allow for shower variations below three times the shower cut off, and limit the allowed range of variation of the strong coupling by requiring |α s − α s | ≤ 0.75.

A. Jet Production in Electron Positron Collisions
In this section, we highlight renormalization-scale-and merging-scheme uncertainties by referring to electronpositron annihilation into jets. Electron positron collisions are simulated at the center of mass energy √ s = M Z , to be able to compare the different merging prescriptions to LEP data. We use a p Lund ⊥,ij merging scale of value 5 GeV. Figure 1 shows the differential jet separation distributions in the 3 → 2 clustering y 23 and in the 4 → 3 clustering y 34 , with the Durham k ⊥ -jet separation [25]

Comparison of UNLOPS-1, UNLOPS-P and UNLOPS-PC
normalized with the squared CM energy s. The differing weighting prescriptions in the different schemes affect both the central prediction and the renormalization scale variation bands. The y 23 jet separation distribution in fig. 1 shows that the central prediction at NLO accuracy agrees between UNLOPS-1 and UNLOPS-P at high jet separations, since the Sudakov factor is close to unity for in these regions, and no strong coupling ratios are applied in the two schemes. In the UNLOPS-PC prescription, the strong coupling ratio introduces an upward shift, by effectively evaluating the coupling at a lower scale. Going to lower scales, UNLOPS-P falls compared to UNLOPS-1, due to the Sudakov suppression. In UNLOPS-PC, the strong coupling running counteracts this effect, leading to a milder decrease. The unitary property in all schemes ensures that an increase (decrease) at high scales induces a decrease (increase) and lower scales. This leads to the observed central predictions at low separations behaving opposite to the high scale results for every scheme. High y 34 values agree for all schemes up, to statistical fluctuations, since in this region, all schemes recover the result of the UMEPS unitary merging prescription [9]. However, we observe differences at lower scales. This can be explained by the NLO precise lower multiplicity sample (here, the e + e − → 3 jet NLO sample), that is modified by the different weighting prescriptions, and is showered below the merging scale, thus contributing to y 34 at small separations. Overall, the central description of jet separation observables, which are very sensitive to the merging weight prescriptions, differs by up to about 5% between the described schemes.
Scale variation bands for each merging scheme include variations of fixed-order, parton-shower and mergingreweighting origin. As opposed to renormalization scale variations in the matrix elements only, this induces larger uncertainties at small jet separations, where emissions are generated by the parton shower. For observables at LO precision, e.g. y 34 in fig. 1, variations of the scales induce a very large band, amounting to about 20% in each direction for all schemes alike. In unitary merging schemes, the subtraction of the respective jet multiplicity sample from the next-lower jet topology, turn the variation bands around, since they contribute, via showering, at low jet separations. This leads to a region with unphysically small uncertainty bands where the varied distributions cross.
Predominantly NLO precise distributions, such as y 23 in fig. 1, show smaller bands at high jet separations. In this region, renormalization scale uncertainties in the method contribute only at O(α 2 s ) due to NLO-precise inputs, instead of O(α s ) otherwise. At small jet separations, the distribution is again described by parton shower emissions instead, so that a large variation is observed. At the transition, there is an unphysically small variations, as observed for LO observables.
Comparing the size of the variation bands between UNLOPS-1, UNLOPS-P and UNLOPS-PC, we note that UNLOPS-1 and UNLOPS-PC are very similar, while UNLOPS-P leads to larger scale variation bands. This suggests that the application of Sudakov suppressions alone -without taking strong coupling ratios into account -introduces an additional variation, which is partly cancelled by the coupling ratio variation in UNLOPS-PC. The Sudakov variation behaves as 1 − α s (c 1 L 2 + c 2 L + c 3 ) + O(α 2 s ), while the α s ratio behaves as 1 + α s c L + O(α 2 s ), where L denotes a logarithmic enhancement of type ln Q 2 /p 2 ⊥ , Q denotes a characteristic scale of the hard process and p 2 ⊥ the scale of jet separation. At O(α 3 s ) (i.e. induced by O(α s ) terms of the reweighting) we thus observe a partial cancellation in the single logarithmic contribution in UNLOPS-PC, which is not present in UNLOPS-P. We have found changes of similar size when setting w S = 1, w NLO = w LO , i.e. removing the consistency condition in eq. 7 and jeopardizing the logarithmic accuracy. Thus, a very conservative uncertainty estimate that also acknowledges the potential of subleading-logarithmic mismodeling, should likely consist of the combination of the scale uncertainties of all three merging schemes. As observed above, the difference between the weighting prescriptions is rather minor, amounting to no more than about 5% for very sensitive jet separation observables. We furthermore do not observe a large difference in the description of experimentally measured data distributions. In fig. 2, we compare the thrust and Durham jet resolution to ALEPH [26] and OPAL [27] data.

Comparison to LEP Measurements
The Durham jet separation distribution is well described by all schemes, especially at large jet separations. In total, the data is compatible with the prediction of all schemes across most jet separations. Only at very low separations, where the statistical uncertainty on the data is rather large, do the prediction differ mildly from the data distribution. Overall, UNLOPS-PC agrees slightly better with the measured data distribution, compared to the other weighting schemes.  [28] and exclusive jet multiplicity [29].
The thrust distribution ranges between zero for very narrow back-to-back jet configurations, corresponding to very soft "hardest" emissions, and 1/3, corresponding to three very well separated jets. We find larger scale variations -caused by shower emissions -at low 1 − T , while high 1 − T variations are milder. The band is in general wider than in the Durham jet separation distribution, since the thrust distribution is more sensitive to further emissions described only at LO. The agreement with data is satisfactory, with differences only at low 1 − T , where even the "hardest" emission is modeled solely by the parton shower. In both cases, the unphysically narrow variation bands where the variations cross can potentially lead to a significant deviation from measured data. In the observables shown in fig. 2, an envelope of the result of all schemes may be used to mitigate such an effect.

B. W+jets Production in Proton Proton Collisions at LHC
The simulation of final states in hadron collisions is typically much more involved than in lepton collisions, due to the rich structure of the composite colliding particles, as well as the larger phase-space available to the final-state particles. Thus, an assessment of uncertainties due to scale-and scheme variations in hadron colliders is necessary. In this section, we illustrate these uncertainties using W + jets final states at proton-proton collisions at √ s = 7 TeV. Jet observables of this process are then compared to LHC data. We use a p Lund ⊥,ij merging scale definition, with value 10 GeV as merging scale cut. Furthermore, we apply a NLO K-factor K = σ NLO (pp → W )/σ LO (pp → W ) to all leading-order input configurations that could have otherwise been reached by the p ⊥ ordered shower. Non-ordered additional jet configurations are thus interpreted as "genuine" real-emission corrections, for which a naive rate correction can be considered questionable. All results are produced using the NNPDF3.1 nlo as 0118 PDF set [30] via the LHAPDF framework [31].
Proton-proton collisions introduce (at least) two more sources of renormalization scale uncertainty: The treatment of running couplings in initial-state shower evolution, and in multiparton interactions. Since the latter are highly correlated with other semi-or non-perturbative parameters, we do not consider their impact in this study. Renormalization-scale and merging-scheme variations for hadronic initial states thus require only simple generalizations on top of the previous section.  [29] and azimuthal distance ∆φ between hardest jet and muon [32].
Since the term in parentheses in eqs. 14−16 acquires PDF-dependent components, it is not obvious that the level of similarity found in e + e − collisions is also present at hadron colliders. Figure 3 shows the weighting schemes compared to pp → W + jets data at √ s = 7 TeV for the k ⊥ 0 → 1 clustering scale [28] and the exclusive jet multiplicity [29]. At high √ d 0 , no difference between the schemes is found. This is due to the reference renormalization scale µ R = M W chosen in the generation of matrix elements being reachable by parton showering. Thus, the differences between UNLOPS-P and UNLOPS-PC are less pronounced than in e + e − collisions. We observe a very light suppression of UNLOPS-PC at high √ d 0 , compared to UNLOPS-P. The strong coupling ratio enhances UNLOPS-PC compared to UNLOPS-P below M W . At very low scales, the distribution is again dominated by parton shower emissions from 0-parton states, as well as shower emissions from the integrated subtraction of the NLO 1-jet sample. The overall strong coupling ratio enhancement of the subtraction in UNLOPS-PC is consistent with the lower UNLOPS-PC result observed at low scales. Note that with the chosen PDF set, all schemes struggle to describe the low-p ⊥ region satisfactorily. This suggests that a retuning of the MPI model might be necessary when using this setup productively.
The first jet clustering scale √ d 0 is dominated by NLO-precise contributions at high values, and thus has a very small scale variation band in that region. However, this seems to also be the case at small separation, where the parton shower, as well as MPI effects, dominate. The reason for these milder variations lies in the implementation of the shower scale variations in Pythia 8 [20]: In order to avoid numerical instabilities in the reweighting procedure when approaching low scales, no shower variations are performed below a certain scale (determined by multiplication of a factor and the shower cut-off scale). In ISR, this is applied to the regularization parameter pT0Ref with default value of 2 GeV, while in FSR, it applies to the pT0 parameter of default value 0.5 GeV. Thus, shower variations in ISR are suppressed below transverse momentum scales of about 6 GeV. These values were chosen to limit the size of the weight fluctuations induced by the shower reweighting procedure of [20], but lead to the merging prescriptions not agreeing within their bands at low scales. Using different merging schemes thus helps to isolate phase-space regions with questionable uncertainty estimates, and a combination of scheme variations is advisable.
The right plot in fig. 3 shows the exclusive jet multiplicity of jets with p ⊥ > 30 GeV [29]. The 0-jet and 1-jet bins, which are described with NLO precision, are reproduced very well. Higher multiplicities, described at LO or parton-shower accuracy only, are underestimated. While the differences between UNLOPS-P and UNLOPS-PC are negligible in this observable, both schemes yield a very slightly larger exclusive 1-jet rate than UNLOPS-1. If the contribution of the term in square brackets in eqs. (14) to (16) was mostly positive in the relevant region of phase space, the Sudakov factor should rather lead to a lower prediction for UNLOPS-PC and UNLOPS-P. That this is not the case suggest that the contribution is negative at least in some parts of the phase space, highlighting that there is a non-trivial interplay between the different contributions, and that rule-of-thumb reasoning should be considered with caution.
The left two plots in fig. 4 show the scalar sum of jet transverse momenta H T for inclusive and exclusive 2-jet events. In particular distribution of H T in exclusive 2-jet events shows a strong overshooting of the prediction, compared to the data. This is due to a mismodeling of the prediction for the transverse momenta of the first-and second-hardest jets. In inclusive 2 jet events, this effect is milder due to an underestimate of subleading jet transverse momenta, which conspires with the former effect to yield a less pronounced effect. Appropriate scale choices for unordered jet event topologies [33] or the inclusion of electro-weak histories [34,35] have been inferred to improve this situation 6 . The new NLO merging prescriptions proposed in this study do not improve this mismodeling of data. Finally, the right plot in fig. 4 shows the azimuthal distance between the hardest jet and the muon in leptonic W signatures. No significant differences between the schemes is observed, suggesting that the correlation between QCD-and electroweak parts of the events are insensitive to the scheme variation.
The observables discussed here serve as an illustrative example of the effects of scale and scheme variations in unitary NLO merging. Similar effects can be seen in other observables. More observables, processes and energies can be studied with the implementation made available in a future release of Pythia 8.3.

V. SUMMARY AND OUTLOOK
Event generator uncertainties are one of the main obstacles to precision measurements in collider experiments. This is particularly obvious when event generators are used as an accurate model of large backgrounds to high-energy signal processes with low rate appearing in the tails of Standard-Model-dominated observables. To describe such backgrounds, precise NLO merged event generator calculations are needed. Variations of perturbative parameters in such calculations should give a good indication of the overall precision of the predictions. However, since NLO merged calculations include fixed-order as well as all-order effects, and combine multiple calculations in an intricate manner, it is not completely obvious how a realistic perturbative uncertainty should be assessed. In this article, we have presented steps towards this goal, and in particular focused on the interplay between renormalization scale choices and the very definition of the merging scheme at higher orders. These enter at the same order, such that it is important to quantify their individual impact, as well as their correlation. For this purpose, we extended the unitarized NLO merging prescription in Pythia 8 to accommodate renormalization scale variations (as a combined framework encompassing correlated variations of fixed-order, parton-shower and merging components) and merging scheme changes. For the latter, we have introduced two extensions of the UNLOPS method, which were motivated by different interpretations of dressing process-dependent NLO corrections with the all-order effects of soft and collinear radiation. The implementation will be publicly available in a future release of the Pythia 8.3 event generator.
The renormalization-scale and merging-scheme variations were used to estimate the perturbative uncertainty of illustrative observables in electron-positron and proton-proton collisions. Overall, the estimate is as expected: The uncertainty is small in regions that are primarily sensitive to the NLO fixed-order components of the of the calculation, and large in regions dominated by parton showering. The renormalization scale uncertainty bands of different merging schemes largely overlap. In transition regions between calculations of different jet multiplicity, the difference between (the central result of) the schemes can be larger than scale variations, due to the latter being artificially small due to unitarity requirements. Some visible differences between the merging schemes, in size similar to scale uncertainties, can also remain in regions sensitive to very well-separated jets. This is mainly related to a different definition of the functional form of the argument of the running coupling at higher orders, which persists even in those phase space regions. Thus, a joint scale-and scheme-variation may be considered a more reliable uncertainty estimate.
The current study should be regarded as an initial step in the assessment of uncertainties in unitarized NLO merging schemes. We have focused on a subset of variations for which a minimization of contamination by statistical fluctuations in the event generation is possible through a reweighting procedure. It would be very valuable to extend this property also to other sources of uncertainty, such as consistent combined factorization-scale and shower startingscale variations, changes in cut-off of parton-shower evolution, and changes in the merging scale definition and value. These would require an extensive redesign of the parton-shower algorithm. More insight into the effect of using different PDF parametrizations would also be valuable, but is complicated by the strong correlation with other semi-or non-perturbative components of the event generator -which is commonly held fixed after event generator tuning. Nevertheless, we believe that making the developments presented in the current study available within the Pythia 8 event generator will already allow also non-developers to perform more systematic studies of Standard-Model background uncertainties.