A Neural Resampler for Monte Carlo Reweighting with Preserved Uncertainties

Monte Carlo event generators are an essential tool for data analysis in collider physics. To include subleading quantum corrections, these generators often need to produce negative weight events, which leads to statistical dilution of the datasets and downstream computational costs for detector simulation. Recently, the authors of 2005.09375 proposed a positive resampler method to rebalance weights within histogram bins to remove negative weight events. Building on this work, we introduce neural resampling: an unbinned approach to Monte Carlo reweighting based on neural networks that scales well to high-dimensional and variable-dimensional phase space. We pay particular attention to preserving the statistical properties of the event sample, such that neural resampling not only maintains the mean value of any observable but also its Monte Carlo uncertainty. To illustrate our neural resampling approach, we present a case study from the Large Hadron Collider of top quark pair production at next-to-leading order matched to a parton shower.


I. INTRODUCTION
Data analysis in collider physics relies heavily on simulations to achieve precision.
The state-of-the-art setup includes general purpose event generators like Pythia [2], Herwig [3], and Sherpa [4], interfaced with next-to-leading order (NLO) corrections in the strong and/or electroweak couplings. These higher-order corrections are based on methods such as MC@NLO [5] and Powheg [6] built into the general purpose generators or available as standalone packages like MG5_aMC@NLO [7] and Powheg-Box [8]. Outputs from these Monte Carlo generators are then fed into sophisticated detector simulation frameworks, such as those based on Geant4 [9]. Event generation and simulation are becoming an increasingly relevant computational bottleneck for collider data analysis [10,11], particularly for the upcoming High Luminosity Large Hadron Collider (HL-LHC). Part of the computational burden of Monte Carlo event production is related to the appearance of negative weight events. Each collision event produced by the pipeline above has an associated event weight w_i. The weights represent the fact that a single simulated collision does not necessarily correspond to one real event.
With the introduction of NLO corrections, some of these weights can even be negative. As discussed more below, these negative weights complicate simulation-based inference and increase the computational demands.
In this paper, we present a neural resampler for Monte Carlo reweighting, which removes negative weights while preserving the statistical properties of the event sample. Our method builds upon the positive resampler approach introduced in Ref. [1], which uses histogram bins to determine quasi-local reweighting factors. The new feature of neural resampling is the use of neural networks to determine the reweighting factors, which allows us to work with the full unbinned, high-dimensional (and variable-dimensional) phase space. Another difference with Ref. [1] is that we pay particular attention to preserving the statistical uncertainties of the event sample, which we accomplish by pairing a local reweighting factor with a local resampling rate.
Let us review why weighted Monte Carlo events pose a computational burden for event generation and simulation. Here, we use "generation" to refer to particle-level event generation (e.g. Pythia) and "simulation" to refer to detector simulation (e.g. Geant4). When a particle-level phase space point has a non-trivial spectrum of weights (e.g. positive and negative), this implies that a large number of generated events will be needed to achieve statistical accuracy. The reason is that only the average weight at a given phase space point is physically relevant, so if the local phase space weights have a large variance, then many generated events will be needed to accurately estimate the mean. In some cases, it might be possible to reduce the occurrence of negative weight events directly in the generation step [12], though this depends on the specific generator implementation. Barring that, a large number of particle-level events will need to be passed to the computationally expensive detector simulation step to achieve the desired accuracy.
The idea behind positive resampling [1] is to remove negative weights using a quasi-local weight rebalancing scheme. This method does not depend on how the particle-level events are generated and it is relevant anytime there is a non-trivial spectrum of weights (i.e. even if all the weights are positive but different). Though this method requires choosing a set of observables for histogram binning, the performance is very good when using a relatively small number of physically-motivated observables. Note that positive resampling does not reduce the number of particle-level events that must be generated in order to achieve the desired statistical accuracy. Still, by rebalancing the event weights, positive resampling does decrease the number of particle-level events that are subsequently fed through the detector simulation. This can significantly reduce the computational cost of the full event production pipeline.
With neural resampling, we take advantage of the fact that neural networks, when paired with a suitable training algorithm, are excellent likelihood ratio estimators. This fact has been exploited in a variety of recent studies in high-energy physics [13][14][15][16][17][18][19][20][21][22][23]. The technique presented here is most closely related to the OmniFold unfolding algorithm [20], which is based on the Dctr technique for full phase space reweighting [18]. Essentially, one can think of Monte Carlo reweighting as computing the likelihood ratio between the desired distribution and an unweighted baseline. Neural networks are particularly well-suited for this task because they can naturally process the high-dimensional phase spaces encountered in Monte Carlo event generation. To handle variable-dimensional phase spaces, we take advantage of the particle flow network (PFN) architecture for point cloud learning [24,25].
After positive resampling or neural resampling, all of the event weights will be non-negative (assuming non-negative cross section), though they may not be uniform. The positive resampling method has a partial unweighting hyperparameter that determines what fraction of events are given a uniform weight [1]. In general, though, (partial) unweighting does not preserve the statistical properties of the event sample. Specifically, unweighting does preserve the expected mean of any observable, but it does not preserve the sample variance. As reviewed below, sample variance is the most common way to estimate Monte Carlo uncertainties. Crucially, neural resampling preserves the sample variance via a local resampling rate, without needing to choose any hyperparameters. Of course, after neural resampling, one could optionally use standard unweighting techniques to make all of the weights uniform, at the expense of decreasing the effective statistical power of the generated data set.
As an alternative use of neural networks, Monte Carlo unweighting could be achieved using generative models such as generative adversarial networks [26] and variational autoencoders [27,28]. Generative models have been extensively studied to accelerate or augment many aspects of high-energy physics simulations. These approaches, however, require learning the full phase space density, instead of just the likelihood ratio, which is significantly more complicated than the neural resampling approach presented here. It is also worth mentioning that neural networks and other machine learning techniques have been studied to improve other components of event generation, including parton density modeling [62,63], phase space generation [64][65][66][67][68], matrix element calculations [69,70], and more [71][72][73].
The rest of this paper is organized as follows. In Sec. II, we review the statistics of event weights and introduce a method to preserve statistical uncertainties after local phase space reweighting. Then, Sec. III introduces our neural network-based resampling technique and shows how it can remove negative weights while preserving uncertainties. We demonstrate the performance of neural resampling for two case studies in Sec. IV: a simple example of two Gaussians and a realistic collider example of matched NLO top quark pair production. The paper ends with brief conclusions and outlook in Sec. V.

A. Review of Weights and Uncertainties
The output of a generic Monte Carlo event generator is a set of N simulated phase space points x_i ∈ X along with their associated weights w_i ∈ W. One can think of X ∈ R^d as representing the event features in a d-dimensional phase space. For simplicity, we work with normalized weights for this discussion:

    \sum_{i=1}^{N} w_i = 1,    (1)

such that w_i = 1/N for an unweighted event sample. It is straightforward to adapt the equations below to alternative weight conventions, and for the case studies in Sec. IV, we work with samples with initial weights w_i = ±1.

The expectation value of an observable O can be estimated through a weighted sum:

    \hat{O} = \sum_{i=1}^{N} w_i \, O(x_i),    (2)

where O(x) is the value of the observable at phase space point x. For example, O(x_i) = δ(x_i in bin) for the contents of a histogram bin. In this discussion, hats always indicate estimates obtained through finite sampling. The true value ⟨O⟩ can only be obtained in the N → ∞ limit, so the estimate in Eq. (2) is subject to uncertainties. The variance of Eq. (2) is N times the variance of each independent term in the sum:

    \mathrm{Var}[\hat{O}] = N \, \mathrm{Var}[W O(X)],    (3)

where W is a random variable of which w_i is one realization. In general, though, the true value of Var[W O(X)] is not known a priori. Therefore, we have to estimate the variance through the sample variance:

    \widehat{\mathrm{Var}}[\hat{O}] = \sum_{i=1}^{N} w_i^2 \, O(x_i)^2 - \frac{1}{N} \left( \sum_{i=1}^{N} w_i \, O(x_i) \right)^2.    (4)

In many cases of interest, the second term is subdominant, such as when O(x_i) is zero for most phase space points and one otherwise. Under that assumption, we obtain the standard Monte Carlo uncertainty estimate:

    \hat{\sigma}_{\hat{O}}^2 = \sum_{i=1}^{N} w_i^2 \, O(x_i)^2.    (5)
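As a minimal sketch (assuming NumPy; the function and variable names are ours, not from the paper's code), Eqs. (2) and (5) amount to:

```python
import numpy as np

def weighted_estimate(weights, obs_values):
    """Estimate the observable via Eq. (2): a weighted sum over events."""
    return np.sum(weights * obs_values)

def mc_uncertainty(weights, obs_values):
    """Standard Monte Carlo uncertainty via Eq. (5): sqrt of the sum of squared weights."""
    return np.sqrt(np.sum(weights**2 * obs_values**2))

# Toy example: histogram-bin observable O(x) = 1 if x is in the bin, else 0.
rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(size=N)
w = np.full(N, 1.0 / N)  # normalized weights, satisfying Eq. (1)
in_bin = ((x > 0.0) & (x < 0.5)).astype(float)

print(weighted_estimate(w, in_bin))  # fraction of events in the bin
print(mc_uncertainty(w, in_bin))
```

For unweighted events (w_i = 1/N), the uncertainty reduces to the familiar sqrt(n)/N Poisson-style estimate for a bin with n entries.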

B. Monte Carlo Reweighting
The idea behind reweighting is to select a subset of phase space points x_j with modified weights \tilde{w}_j such that Eq. (2) (and later Eq. (5)) are preserved in the large N limit. Consider a small patch of phase space such that O is nearly constant. In such a region with N_patch events:

    \hat{O}_{\mathrm{patch}} = O_{\mathrm{patch}} \sum_{i=1}^{N_{\mathrm{patch}}} w_i.    (6)

The key observation is that we can replace the N_patch original events with N_patch/K events of equal weight \tilde{w},

    \tilde{w} = \frac{K}{N_{\mathrm{patch}}} \sum_{i=1}^{N_{\mathrm{patch}}} w_i.    (7)

This replacement preserves the estimate of any observable because:

    \sum_{j=1}^{N_{\mathrm{patch}}/K} \tilde{w} \, O(x_j) = \frac{N_{\mathrm{patch}}}{K} \, \tilde{w} \, O_{\mathrm{patch}} = O_{\mathrm{patch}} \sum_{i=1}^{N_{\mathrm{patch}}} w_i = \hat{O}_{\mathrm{patch}},    (8)

where we are assuming N_patch/K is an integer. Since \hat{O}_patch is unchanged by this reweighting, the true value of Var[\hat{O}_patch] is also unchanged. Said another way, one can replace K events in a small patch with a single representative event and achieve approximately the same statistical properties.
As a result, one can reduce the number of events required for expensive detector simulations by picking K > 1. This reduction in statistics is possible for any K, though in general one has to be careful when accounting for the uncertainty, which we now discuss.
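The patch-rebalancing step can be sketched in a few lines (a toy illustration with hypothetical patch weights, assuming NumPy):

```python
import numpy as np

# Hypothetical weights in one small phase space patch: a non-trivial
# spectrum, including negative weights.
w_patch = np.array([0.4, -0.2, 0.3, 0.5, -0.1, 0.3, 0.2, 0.2])
N_patch = len(w_patch)
K = 4  # replace every K events with one representative event

# Eq. (7): equal weight for each of the N_patch/K replacement events.
w_tilde = (K / N_patch) * w_patch.sum()
n_new = N_patch // K

# Eq. (8): the weighted sum, and hence the estimate of any observable
# that is nearly constant over the patch, is preserved.
print(w_patch.sum())    # original patch contribution
print(n_new * w_tilde)  # contribution after replacement
```

Both printed values agree, illustrating that the mean is preserved for any choice of K; the variance is another matter, taken up next.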

C. Preserving Uncertainties
The naive uncertainty estimate in Eq. (5) is badly biased if one blindly uses the weights in Eq. (7). While it is possible to keep track of the local variance separately from the expected value, this is cumbersome and error prone. Instead, we advocate choosing K such that Eq. (5) is preserved as well:

    \frac{N_{\mathrm{patch}}}{K} \, \tilde{w}^2 = \sum_{i=1}^{N_{\mathrm{patch}}} w_i^2.    (9)

With this choice of K, one can estimate the variance in the usual way using Eq. (5) without any additional overhead. This is the key observation that underpins our neural resampling method. In general, each phase space patch will require a different value of K. To determine this K_patch, it is convenient to rewrite Eq. (9) as:

    \tilde{w} = \frac{\sum_{i} w_i^2}{\sum_{i} w_i}.    (10)

Combining Eq. (10) and Eq. (7) provides a prescription for choosing K_patch:

    K_{\mathrm{patch}} = \frac{N_{\mathrm{patch}} \sum_{i} w_i^2}{\left( \sum_{i} w_i \right)^2}.    (11)

Taking the continuum limit, Eq. (11) becomes

    K(x) = \frac{\langle W^2 | X = x \rangle}{\langle W | X = x \rangle^2},    (12)

with expectation values conditioned on the phase space points in X.
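Continuing the same toy patch, one can verify numerically that the resampling rate of Eq. (11) preserves both the weighted sum of Eq. (2) and the squared-weight sum of Eq. (5). (N_patch/K_patch need not be an integer in general; in practice the probabilistic keep step of the algorithm in Sec. II C handles this.)

```python
import numpy as np

# Same hypothetical patch weights as before.
w_patch = np.array([0.4, -0.2, 0.3, 0.5, -0.1, 0.3, 0.2, 0.2])
N_patch = len(w_patch)

# Eq. (11): the patch-specific resampling rate (always >= 1 by Jensen).
K_patch = N_patch * np.sum(w_patch**2) / np.sum(w_patch)**2

# Eq. (10): the rebalanced weight.
w_tilde = np.sum(w_patch**2) / np.sum(w_patch)

# The N_patch/K_patch surviving events reproduce both the weighted sum
# (mean, Eq. (2)) and the sum of squared weights (uncertainty, Eq. (5)).
n_new = N_patch / K_patch
print(n_new * w_tilde, np.sum(w_patch))        # means agree
print(n_new * w_tilde**2, np.sum(w_patch**2))  # squared-weight sums agree
```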
The above discussion can be encoded in the following practical algorithm to reweight and resample Monte Carlo events while preserving uncertainties:

1. Generate a weighted event sample {x_i, w_i}.

2. Estimate the conditional expectation values ⟨W|x⟩ and ⟨W²|x⟩ across phase space (e.g. with the neural networks of Sec. III).

3. For each event i, compute the local resampling rate K(x_i) = ⟨W²|x_i⟩ / ⟨W|x_i⟩² from Eq. (12).

4. For each event i, keep it with probability 1/K(x_i); otherwise discard the event. Because K(x_i) ≥ 1 by construction, no event will be repeated.

5. For each kept event, set the new event weight to be

    \tilde{W}(x_i) = K(x_i) \, \langle W | x_i \rangle = \frac{\langle W^2 | x_i \rangle}{\langle W | x_i \rangle},

which is the continuum limit of \tilde{w} from Eq. (7).
The computational benefit of using \tilde{W}(x_i) in place of w_i holds even if there are no negative weights. As long as a given phase space point has a non-trivial spectrum of weights, the above reduction will decrease the computational cost of subsequent detector simulation with the same asymptotic statistical properties. The next section shows how to estimate ⟨W|X⟩ and ⟨W²|X⟩ using neural networks.
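Putting the pieces together, the resampling algorithm of this section can be sketched as follows (assuming NumPy; meanW and meanW2 stand for estimates of ⟨W|x⟩ and ⟨W²|x⟩ obtained by any means, e.g. the neural networks of Sec. III; the constant conditionals in the usage example are a hypothetical toy):

```python
import numpy as np

def resample(x, w, meanW, meanW2, rng):
    """Resample weighted events, preserving Eqs. (2) and (5).

    meanW, meanW2: callables estimating <W|x> and <W^2|x> per event.
    """
    mW, mW2 = meanW(x), meanW2(x)
    K = mW2 / mW**2                      # Eq. (12): local resampling rate
    keep = rng.random(len(x)) < 1.0 / K  # keep each event with prob 1/K
    w_new = (mW2 / mW)[keep]             # new weight <W^2|x>/<W|x>
    return x[keep], w_new

# Toy usage with hypothetical constant conditionals <W|x>=0.5, <W^2|x>=1,
# i.e. unit-norm weights that are positive 75% of the time.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
w = rng.choice([-1.0, 1.0], size=len(x), p=[0.25, 0.75])
x_new, w_new = resample(x, w,
                        lambda x: 0.5 * np.ones_like(x),
                        lambda x: np.ones_like(x), rng)
print(w_new.sum() / w.sum())  # close to 1: weighted sum preserved
print(len(x_new) / len(x))    # close to 1/K = 0.25: events retained
```

In this toy, K = 4 everywhere, so only a quarter of the events survive, each with positive weight 2, while the total weight and the sum of squared weights are preserved up to statistical fluctuations.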

III. NEURAL RESAMPLING
As described above, a Monte Carlo generator draws a sample {x i } from X. Each phase space point x i has an associated weight w i , which can be positive or negative. Moreover, the weights need not be a function of X, meaning that the same phase space point can have different weights, as determined by the Monte Carlo sampling scheme. The goal of the positive resampler method of Ref. [1] is to rebalance the weights such that each value of x has a unique weight. Our neural resampler accomplishes this same goal through binary classification with neural networks.

A. Learning Event Weights
To learn new event weights, we train a neural network to distinguish between two samples: the original sample {x_i} with weights {w_i} and a uniformly weighted sample with the same phase space points {x_i} but weights set to 1. For concreteness, we use the binary cross-entropy loss for this discussion, though other loss functions with the same asymptotic behavior would also work, such as the mean squared error (see footnote 2). The loss function to be minimized is:

    L[g] = - \sum_{i=1}^{N} w_i \log g(x_i) - \sum_{i=1}^{N} \log\left(1 - g(x_i)\right),    (13)

where g(x) is parameterized as a neural network with output range [0, 1]. We emphasize that the two sums in this loss function run over the same phase space points x_i, just with different weights. This setup is identical to the second step of the OmniFold unfolding algorithm [20], where a generated dataset is morphed into a weighted version of itself. Taking a functional derivative of Eq. (13) with respect to g(x) and setting it equal to zero, one can show that the loss function minimum provides an estimate of ⟨W|X⟩:

    \widehat{\langle W | X = x \rangle} = \frac{g(x)}{1 - g(x)}.    (14)

This is just a manifestation of the standard result that asymptotically (i.e. with infinite training data, maximally expressive neural network architecture, and ideal training procedure) the output of a binary classifier approaches a monotonic rescaling of the likelihood ratio; see e.g. Refs. [13][14][15][16][17][18][19][20][21][22][23]. In our case, the original sample has asymptotic probability distribution

    p(x) = N \, \langle W | X = x \rangle \, p_{\mathrm{uniform}}(x),    (15)

where p_uniform(x) is the phase space prior. The sample with uniform weights is not a proper probability distribution, since it is not normalized, but corresponds to N times p_uniform(x). In this way, we learn local event weights that preserve the estimate of any observable via Eq. (2).

Footnote 2: One key difference between binary cross-entropy and mean squared error is that the former cannot learn negative weights. There are situations, particularly when using fixed-order Monte Carlo generators, where one encounters phase space regions with genuinely negative cross sections. We performed a preliminary test of this in the context of fixed-order top quark pair production with a parton shower subtraction scheme where, unlike the matched results in Sec. IV B, there are negative phase space regions. Using the mean squared error loss and linear activation in the final layer, we found good performance in the presence of both positive and negative cross section regions.
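For a discrete feature, the minimum of Eq. (13) is available in closed form, which gives a framework-free sanity check of Eq. (14): at each cell x, minimizing −(Σ_i w_i) log g − n_x log(1 − g) over g yields g = Σw/(Σw + n_x), so g/(1 − g) recovers the average weight. A sketch (assuming NumPy; the per-cell negative-weight fractions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
# Discrete toy phase space with three cells and signed unit weights.
x = rng.integers(0, 3, size=60_000)
p_neg = np.array([0.1, 0.3, 0.45])      # per-cell negative-weight fraction
w = np.where(rng.random(len(x)) < p_neg[x], -1.0, 1.0)

learned, truth = [], []
for cell in range(3):
    mask = x == cell
    sw, n = w[mask].sum(), mask.sum()
    g = sw / (sw + n)                   # closed-form minimum of Eq. (13)
    learned.append(g / (1.0 - g))       # Eq. (14): recovered <W|x>
    truth.append(w[mask].mean())

print(learned)  # matches the per-cell average weights, 1 - 2*p_neg
```

The same check applies to Sec. III B: replacing w with w² in the first sum of the loss recovers ⟨W²|x⟩ instead.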

B. Learning Uncertainties
For the case studies in Sec. IV, we focus on situations where the initial weights are all ±c, for a fixed value of c. In such cases, ⟨W²|X⟩ = c² by construction and no additional training is needed to perform neural reweighting.
If there is a non-trivial spectrum of weight norms, we can repeat the logic of Sec. III A to estimate ⟨W²|X⟩, training a second classifier h(x) with the squared weights. This can be achieved using the loss function:

    L[h] = - \sum_{i=1}^{N} w_i^2 \log h(x_i) - \sum_{i=1}^{N} \log\left(1 - h(x_i)\right),    (16)

whose asymptotic minimum satisfies:

    \widehat{\langle W^2 | X = x \rangle} = \frac{h(x)}{1 - h(x)}.    (17)

In this way, we learn local variances that preserve the estimate of observable uncertainties via Eq. (5).

C. Implementation Details
To implement neural resampling, the learned ⟨W|x⟩ and ⟨W²|x⟩ functions from Eqs. (14) and (17) are simply inserted into the algorithm described in Sec. II C. If desired, one could further unweight the samples to make all of the weights uniform; see further discussion in Ref. [1]. We omit this optional step in our case studies, since full (or partial) unweighting does not preserve the original Monte Carlo uncertainties.
For the following results, neural networks are constructed using rectified linear unit (ReLU) activation functions for the hidden layers and a sigmoid function for the last layer, so that the output lies in [0, 1]. All models are implemented in Keras [74] with the TensorFlow backend [75] and trained using the cross-entropy loss with the Adam [76] optimizer. For the one-dimensional example, the neural network is fully connected with three hidden layers, 128 nodes per layer, and trained for 10 epochs. The higher-dimensional example uses PFNs [24], based on the deep sets architecture [25], using the default parameters from https://energyflow.network/ and trained for 100 epochs. None of the hyperparameters have been optimized for these studies.

IV. CASE STUDIES
To illustrate the potential of neural resampling, we present two case studies. First, we consider an example involving two Gaussians that highlights the potential of our method in a simple case where the optimal results can be obtained analytically. Then, we consider a realistic collider example of top quark pair production (tt) at NLO matched to a parton shower, which allows us to demonstrate the robustness of our method to multidimensional (and variable-dimensional) phase space.
For each of the following examples, we work with initial event weights of w_i = ±1, since this is the typical output of unweighted Monte Carlo generators. To map onto the discussion in Secs. II and III, one would need to normalize these weights to satisfy Eq. (1). Note that ⟨W²|X⟩ = 1 by construction, so we can skip the neural network training step in Eq. (16).

A. Two Gaussians
Our first case study involves two Gaussian distributions, one with positive weight and one with negative weight. Let X 0 ∼ N (0, 1) and X 1 ∼ N (0, 0.5), where N (µ, σ) represents a Gaussian distribution with mean µ and standard deviation σ. The weight for X 0 events is +1 while the weight for X 1 events is −1. We consider the case where the X 0 events happen three times more often than the X 1 events. The following results are based on 4M samples from X, i.e. 3M positive weight events from X 0 and 1M negative weight events from X 1 .
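For this example, the conditionals needed for resampling are available analytically: events are drawn from the mixture p_uniform(x) = (3 p_0(x) + p_1(x))/4 with weights ±1, so ⟨W|x⟩ = (3 p_0(x) − p_1(x))/(3 p_0(x) + p_1(x)) and ⟨W²|x⟩ = 1. The following sketch (assuming NumPy, with the sample scaled down from 4M to 400k events for illustration) runs the full resampling pipeline with these analytic weights in place of the learned ones:

```python
import numpy as np

def gauss(x, mu, sigma):
    """Gaussian probability density N(mu, sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(2)
n_pos, n_neg = 300_000, 100_000  # 3:1 mixture, as in the text
x = np.concatenate([rng.normal(0, 1.0, n_pos), rng.normal(0, 0.5, n_neg)])
w = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])

# Analytic conditionals: <W|x> from the mixture; <W^2|x> = 1 for w = +-1.
p0, p1 = gauss(x, 0, 1.0), gauss(x, 0, 0.5)
meanW = (3 * p0 - p1) / (3 * p0 + p1)
K = 1.0 / meanW**2               # Eq. (12) with <W^2|x> = 1

keep = rng.random(len(x)) < 1.0 / K
w_new = (1.0 / meanW)[keep]      # new weight <W^2|x>/<W|x>

print(w_new.sum() / w.sum())     # total weight preserved (close to 1)
print((w_new > 0).all())         # all resampled weights positive
print(len(w_new) / len(x))       # fraction of events retained
```

All resampled weights are positive (3 p_0 > p_1 everywhere for this mixture), and both the total weight and the sum of squared weights, and hence the uncertainty estimate of Eq. (5), are preserved up to statistical fluctuations.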
The weighted histogram of X is shown by the blue solid distribution in Fig. 1a, corresponding to a Gaussian with a dip in the middle. After neural resampling with K = 1, shown by the orange dotted curve, we obtain the same true distribution up to statistical fluctuations. As desired, subsampling with the optimal value of K from Eq. (12) does not change the probability density, as shown by the green dashed curve. This subsampling does change the uncertainties, though, as discussed more below. We therefore conclude that neural resampling has successfully preserved the cross section via Eq. (2).
A nice feature of this example is that the correct unbinned weights are computable analytically using the asymptotic formulas in Sec. II. In Fig. 1b, we show the original weight distribution together with the positive neural resampling weights using both K = 1 and the optimal value of K. While the original weights are both positive and negative, neural resampling yields strictly positive weights. We see that the finite sampling matches the expected analytic weight distributions, which is a non-trivial cross check of our neural reweighting code. Note that even though a binning and finite sampling are chosen to represent the data in Fig. 1, all of these distributions are fundamentally unbinned.
In Fig. 1c, we show the distribution of uncertainties using the standard Monte Carlo estimate based on summing the squared weights. With K = 1, the uncertainties are substantially underestimated relative to the original distribution, especially in the vicinity of x = 0, where there are large cancellations between negative and positive weights. Subsampling with the optimal K restores the original uncertainties, as desired. We therefore conclude that neural resampling has successfully preserved the uncertainties via Eq. (5).
As shown in Fig. 1d, subsampling yields a significant savings in terms of the number of events required to obtain the same statistical properties as the original sample. Only one third of the events are needed to capture the same behavior of the original events, with the savings greatest near x = 0 where there are larger relative uncertainties.
This two Gaussian example highlights the efficacy of neural resampling in a simple one-dimensional example. We now turn to a case of relevance to collider physics where the phase space is multi-dimensional.

B. Top Quark Pair Production
Our realistic collider case study is based on top quark pair production at NLO in quantum chromodynamics. At fixed order in an expansion in the strong coupling constant α_s, it is well-known that cross sections can become negative due to the breakdown of perturbation theory in the vicinity of soft/collinear singularities (see footnote 2). These unphysical phase space regions can be regulated by matching to a parton shower, rendering the differential cross section positive. In addition, matching improves the accuracy of the cross section prediction by including resummation effects beyond NLO.
The resulting events in the HepMC 2.06.09 [87] format are processed with the HepMC reader module of Delphes 3.4.2 [88]. This setup is also used to cluster R = 0.4 anti-k_t jets [89] with FastJet [90]. Jets originating from a b-quark are identified as such using the flavor tagging module in Delphes. Jets are labeled as "ISR" if they are not b-jets. We suppress the overall cross section information and take the weights to be ±1. After neural resampling, it is straightforward to rescale all of the weights to have the proper dimensions of a cross section, but we elide this step for simplicity.
To implement neural resampling, we use PFNs [24], where each event is represented as a variable-length set of four-vectors that correspond to the muons, neutrinos, and clustered jets. The PdgID [91] of the muons and neutrinos are used as a per-particle feature in the PFN, while the b-jets (ISR jets) are given the per-particle feature of 1 (0). In principle, we could have applied neural resampling directly to the final-state hadrons, but since we are only going to plot jet-related quantities, it makes sense to perform jet clustering before neural resampling.
Note that the phase space that is being reweighted here is variable dimensional, with at least 20 dimensions from the 3-momenta of the 6 particles and the 2 jet masses. These data are constrained to an 18-dimensional manifold after considering transverse momentum conservation and are approximately constrained to a 14-dimensional manifold after accounting for mass-shell conditions. Many events have far more dimensions due to additional jets. Thus, this is a highly non-trivial test of whether neural resampling can yield sensible results across a multi-dimensional and variable-dimensional phase space.
In Fig. 2a, we plot the distribution of transverse momentum (p T ) of the leading ISR jet. As expected, neural resampling has matched the bulk of this distribution up to statistical uncertainties. In the tails, where training data are more sparse, there are larger variations, but we emphasize that these results were obtained "out of the box" with no attempt to optimize training parameters.
The distribution of event weights is shown in Fig. 2b. The original sample had both positive and negative weights, but neural resampling has yielded a positive weight distribution as desired. With the optimal choice of K, the event weights extend out to about 10, which reflects the initial inefficiency of event generation with positive and negative weights.
The uncertainties are shown in Fig. 2c, again using the standard Monte Carlo estimate from Eq. (5). With K = 1, the uncertainties are underestimated throughout the phase space, but with the optimal value of K, they are captured correctly. Correspondingly, the optimal value of K yields a substantially lower number of events, as shown in Fig. 2d, indicating a large gain in downstream computational efficiency. In particular, from the original 2M events, only about 600k remain after neural resampling, with no change to the statistical properties.
To highlight that neural resampling yields sensible results across the whole (unbinned) phase space, we plot the distribution of the number of ISR jets with p T > 10 GeV in Fig. 3. We emphasize that no retraining is needed here, since these plots are based on the same weights already shown in Fig. 2b. Obtaining results like this that work for any observable of interest would be challenging with a binned approach. This case study therefore suggests that neural reweighting will be a powerful tool for efficient use of Monte Carlo generators at colliders.

V. CONCLUSIONS AND OUTLOOK
This paper has introduced neural resampling, a neural-network-based extension of the binned positive resampler proposed in Ref. [1]. By exploiting the ability of neural networks to approximate likelihood ratios, our new approach is able to eliminate negative weights without binning and with access to potentially high-dimensional (and variable-sized) phase spaces. Furthermore, neural resampling preserves statistical uncertainties with minimal overhead and no adjustable parameters.
Given the growing availability of higher-order corrections and the computational demands of detector simulations, there is a need to make the best use of limited resources. The neural resampler is able to preserve the statistical power of these precision calculations while decreasing downstream computational demands. For the future, it would be interesting to study whether neural resampling could be incorporated more directly into the Monte Carlo generation process. This could lead to further computational gains, particularly if used in concert with emerging neural-network-based phase space integration techniques.
Beyond just computational efficiency, there may be other advantages of neural resampling. Many analysis strategies that consider the full unbinned log likelihood over events require per-event weights to be positive [92], which is guaranteed by neural resampling (assuming non-negative cross section). Monte Carlo generators increasingly have the ability to keep track of uncertainties via weight variations [93,94], which may be straightforward to incorporate into neural resampling via parametrized networks [18,95]. Finally, neural resampling ensures that the event weights are a true function of phase space and not multivalued maps, even allowing the weight function to be differentiated. This feature of the learned weight function may turn out to be beneficial for other aspects of data processing and analysis.

Code and Data
The code for this paper can be found at https://github.com/bnachman/NeuralPositiveResampler. The top quark datasets are available upon request.