ν²-Flows: Fast and improved neutrino reconstruction in multi-neutrino final states with conditional normalizing flows

In this work we introduce ν²-Flows, an extension of the ν-Flows method to final states containing multiple neutrinos. The architecture can natively scale for all combinations of object types and multiplicities in the final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton events, the momenta of both neutrinos and correlations between them are reconstructed more accurately than when using the most popular standard analytical techniques, and solutions are found for all events. Inference is significantly faster than with competing methods, and can be accelerated further by evaluating in parallel on graphics processing units. We apply ν²-Flows to $t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded distributions are much closer to the limit of performance set by perfect neutrino reconstruction than those obtained with standard techniques. For the chosen double differential observables ν²-Flows results in improved statistical precision for each bin by a factor of 1.5 to 2 in comparison to the Neutrino Weighting method and up to a factor of four in comparison to the Ellipse approach.


I. INTRODUCTION
At collider experiments in particle physics, such as those at the Large Hadron Collider (LHC) [1], beams of hadrons or leptons are accelerated to high energies and collided together. These collisions result in an array of particles which are studied by experiments comprising complex detectors surrounding the interaction points. General purpose detectors, such as ATLAS [2] and CMS [3], are designed to record and reconstruct nearly all stable particles predicted in the standard model of particle physics (SM). From these reconstructed stable particles, both precision measurements of the SM and searches for new phenomena beyond the SM (BSM) are performed.
* john.raine@unige.ch
† matthew.leigh@unige.ch

Neutrinos, stable particles produced in many collisions, interact only through the electroweak force and traverse the detectors without leaving a trace. Their presence in collisions is inferred from a momentum imbalance in the transverse plane perpendicular to the beam axis. This imbalance, known as the missing transverse momentum $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$, is calculated from the negative vector sum of the transverse momenta of all reconstructed objects.
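The definition of the missing transverse momentum can be illustrated with a short sketch. The function name and the `(pt, phi)` input layout are assumptions made for this illustration and are not taken from the ν²-Flows repository.

```python
import math

def missing_transverse_momentum(objects):
    """Return (px_miss, py_miss, magnitude) for a list of (pt, phi) pairs.

    The (pt, phi) layout is an assumption made for this illustration.
    """
    objects = list(objects)
    sum_px = sum(pt * math.cos(phi) for pt, phi in objects)
    sum_py = sum(pt * math.sin(phi) for pt, phi in objects)
    # Negative vector sum of all reconstructed transverse momenta.
    px_miss, py_miss = -sum_px, -sum_py
    return px_miss, py_miss, math.hypot(px_miss, py_miss)
```

Only the two transverse components are available, which is why the longitudinal component and the sharing between several neutrinos have to be inferred by other means.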
In order to reconstruct individual neutrinos, and thus fully reconstruct a single event, underlying assumptions need to be made on their origin and multiplicity. The $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$ serves as a proxy for all unobserved particles in the collision, but does not indicate how many were present, or how the momentum should be shared in the case of multiple neutrinos. Furthermore, at hadron colliders there is no experimental equivalent for the missing longitudinal momentum.
arXiv:2307.02405v3 [hep-ph] 15 Dec 2023

Several approaches are used to reconstruct neutrinos using $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$ and by setting constraints on the invariant masses of intermediate particles. In the case of top quark pair production ($t\bar{t}$) in the semileptonic decay channel, where there is one lepton and one neutrino in the event, the neutrino momentum is typically reconstructed by solving for the longitudinal momentum component $p_z$ under the assumption that the invariant mass of the lepton-neutrino pair is exactly that of the W boson [4-12].
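The W mass constraint for a single neutrino reduces to a quadratic equation in $p_z$. The sketch below shows the standard textbook solution under the assumptions of a massless lepton and neutrino, with the neutrino transverse momentum set equal to the missing transverse momentum. The function name, the argument layout and the value of `M_W` are illustrative choices, not taken from the cited analyses.

```python
import math

M_W = 80.4  # GeV, approximate W boson mass (assumed value for the sketch)

def neutrino_pz_solutions(lep, met_x, met_y, m_w=M_W):
    """Solve (p_lep + p_nu)^2 = m_w^2 for the neutrino p_z.

    `lep` is (px, py, pz) of a massless charged lepton with non-zero
    transverse momentum. Returns the two roots of the quadratic; if the
    discriminant is negative only the real part is kept, so both roots
    coincide.
    """
    lpx, lpy, lpz = lep
    lep_pt2 = lpx**2 + lpy**2
    lep_e2 = lep_pt2 + lpz**2
    nu_pt2 = met_x**2 + met_y**2
    mu = 0.5 * m_w**2 + lpx * met_x + lpy * met_y
    disc = lep_e2 * (mu**2 - lep_pt2 * nu_pt2)
    root = math.sqrt(disc) if disc > 0.0 else 0.0
    return ((mu * lpz - root) / lep_pt2, (mu * lpz + root) / lep_pt2)
```

The two-fold ambiguity, and the events with a negative discriminant, are the single-neutrino analogue of the difficulties that become more severe when two neutrinos share the missing momentum.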
For the case of events with two neutrinos, such as $t\bar{t}$ production in the dileptonic channel, more complicated methods [13-15] are employed in order to resolve the share of momentum between the two neutrinos in the event [12, 16-28]. These approaches still require the invariant mass of each of the two lepton-neutrino pairs to match the W boson mass, and in addition require the invariant mass of each lepton-neutrino-jet triplet to match the top quark mass.
In ν-Flows [29] we use conditional normalizing flows [30-32] to learn the probability distribution of the neutrino momentum vector given the observed objects in an event for semileptonic $t\bar{t}$ events. From the learned conditional probability distribution, solutions can be sampled per event and the most probable solution can be determined from the learned likelihood of the solution.
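The sample-then-select step can be sketched independently of the network. In the example below `sample_fn` and `log_prob_fn` are hypothetical stand-ins for the sampling and likelihood evaluation of a trained conditional flow, and the toy Gaussian density exists only to make the sketch runnable.

```python
import math
import random

def most_probable_solution(sample_fn, log_prob_fn, n_samples=256, seed=0):
    """Draw candidate solutions and keep the one with the highest log-likelihood."""
    rng = random.Random(seed)
    candidates = [sample_fn(rng) for _ in range(n_samples)]
    return max(candidates, key=log_prob_fn)

def make_toy_density(mean, sigma):
    """Toy isotropic Gaussian over a momentum vector (illustrative only)."""
    def sample(rng):
        return tuple(rng.gauss(m, sigma) for m in mean)

    def log_prob(p):
        norm = len(mean) * math.log(sigma * math.sqrt(2.0 * math.pi))
        return -0.5 * sum(((x - m) / sigma) ** 2 for x, m in zip(p, mean)) - norm

    return sample, log_prob
```

With a real flow the density would be conditioned on the observed event, so the selected solution differs event by event.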
In this work, we extend ν-Flows to more challenging final states with multiple neutrinos. We focus on $t\bar{t}$ final states in which both top quarks decay semileptonically. This results in an expected final state with exactly two oppositely charged leptons, at least two jets, of which two should originate from b-quarks, and two neutrinos.
In comparison to the single neutrino case, where the main challenge arises in recovering the longitudinal momentum component of the neutrino, events with multiple neutrinos have the additional complexity of how to share the total missing momentum vector between all neutrinos in the final state. We show that ν-Flows yields improved reconstruction performance in comparison to standard approaches and demonstrate the direct benefit to the statistical precision in a simplified double differential $t\bar{t}$ dilepton analysis in observables such as the invariant mass of the $t\bar{t}$ system, $m_{t\bar{t}}$, and the angular separation of the two leptons, $\Delta\phi(\ell^{+}\ell^{-})$.
The repository¹ and data² used in this work are both made publicly available.

A. ν²-Flows
In ν-Flows, conditional normalizing flows are used to capture the distribution of possible solutions for the neutrino momenta given the reconstructed momenta of observed objects in a collision. The overall model comprises two components, the event feature extraction and the conditional normalizing flow. The event feature extraction learns a representative vector of the event from the collections of reconstructed objects, namely the jets and leptons, and $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$. The resulting vector is used as a conditional input to the normalizing flow, in order to learn the conditional density of all possible neutrino solutions from the training data.
In this work we extend the initial ν-Flows architecture to predict multiple neutrinos and accommodate any number of leptons in addition to jets by using attention transformers [34]. We label this architecture as ν²-Flows to distinguish it from the general method. In order to handle two neutrinos we double the dimensionality of the conditional normalizing flow (from three to six). The neutrinos are also always predicted in the same order for each event, with the momentum of the neutrino followed by the momentum of the anti-neutrino. When increasing the neutrino multiplicity further, the same procedure is used together with a predefined ordering for the neutrinos. The architecture of the normalizing flow is otherwise kept largely the same.

The most substantial optimisation has been performed on the feature extraction network. The feature extraction network attempts to produce a contextual vector, specific to each event, to guide the transformations within the normalizing flow. In the single lepton case, ν-Flows uses an attention pooled deep set [35] to process the jets, with $p_{x}^{\mathrm{miss}}$, $p_{y}^{\mathrm{miss}}$, the lepton four momentum, and some event level information as extra conditional information.
To extend ν²-Flows to multiple leptons, we require a permutation invariant architecture that can accommodate a variable number of both jets and leptons, motivating the move to attention transformers.

FIG. 1: A schematic of the ν²-Flows network for learning the conditional likelihood of multiple neutrinos in the event. The network uses a transformer encoder (TE) with cross-attention (CA) with a learnable class token (CT) to embed an event representation for any multiplicity of physics objects. This operation is permutation invariant and can operate on any jet and lepton multiplicity. Each physics object has its own dedicated embedding network and additional event information (Misc) is used to condition the transformer encoder blocks. The representation vector is used to condition the transformation with the normalizing flow.
To train ν²-Flows all jets and leptons are represented by their four-momentum vectors in the form $(p_x, p_y, p_z, \log E)$. Jets are assigned an additional binary decision on whether they are tagged as originating from a b-quark. Leptons are identified as being either an electron or a muon, as well as whether they had positive or negative charge. The target neutrino momenta are expressed as $(p_x, p_y, p_z)$ for both the neutrino and anti-neutrino.
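As a minimal sketch of this input representation, the helpers below build per-object feature vectors. The function names, the ordering of the features and the 0/1 encoding of the b-tag, flavour and charge flags are assumptions for illustration; the text above only states which quantities are provided.

```python
import math

def jet_features(px, py, pz, energy, is_b_tagged):
    """(px, py, pz, log E) plus a binary b-tag flag (encoding assumed)."""
    return [px, py, pz, math.log(energy), 1.0 if is_b_tagged else 0.0]

def lepton_features(px, py, pz, energy, is_muon, charge):
    """(px, py, pz, log E) plus flavour and charge flags (encoding assumed)."""
    return [
        px, py, pz, math.log(energy),
        1.0 if is_muon else 0.0,   # 0 = electron, 1 = muon
        1.0 if charge > 0 else 0.0,  # 0 = negative, 1 = positive
    ]
```

Taking the logarithm of the energy compresses its long tail, which is a common motivation for this kind of coordinate choice.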
The full set of inputs to the network is provided in Table I. The coordinates chosen to describe the input and target object kinematics were optimised in a grid search.
A schematic of the new architecture for ν²-Flows is shown in Fig. 1, which makes use of attention transformers and object specific embedding networks. Initially, the jets, leptons and $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$ are all independently embedded into higher dimensional space using object specific multi-layer perceptrons (MLP).³ The embedded objects subsequently interact through a transformer encoder using several layers of multi-headed attention. Additional event information (Misc), containing object multiplicities, is injected into the network as conditional information by concatenating the vector to each token within the transformer encoder blocks. To obtain a single global vector from the transformer as our conditioning vector for the normalizing flow, we perform repeated cross-attention with a learnable class token (CT), a common technique used in vision transformers [36].

³ Other final state objects such as photons and tau leptons can also be accommodated by embedding them with additional MLPs for each particle type.
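The pooling role of the class token can be illustrated with a stripped-down, single-head cross-attention in pure Python. This is a sketch under simplifying assumptions: there are no learned query, key or value projections, no multiple heads and no conditioning, all of which the actual network uses. It only demonstrates why the output is independent of the ordering and number of input objects.

```python
import math

def cross_attention_pool(class_token, tokens):
    """Pool a variable-length set of token vectors into one vector.

    The class token acts as the single query; each token is used
    directly as its own key and value (a simplification).
    """
    dim = len(class_token)
    scores = [
        sum(q * k for q, k in zip(class_token, tok)) / math.sqrt(dim)
        for tok in tokens
    ]
    max_score = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    norm = sum(weights)
    return [
        sum(w * tok[j] for w, tok in zip(weights, tokens)) / norm
        for j in range(dim)
    ]
```

Because the softmax weights and the weighted sum are both computed over the whole set, reordering the jets and leptons leaves the pooled vector unchanged up to floating-point rounding.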
The ν²-Flows model in this paper comprises three transformer encoder blocks and two cross-attention blocks, each with an embedding dimension of 128 and 16 attention heads. All MLPs in the network have a single hidden layer with 256 neurons and use the LeakyReLU activation [37] and Layer-Normalization [38]. The output of the transformer is passed through an MLP to produce the context tensor for the flow with a dimension of 128. The invertible neural network employs 10 rational quadratic spline (RQS) coupling blocks [39] interspersed with LU-decomposed linear layers, implemented with the nflows package [40] and PyTorch v2.0 [41].
Each RQS has 10 bins with linear tail bounds outside ±4. We train the ν²-Flows model for 100 epochs using the AdamW optimizer [42] with a learning rate cycling from $10^{-8}$ to $10^{-3}$ and back every 50 epochs. We use weight decay with a strength of $10^{-4}$.

In this work we compare ν²-Flows to ν-Weighting because of its common usage, and also to the Ellipse method because of its reduced computation time.
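As an illustration of the learning-rate cycle used in training, the sketch below implements a triangular schedule that rises from the minimum to the maximum and returns within one 50-epoch period. The linear (triangular) shape and the function name are assumptions; the text specifies only the end points and the period.

```python
def cyclic_learning_rate(epoch, lr_min=1e-8, lr_max=1e-3, period=50):
    """Triangular cycle: lr_min -> lr_max -> lr_min over `period` epochs.

    The linear ramp is an assumed shape chosen for illustration.
    """
    phase = (epoch % period) / period        # position in the cycle, in [0, 1)
    triangle = 1.0 - abs(2.0 * phase - 1.0)  # 0 -> 1 -> 0 across one period
    return lr_min + (lr_max - lr_min) * triangle
```

With 100 training epochs this corresponds to two full cycles.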

ν-Weighting
In ν-Weighting the kinematic properties of the neutrino are extracted from the identified leptons, jets and missing transverse momentum in the event. For our implementation we follow the prescription described in Ref. [27]. Constraints on the neutrino solutions are applied using the invariant masses of the top quark and W boson, where $\ell_{1,2}$, $\nu_{1,2}$, and $b_{1,2}$ represent the four-momenta of the charged leptons, neutrinos and b-tagged jets. However, this is not enough to fully constrain the kinematics of the neutrinos. Therefore, the neutrino and anti-neutrino rapidities ($\eta_{\nu}$ and $\eta_{\bar{\nu}}$) are individually hypothesized and tested. For each pair of values for $\eta_{\nu}$ and $\eta_{\bar{\nu}}$, we solve the mass equations in Eq. (1), yielding two possible solutions for the full pair of neutrino kinematics. Each solution produces an inferred missing transverse momentum vector $\vec{p}_{\mathrm{T}}^{\,\nu\bar{\nu}}$, which can then be compared to the observed $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$. This comparison defines a weight $w$, where $\sigma$ is a fixed resolution scale related to the $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$ reconstruction in the detector. In ν-Weighting the hypothesis that maximizes $w$ is chosen as the correct solution.
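A minimal sketch of such a weight is given below. The Gaussian form with a single resolution scale for both transverse components is an assumption made for illustration, since the defining equation is not reproduced in this text; the function name and argument layout are likewise hypothetical.

```python
import math

def nu_weight(nu, nubar, met_x, met_y, sigma):
    """Compatibility of a neutrino-pair hypothesis with the observed MET.

    `nu` and `nubar` are (px, py) of the two hypothesised neutrinos.
    Assumed form: a Gaussian in the x and y residuals with width sigma.
    """
    dx = met_x - (nu[0] + nubar[0])
    dy = met_y - (nu[1] + nubar[1])
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
```

Under this form a hypothesis that reproduces the observed missing transverse momentum exactly has weight one, and the weight falls towards zero as the mismatch grows relative to σ.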
To find all solutions we perform a grid search of $\eta_{\nu}$ and $\eta_{\bar{\nu}}$ with values ranging from -5 to 5 with a step size of 0.2.
For each of these we also need to test all combinations of assigning b-tagged jets to each of the b-quarks from the $t\bar{t}$ decay. This is a computationally costly scan that considers only discrete η values.
Another significant drawback of this method is that despite the large number of neutrino solutions being tested, it is still possible that the constraint systems are not solvable. This can be due to mis-measurement or misassignment of the leptons, jets, or $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$. Alternatively, this can arise from the masses of either the top quarks or W bosons in the event deviating from the nominal values. Therefore, to increase the success rate of this method we also iterate over different values of $m_t$ from 171 to 174 GeV with a step size of 0.5 GeV. This further increases the computational requirements, but increases the efficiency of finding a solution for each event.
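The size of the scan follows from simple counting, sketched below. The assumptions, which are not stated in the text, are that exactly two b-tagged jets are considered (two possible assignments) and that a single $m_t$ value is shared by both top quarks in each hypothesis.

```python
def scan_values(start, stop, step):
    """Inclusive list of scan points from start to stop."""
    n_points = round((stop - start) / step) + 1
    return [start + i * step for i in range(n_points)]

eta_grid = scan_values(-5.0, 5.0, 0.2)    # 51 values per neutrino
mt_grid = scan_values(171.0, 174.0, 0.5)  # 7 top quark mass values
n_b_assignments = 2                       # assumption: two b-tagged jets

# 51 * 51 * 2 * 7 = 36414 constraint systems to solve per event,
# each of which can yield up to two neutrino-pair solutions.
n_hypotheses = len(eta_grid) ** 2 * n_b_assignments * len(mt_grid)
```

Events with more than two b-tagged jets would increase this number further.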
After selecting the solution with the highest $w$, any solution which results in $m_{t\bar{t}} < 300$ GeV or where either of the two reconstructed top quarks has negative energy is rejected. The ν-Weighting method is unable to find a valid solution on the nominal dataset around 5% of the time.

Ellipse method
The Ellipse method is derived from a geometric approach to analytically constrain neutrino kinematics explicitly in processes where top quarks decay into leptons and neutrinos [15]. For a single neutrino, its momentum can be calculated as a function of the four-momenta of the b-quark and the charged lepton, the W boson mass, and the top quark mass. The solution set for this function defines the surface of an ellipse. By combining this information with the observed $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$, the solution set collapses to a unique value. For events with two neutrinos in the final state, the method is extended to calculate the solution for neutrino pairs which are most likely to have produced the observed $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$. We use the implementation from the authors of the Ellipse method.⁴ To solve the b-jet combinatorics we use a simple minimum ∆R matching between the leptons and the b-jets in the event. To reduce the combinations we only take the two leading jets in $p_{\mathrm{T}}$ passing b-tagging criteria. If the ∆R matching yields no solutions for the neutrino kinematics using the Ellipse method, the opposite association is tested.

⁴ Implementation available at github.com/betchart/analytic-nu
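The lepton to b-jet association can be sketched as follows. The text specifies only a "simple minimum ∆R matching"; choosing the pairing with the smaller summed ∆R is one possible reading and is an assumption here, as are the function names and the `(eta, phi)` input layout.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def match_leptons_to_bjets(leptons, bjets):
    """Return [preferred, fallback] pairings as lists of (lepton, b-jet) indices.

    `leptons` and `bjets` each hold two (eta, phi) tuples. The fallback is
    the opposite association, to be tried if the preferred one yields no
    neutrino solution.
    """
    direct = delta_r(*leptons[0], *bjets[0]) + delta_r(*leptons[1], *bjets[1])
    swapped = delta_r(*leptons[0], *bjets[1]) + delta_r(*leptons[1], *bjets[0])
    same, cross = [(0, 0), (1, 1)], [(0, 1), (1, 0)]
    return [same, cross] if direct <= swapped else [cross, same]
```

Returning both orderings mirrors the procedure described above, where the opposite association is only tested when the first fails.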
One drawback of this approach is that it requires accurate matching between each lepton and the associated b-jet in the event. Furthermore, like ν-Weighting, it requires one to make hard assumptions on the mass of the W boson and top quark. Finally, it is possible that the method can yield no solutions, just like ν-Weighting. The implementation used in this work fails to find solutions in 22% of $t\bar{t}$ dilepton events. In comparison to ν-Weighting, the Ellipse method requires far fewer computational resources per event.

III. DATASET
In this work, ν²-Flows is applied to simulated $t\bar{t}$ events where both top quarks decay semileptonically, resulting in a final state with exactly two leptons (ℓ)⁵, two neutrinos (ν) and two jets initiated by b-hadrons (b-jets). Additional jets arise from initial and final state radiation.
All events are simulated in proton-proton collisions at a centre-of-mass energy of $\sqrt{s} = 13$ TeV. Two different samples are generated, each using a different generator for the hard interactions in the matrix element.
In the alternative sample, both the hard interactions and parton shower are simulated with Pythia8 (v8.307) with the Monash tuned set of parameters [47] at leading order accuracy.The same PDF set is used as for the nominal sample.
The detector response is simulated using Delphes [48] (v3.4.2) with a parametrisation similar to the response of the ATLAS detector [2]. Jets are reconstructed using the anti-$k_t$ clustering algorithm [49] with a radius parameter of R = 0.4 using the FastJet package [50]. In total there are 1.02 million events in the nominal sample and 1.4 million events in the alternative sample passing all selection requirements. A total of 940,000 (970,000) events from the nominal (alternative) sample are used to train the network, with 80,000 nominal samples used for evaluation.

A. Neutrino reconstruction
The first measure of performance is to validate that ν²-Flows is able to correctly reconstruct the momenta and relative positions of the neutrino pair. To establish a baseline for the impact of neutrino reconstruction in all distributions, we define ν-Truth to be the case where the truth neutrinos are reconstructed perfectly, but all other objects in the event remain the same. This defines the ground truth of the target distributions and is also the upper limit in performance for any neutrino reconstruction approach. We compare the reconstruction per-

ever, Ellipse tends to reconstruct top quarks with a harder $p_{\mathrm{T}}$. In Fig. 5

As also performed in Ref. [29], we perform a cross check on the benefit of using the normalizing flow in the ν²-Flows architecture. We train the ν²-Flows architecture but without the flow and predict the two neutrino momenta directly, establishing a simple machine learning baseline. The performance achieved is substantially worse, with strong biases in all neutrino kinematics and resulting event level distributions.

One of the main drawbacks of ν-Weighting, and why Ellipse is often considered despite the reduced performance, is the computational resources required. In comparison, ν-Flows requires only a single forward pass for each event. The typical inference times on a CPU for single event inference are around 70 ms, with the computation time decreasing substantially with parallelised execution on a GPU as summarised in Table II.

B. Unfolding analysis
In order to evaluate the downstream impact of the improved neutrino reconstruction from ν-Flows, we follow the unfolding analysis performed in Ref. [26], where a double differential cross section measurement is performed to measure the spin correlation in $t\bar{t}$ events, using the invariant mass of the $t\bar{t}$ system $m_{t\bar{t}}$ and the angular separation between the two leptons $\Delta\phi(\ell^{+}\ell^{-})$.
Reconstruction of the two neutrinos is crucial in order to fully reconstruct the $t\bar{t}$ system, which in Ref. [26] is performed using ν-Weighting. To benchmark our model, we replace the neutrino reconstruction with the result from ν²-Flows. In addition to ν-Weighting we also compare the performance to the Ellipse method due to its reduced computational complexity.
We focus on the reconstruction of individual observables dependent on the neutrino kinematics as well as the statistical precision of the unfolded distributions. In addition to $\Delta\phi(\ell^{+}\ell^{-})$, we look at other observables in conjunction with $m_{t\bar{t}}$, motivated by the distributions measured in Ref. [51]. These observables are described in Table III and the corresponding bin edges for the double differential unfolding and corresponding response matrices are shown in Table IV.
All distributions are compared to ν-Truth, and the total uncertainty in each bin after unfolding is calculated with respect to the optimal performance achieved when using ν-Truth. The correct jet and lepton association is used for both top quarks in order to remove the effects arising from matching inefficiencies. We perform the unfolding using the Singular Value Decomposition (SVD) method [52] with a regularisation factor of 7, using the implementation provided in RooUnfold [53]. The regularisation factor was optimised for the ν-Truth distributions such that the reduced $\chi^{2}$ value is closest to one for the four double differential distributions. In all cases we only consider the $t\bar{t}$ process and ignore the impact of background estimation and subtraction in the methods.
Although more modern machine learning approaches for unfolding are an active area of research [54-59], we leave their study and application to dilepton $t\bar{t}$ events to future studies, in particular the study of unbinned multidimensional unfolding with the reconstructed neutrino kinematics.
The response matrix using each of the neutrino reconstruction methods is shown in Fig. 6 for the two dimensional binning in $m_{t\bar{t}}$ and $\Delta\phi(\ell^{+}\ell^{-})$ and in Fig. 7 for $m_{t\bar{t}}$ and $p_{\mathrm{T}}^{t}$. In the ideal case only the main diagonal would contain entries; however, due to inefficiencies in the neutrino reconstruction methods as well as detector resolution effects, off diagonal elements are unavoidable. In both cases it is clear that using ν²-Flows to reconstruct the neutrino pair results in a more diagonal response matrix than the other two approaches. This is quantified by the trace fraction of each matrix.

FIG. 7: Binned response matrices for the double differential measurement of $m_{t\bar{t}}$ and $p_{\mathrm{T}}^{t}$ when using each of the three methods for neutrino reconstruction. The binning is symmetric for both the parton and detector level observables, however the $m_{t\bar{t}}$ bins are labelled on the x-axis with the $p_{\mathrm{T}}^{t}$ bins labelled on the y-axis. The trace fraction is calculated for each method for a simple quantitative comparison and is 0.62 when using ν-Truth.

Although the trace fraction can give a good measure of which method is performing best, the off diagonal elements still contribute to the unfolded distributions. To quantitatively assess the true impact of using each method, the response matrices are inverted using SVD and the overall uncertainties for each bin at parton level are calculated.
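The trace fraction mentioned above can be computed in a few lines. The definition used in this sketch, the sum of the diagonal entries divided by the sum of all entries of the binned response matrix, is an assumption, as the text does not spell out the normalisation.

```python
def trace_fraction(matrix):
    """Fraction of entries on the main diagonal of a square response matrix.

    Assumed definition: sum of diagonal entries over the sum of all entries.
    """
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / total
```

A perfectly diagonal response gives a value of one, so larger values indicate less migration between parton-level and detector-level bins.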

C. Robustness to training sample
In comparison to the standard analytical approaches, ν-Flows is trained on a specific sample of Monte Carlo simulated events. This could introduce a performance dependence on the sample used for training, which may not be optimal for all generators. It should be noted that the same model is used for all events and, just like the analytical approaches, is independent of which samples it is applied to. However, if ν-Flows has learned sample specific effects this can lead to a suboptimal performance or even unusable levels of performance when applied to other samples.
To study the impact of this effect we train ν²-Flows using the alternative $t\bar{t}$ dilepton sample (ν²-Flows (Pythia8)) and use it to reconstruct the neutrinos for the nominal $t\bar{t}$ sample. We compare the reconstructed kinematic distributions as well as the uncertainties in each bin of the unfolded distributions.
Negligible differences are observed in the reconstructed neutrino kinematics, though the difference can clearly be seen for the reconstructed W boson mass and a slight difference is also seen for the reconstructed top quark invariant mass. Some small differences are also observed in the tails of the reconstructed top quark and $t\bar{t}$ properties in Fig. 9, however the performance is still substantially improved in comparison to ν-Weighting and Ellipse. The response matrix for ν²-Flows (Pythia8) for the double differential distribution in $m_{t\bar{t}}$ and $p_{\mathrm{T}}^{t}$ is shown in Fig. 10. These differences translate to a very slight change in the statistical precision in each bin after performing the unfolding.

truth $m_t$ has been changed to either 171 GeV or 175 GeV.
These samples each have 160,000 events and are otherwise the same as the nominal sample.
We compare the reconstructed top quark mass using the four momenta of the lepton, b-quark and neutrino ($m_t$) to the invariant mass using only the lepton and b-quark from the same top decay ($m_{b\ell}$).
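Both observables are invariant masses of summed four-momenta, which can be sketched as below. The `(E, px, py, pz)` ordering and the function name are choices made for this illustration.

```python
import math

def invariant_mass(*four_vectors):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    energy = sum(v[0] for v in four_vectors)
    px = sum(v[1] for v in four_vectors)
    py = sum(v[2] for v in four_vectors)
    pz = sum(v[3] for v in four_vectors)
    m2 = energy**2 - px**2 - py**2 - pz**2
    # Guard against small negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))
```

In this notation $m_{b\ell}$ would be `invariant_mass(b, lepton)` and $m_t$ would be `invariant_mass(b, lepton, neutrino)`, with the neutrino four-vector supplied by the reconstruction method.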
The distributions are shown in Fig. 12. Despite training purely on events with the nominal top quark mass, the reconstructed distribution using ν²-Flows is sensitive to the difference in truth $m_t$. The separation between the three templates is similar to using $m_{b\ell}$, however for ν²-Flows the difference is more prominent in the bulk of the distribution. This sensitivity could be improved by training ν²-Flows on samples with a range of values for $m_t$ and parametrising the network, however this would introduce additional computational complexity to the method. Another benefit of ν²-Flows is also demonstrated, with smoother templates constructed by sampling multiple neutrino solutions for each event.

V. CONCLUSIONS
With ν²-Flows we have built upon the success of the ν-Flows method to employ conditional normalizing flows to reconstruct the momentum vectors of multiple neutrinos in a single event. In comparison to other commonly used approaches, ν²-Flows is able to reconstruct both neutrinos without enforcing strong constraints on reconstructed particle masses in order to find solutions in an under-constrained system. This translates to a reduced bias in the reconstruction of neutrinos, without a preference for back-to-back neutrinos, and with a more accurate reconstruction of the kinematics of individual top quarks and the full $t\bar{t}$ system. The reconstructed neutrinos can be used directly or potentially combined with other machine learning approaches which aim to reconstruct the underlying hard scatter event [60-64]. The generalised architecture introduced in ν²-Flows has been designed to be easy to extend to any neutrino multiplicity, and does not place restrictions on the multiplicities of reconstructed objects, or how they are combined to extract information from the event.
In applying ν²-Flows to dilepton $t\bar{t}$ events we achieve for all events and the fast single event inference below 75 ms on a single computing core. As each sample is associated with a probability from the transformation under the normalizing flow, this could also provide opportunities for separating $t\bar{t}$ events from background processes, similar to the weight in ν-Weighting. This could be done using the probability of the single solution per event, or, as multiple solutions can be sampled for each event, the highest possible probability value for an event.

FIG. 13: The invariant masses of the reconstructed top quark and $t\bar{t}$ system when using the three neutrino reconstruction methods discussed in the paper, as well as a feed-forward regression model ν²-FF, in comparison to ν-Truth (shaded grey). This plot highlights the large negative bias induced by the feed-forward model.

Origin of improvement over ν-Weighting
In ν-Weighting a weight, defined in Eq. (3), is used to find the best solution, and it can also be interpreted as a measure of how good a solution is. The improvement from ν²-Flows could arise from the events with low values of $w$ or no solutions, or come from all events regardless of how well ν-Weighting performs.
To investigate this we compare the reconstruction performance for events where ν-Weighting has either a good solution or a poor solution. We define good performance as values of $w > 0.9$ and poor performance as $w < 0.3$.
The $p_{\mathrm{T}}^{t}$, $p_{\mathrm{T}}^{t\bar{t}}$ and $m_{t\bar{t}}$ distributions with these selections are shown in Fig. 14. Although substantial improvement can be seen over ν-Weighting in regions where ν-Weighting has a low weight, ν²-Flows still exhibits improved agreement with ν-Truth. It is equally encouraging that there is no noticeable difference in the quality of reconstruction with ν²-Flows for events that can be considered well reconstructed or poorly reconstructed by ν-Weighting.

±4.
The conditional normalizing flow is trained with the standard maximum likelihood estimation loss obtained through the change of variables formula, and transforms the input neutrino momenta to a standard multivariate normal distribution. The entire ν²-Flows model, including the transformer, has around 600 000 trainable parameters.
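The change of variables formula behind this loss can be illustrated with the simplest possible flow. The sketch below replaces the rational quadratic spline coupling blocks with a single element-wise affine transformation, whose `shift` and `log_scale` would be produced from the event context by a network; this substitution and all names are assumptions for illustration only.

```python
import math

def affine_flow_log_prob(x, shift, log_scale):
    """log p(x | context) for z = (x - shift) * exp(-log_scale), z ~ N(0, I)."""
    log_p = 0.0
    for xi, si, li in zip(x, shift, log_scale):
        z = (xi - si) * math.exp(-li)
        log_p += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)  # base density
        log_p += -li  # log |dz/dx| from the change of variables
    return log_p

def negative_log_likelihood(batch):
    """Mean NLL over (x, shift, log_scale) triples, i.e. the training loss."""
    return -sum(affine_flow_log_prob(*item) for item in batch) / len(batch)
```

For two neutrinos `x` would hold the six target momentum components, and minimising the loss pushes the transformed targets towards the standard normal base distribution.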
Analytical techniques have been proposed to solve the reconstruction of the two neutrinos in dilepton $t\bar{t}$ events. Amongst these are Neutrino weighting [13] (ν-Weighting), an algebraic solution [14], and the Ellipse method [15]. These have been successfully employed in a wide range of measurements at the Tevatron and LHC [12, 16-28], most notably ν-Weighting.

All jets are required to have a transverse momentum $p_{\mathrm{T}} > 25$ GeV and fall within |η| < 2.5. A b-tagging working point corresponding to 70% inclusive signal efficiency is used to identify jets as originating from b-hadrons. Up to 10 jets are selected per event, ordered in descending $p_{\mathrm{T}}$. Events are required to have at least two b-tagged jets, and two oppositely charged leptons, each with $p_{\mathrm{T}} > 15$ GeV and |η| < 2.5. Truth association of jets to the b-quarks in the $t\bar{t}$ hard scatter is performed using a ∆R matching, with partons matched to jets within ∆R < 0.4. Events where multiple partons are matched to the same jet are removed. Truth association of the lepton to the parent top quark is performed assuming there is no charge misidentification, and the true neutrino momenta are taken directly from simulation.
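The reconstruction-level selection described above can be summarised in a short sketch. The dictionary keys and the function name are hypothetical, and requiring exactly two selected leptons is an assumption based on the description of the final state.

```python
def passes_selection(jets, leptons):
    """Apply the jet and lepton requirements to one event.

    `jets` are dicts with keys "pt", "eta", "btag"; `leptons` are dicts
    with keys "pt", "eta", "charge" (layout assumed for illustration).
    """
    good_jets = sorted(
        (j for j in jets if j["pt"] > 25.0 and abs(j["eta"]) < 2.5),
        key=lambda j: j["pt"],
        reverse=True,
    )[:10]  # keep at most the ten leading jets
    good_leptons = [
        lep for lep in leptons if lep["pt"] > 15.0 and abs(lep["eta"]) < 2.5
    ]
    n_btagged = sum(1 for j in good_jets if j["btag"])
    if n_btagged < 2 or len(good_leptons) != 2:
        return False
    # The two leptons must carry opposite electric charge.
    return good_leptons[0]["charge"] * good_leptons[1]["charge"] < 0
```

The truth matching steps are not included, since they rely on generator-level information rather than on reconstructed objects.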

FIG. 3: The angular separation in η and ϕ between the reconstructed neutrino pair per event for the three reconstruction methods and ν-Truth (shaded grey). The hashed areas represent statistical uncertainties in the ν-Truth prediction.

FIG. 4: The reconstructed invariant mass of W bosons (left) and top quarks (middle), as well as the top quark $p_{\mathrm{T}}$ (right) when using the three neutrino reconstruction methods in comparison to ν-Truth (shaded grey).

FIG. 6: Binned response matrices for the double differential measurement of $m_{t\bar{t}}$ and $\Delta\phi(\ell^{+}\ell^{-})$ when using each of the three methods for neutrino reconstruction. The binning is symmetric for both the parton and detector level observables, however the $m_{t\bar{t}}$ bins are labelled on the x-axis with the $\Delta\phi(\ell^{+}\ell^{-})$ bins labelled on the y-axis. The trace fraction is calculated for each method for a simple quantitative comparison and is 0.73 when using ν-Truth.

Figure 8 shows the relative statistical uncertainty for each method with respect to ν-Truth. The values are obtained for each bin of the unfolded distributions using SVD with the chosen level of regularisation. For the individual bins of the four double differential distributions the uncertainties are typically a factor of 1.5 to two times smaller when using ν²-Flows compared to ν-Weighting, and up to four times smaller in comparison with Ellipse.

FIG. 9: The invariant mass, $p_{\mathrm{T}}$, and rapidity of the reconstructed $t\bar{t}$ system when using ν²-Flows trained on the nominal or alternative sample, in comparison to the two baseline approaches and ν-Truth (shaded grey).

FIG. 10: Binned response matrices for the double differential measurement of $m_{t\bar{t}}$ and $p_{\mathrm{T}}^{t}$ when using ν²-Flows (Pythia8) but evaluating on the nominal $t\bar{t}$ sample. The binning is symmetric for both the parton and detector level observables, however the $m_{t\bar{t}}$ bins are labelled on the x-axis with the $p_{\mathrm{T}}^{t}$ bins labelled on the y-axis. The trace fraction is calculated for each method for a simple quantitative comparison and is 0.62 when using ν-Truth and 0.31 for the nominal ν²-Flows.

Figure 11 shows the reconstructed distributions for the three approaches. As before, ν²-Flows exhibits very good agreement across the majority of the kinematic phase space, despite having been optimised for a different sample.

FIG. 12: The reconstructed top quark mass using just the lepton and b-quark from the top quark decay ($m_{b\ell}$, left), the full invariant mass using ν²-Flows to reconstruct the neutrinos from the top quarks ($m_t$, middle), and a smoothed template for $m_t$ obtained by sampling 256 solutions for each event with ν²-Flows (right). The statistical uncertainty on the distributions arising from the training dataset has not been propagated into the final statistical uncertainty.

FIG. 15: The reconstructed invariant mass of the W boson and top quarks when using the three neutrino reconstruction methods in comparison to ν-Truth (shaded grey) as well as the alternative ν²-Flows (Pythia8) model.

FIG. 16: The invariant mass, $p_{\mathrm{T}}$, and rapidity of the reconstructed $t\bar{t}$ system (top row) and the invariant mass and $p_{\mathrm{T}}$ of the reconstructed top quarks (bottom row) for ν-Truth with the three independent simulated samples.

FIG. 19: Transformer encoder block (left) and cross-attention block (right) comprising multi-headed attention, layer normalisation and simple linear layers. Residual connections are used after the multi-headed attention and linear operation. Conditional information is provided as context by concatenating it to each token before the linear layer.

TABLE I: The different input observables used as inputs to the feature extraction network.
z ,log E j Jet momentum 4-vector isB Whether jet passes b-tagging criteria Misc Njets, Nbjets Jet and b-jet multiplicities in the event formers and object specific embedding networks.Initially, the jets, leptons and #» p miss T

TABLE II: Required time for single event inference using ν²-Flows. Times are representative of using a single core of an AMD EPYC 7742 2.25 GHz CPU and an NVIDIA® RTX 3080 graphics card.

TABLE III: Kinematic observables of the reconstructed $t\bar{t}$ system studied for an unfolding analysis in dilepton events.
$p_{\mathrm{T}}^{t}$    Transverse momentum of the top quark
$p_{\mathrm{T}}^{t\bar{t}}$    Transverse momentum of the $t\bar{t}$ system
$y^{t\bar{t}}$    Rapidity of the $t\bar{t}$ system

TABLE IV: Bin edges used for each of the kinematic observables of the reconstructed $t\bar{t}$ system studied for two dimensional unfolding analyses.

TABLE V: Relative uncertainty in each bin of the respective unfolded double differential distributions for each neutrino reconstruction method with respect to the uncertainty when using ν-Truth. The bins are ordered first by increasing $m_{t\bar{t}}$ followed by the second variable, with vertical dividers indicating the bin edges in $m_{t\bar{t}}$. The method with the smallest relative increase in uncertainty in comparison to ν-Truth is highlighted in bold.

TABLE VI: Efficiency for finding a solution in each bin of the respective unfolded double differential distributions with ν-Weighting and Ellipse. The bins are ordered first by increasing $m_{t\bar{t}}$ followed by the second variable, with vertical dividers indicating the bin edges in $m_{t\bar{t}}$. The efficiency of ν²-Flows in all bins is 100%.