DeeLeMa: Missing Information Search with Deep Learning for Mass Estimation

We introduce DeeLeMa, a deep learning-based network for the analysis of energy and momentum in high-energy particle collisions. This novel approach is specifically designed to address the challenge of analyzing collision events with multiple invisible particles, which are prevalent in many high-energy physics experiments. DeeLeMa is constructed based on the kinematic constraints and the symmetry of the event topologies. We show that DeeLeMa can robustly estimate mass distributions even in the presence of combinatorial uncertainties and detector smearing effects. The approach is flexible and can be applied to various event topologies by leveraging the relevant kinematic symmetries. This work opens up exciting opportunities for the analysis of high-energy particle collision data, and we believe that DeeLeMa has the potential to become a valuable tool for the high-energy physics community.


I. INTRODUCTION
Despite the numerous neutrinos generated during particle collisions, the detectors at the Large Hadron Collider (LHC) are unable to observe them directly [1,2]. In addition to neutrinos, other elusive particles, such as dark matter candidates including Weakly Interacting Massive Particles (WIMPs) [3,4] and axions [5,6], are also challenging to detect, as they pass through the detector without leaving discernible signals [7,8]. Such entities are termed 'invisible particles' in the realm of collider physics. Their existence is not directly observed but is inferred by leveraging the principles of energy and momentum conservation, which expose discrepancies in the momentum or energy balance within an event.
The LHC, like other hadronic collider experiments, measures scattering processes involving the partonic constituents of hadrons. Within this context, the reconstruction of the missing momentum component along the beam axis (the longitudinal direction) poses a substantial challenge. This task becomes particularly formidable when multiple invisible particles are produced simultaneously within the same event. This challenging issue is conventionally called the "missing information problem" of invisible particles.
Researchers commonly employ 'transverse' quantities to sidestep the missing longitudinal information. These quantities are defined in the directions perpendicular to the beam axis and include the transverse momentum p_T = |p⃗_⊥| and the transverse energy E_T ≡ √(m² + p_T²) as observable parameters. Over the past decade or more, many kinematic variables have been devised and proposed, primarily tailored for the experiments at the LHC, such as the stransverse mass or Cambridge M_T2 [9-11], M_2 [12-14], and their extensions [11,15-18]. However, it is worth noting that introducing more complex kinematic variables, while aiding in recovering missing information, can also introduce additional complexities in data analysis. The precision of these variables may not always meet the desired level due to inherent complexities and uncertainties, including combinatorial errors and detector effects. For a comprehensive overview, see, e.g., Ref. [19].
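As an illustration, the transverse quantities follow directly from a measured four-momentum. The following is a minimal numpy sketch (the function name and example numbers are ours, not from the paper):

```python
import numpy as np

def transverse_quantities(p):
    """Return (pT, ET) for a four-momentum p = (E, px, py, pz) in GeV."""
    E, px, py, pz = p
    pT = np.hypot(px, py)                         # |p_perp|
    m2 = max(E**2 - px**2 - py**2 - pz**2, 0.0)   # invariant mass squared
    ET = np.sqrt(m2 + pT**2)                      # transverse energy
    return pT, ET

# example: a massive particle with no longitudinal momentum,
# for which ET coincides with the full energy E
pT, ET = transverse_quantities((90.0, 40.0, 0.0, 0.0))
```

Note that both quantities are invariant under longitudinal boosts, which is precisely why they are useful at a hadron collider where the partonic boost along the beam axis is unknown.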
Deep neural networks (DNNs) have emerged as a versatile tool capable of handling vast datasets and capturing intricate correlations among diverse features. This capability renders them exceptionally well suited to tackle the complexities associated with missing information. Our newly developed kinematics-solving machine, which integrates the physical conditions and symmetries inherent in event topologies, is named "DeeLeMa." The name, derived from "Deep Learning for Mass Estimation," encapsulates the essence of the machine's function. DeeLeMa represents a cutting-edge approach to the problem of kinematics estimation in collider physics, promising more robust and accurate results compared to traditional methods reliant on complex kinematic variables. The details of the architecture are presented on the GitHub page 1 , where one can download the DeeLeMa code with examples.

II. DeeLeMa FRAMEWORK
Our study is dedicated to unveiling concealed information within the complex landscape of high-energy collider events. We aim to achieve this objective by harnessing observable data, specifically the four-momenta of detected particles. The event topology, symmetry principles, and the steadfast application of conservation laws furnish constraints on the kinematic variables governing these events.
To illustrate, we examine a cascade event configuration consisting of N_vis visible particles and N_inv invisible particles in the final state. Our primary goal is to utilize the input information encapsulated in the four-momenta of the visible particles, denoted as p_i for i = 1, 2, ..., N_vis, to determine the momenta of each invisible particle in the final state, which we designate as q_j for j = 1, 2, ..., N_inv. Nonetheless, it is crucial to note that this kinematic problem becomes mathematically underdetermined when the number of unknowns associated with the N_inv invisible momenta surpasses the number of constraining relations governing each event's momenta.
Utilizing a physics-informed machine learning approach, we build a model that decodes the concealed information in collider events under a given event topology. Central to this approach are two functions: L, our loss function for neural optimization, and K, which encapsulates the kinematic relationships crucial for reconstructing the momenta of invisible particles. These functions are based on physical relations such as the on-shell mass conditions for the intermediate particles and the constraints on the transverse momentum. The structure of our DNN machine is schematically depicted in Fig. 1:
• The event topology of the specific process is T, and the kinematic relations among momenta are encapsulated in K,
• The input to DeeLeMa is the visible information from the measured momenta {p_i},
• The expected output of DeeLeMa is the reconstructed momenta of the invisible particles {q_j},
• The loss function L forces the machine to learn to reconstruct the invisible information under the given event topology T and the kinematic relations K.
Additionally, we introduce auxiliary parameters x̄, which act to force the corresponding target physical variables x (e.g., an invariant mass) to converge to a single value for all training events. The auxiliary parameters x̄ appear globally in all events, allowing the neural network to learn that the events come from the same physical process. Thus, they are introduced as global, trainable parameters based on prior knowledge from T. Consequently, DeeLeMa optimizes the reconstruction of the invisible momenta by minimizing the loss function L, which is defined in terms of the reconstructed kinematic quantities q and the auxiliary parameters x̄, subject to the kinematic relations K.
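The role of a global auxiliary parameter can be illustrated with a toy gradient descent, in which a single shared scalar x̄ is pulled toward the per-event values of a target variable. All names and numbers below are illustrative and not part of the DeeLeMa code itself:

```python
import numpy as np

rng = np.random.default_rng(0)
# per-event values of a target variable (e.g. a reconstructed mass),
# all drawn from the same underlying process
x_events = 700.0 + rng.normal(0.0, 5.0, size=1000)

# a single global, trainable auxiliary parameter shared by all events
x_bar = 0.0
lr = 0.1
for _ in range(200):
    # gradient of the batch-wise quadratic distance mean((x_i - x_bar)^2)
    grad = np.mean(2.0 * (x_bar - x_events))
    x_bar -= lr * grad
# x_bar converges to the common value underlying the whole batch
```

In DeeLeMa the network weights and the auxiliary parameters are optimized jointly, so the same mechanism pulls the per-event reconstructions and the shared parameter toward one another.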

III. DeeLeMa FOR PAIR PRODUCTION PROCESS
In this section, our primary focus lies on the pair production of mother particles in particle collisions, where each of these particles subsequently decays through identical decay chains. Under such circumstances, the scenario involves even numbers of visible and invisible particles, denoted as (N_vis, N_inv) = (2n, 2m), where n and m count the visible and invisible particles in each respective branch. Exploiting the inherent symmetry of this situation, we find that there are precisely 8m unknown components originating from the 2m invisible four-momenta, along with (n + m + 2) constraints stemming from kinematic relations.
Mathematically speaking, the system becomes solvable when the condition (n + m + 2) ≥ 8m, or equivalently n ≥ 7m − 2, is satisfied. A pertinent illustration is the case of m = 1, wherein a single invisible particle emerges in each decay-chain branch. In this scenario, the system can be effectively solved when n ≥ 5. It is noteworthy that earlier analyses of systems with n = 3, m = 1 have been documented in previous works (see [23-25]), particularly when multiple events of the identical process were considered.
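The counting argument above reduces to a one-line check (a trivial sketch of the inequality quoted in the text, not code from DeeLeMa):

```python
def is_solvable(n, m):
    """Event-by-event solvability: (n + m + 2) constraints vs 8m unknowns,
    i.e. the condition n >= 7m - 2 quoted in the text."""
    return n + m + 2 >= 8 * m

# m = 1: one invisible particle per branch; solvable only from n = 5 upward
```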
We now delve into a challenging 'unsolvable' problem characterized by n = 2 and m = 1, visually represented in Fig. 2. This configuration corresponds to an event topology with (N_vis, N_inv) = (4, 2). A prominent example of this event topology is the dilepton channel of t t̄ production, where both top quarks decay leptonically, t → bW → b(ℓν_ℓ). In a more general context, we consider the pair production of mother particles A_1 and A_2, with subscripts 1 and 2 labeling the two decay branches. Each A_i decays into a visible particle a_i and an intermediate heavy state B_i; B_i then undergoes a semi-invisible decay into a visible particle b_i and an invisible particle C_i, for branches i = 1, 2. The event can be succinctly expressed as

A_i → a_i B_i → a_i (b_i C_i),   i = 1, 2.

Here, p_ai and p_bi denote the momenta of the visible particles, while q_i represents the momentum of the corresponding invisible particle C_i. Despite the apparent simplicity of this event topology, it is fundamentally underdetermined from a kinematic perspective, rendering the separate measurement of each invisible particle's momentum unattainable.
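Given candidate invisible momenta q_i, the kinematic relations of this topology are just invariant-mass combinations along each decay chain. A minimal numpy sketch with made-up momenta (not taken from any real event):

```python
import numpy as np

def inv_mass(*momenta):
    """Invariant mass of a system of four-momenta (E, px, py, pz)."""
    E, px, py, pz = np.sum(momenta, axis=0)
    return np.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

# illustrative momenta for one branch (GeV)
p_a = np.array([50.0, 30.0, 0.0, 40.0])    # visible a_1
p_b = np.array([40.0, -20.0, 10.0, 0.0])   # visible b_1
q   = np.array([60.0, 10.0, -10.0, 20.0])  # candidate invisible C_1

m_B = inv_mass(p_b, q)        # mass of the intermediate state B_1
m_A = inv_mass(p_a, p_b, q)   # mass of the mother particle A_1
```

Since q_1 and q_2 are not measured, these relations cannot be inverted event by event; this is exactly the underdetermination that DeeLeMa addresses by learning across many events.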
To define the loss function, we first select a set of "target variables" {x}, such as the invariant masses of the intermediate states and of the invisible outgoing particles. For our specific example, these are the invariant masses built from the momenta in each branch,

m_Bi² = (p_bi + q_i)²,   m_Ai² = (p_ai + p_bi + q_i)²,   m_Ci² = q_i²,   for i = 1, 2.
Consider a batch of data consisting of N training events. The event-wise information, denoted E, is derived from the symmetric event topology: in each event, the corresponding particles in the two branches must have identical masses (e.g., m_A1 = m_A2). This makes E an independent piece of information for each event.
On the other hand, the batch-wise information, denoted B, reflects the fact that all training events originate from the same physical process. We introduce auxiliary parameters, such as m̄_A and m̄_B, to ensure that the masses across all events in a batch remain consistent (e.g., m_A1 takes the same value in all events, and so on). This B information depends on the entire batch of events. Finally, our loss function is defined as

L = (1/|B|) Σ_{#i∈B} Σ_{f=A,B} [ d_E(m_f1^{#i}, m_f2^{#i}) + Σ_{k=1,2} d_B(m_fk^{#i}, m̄_f) ],   (1)

where |B| represents the batch size, i.e., the number of events in a batch B, #i denotes the event index, and f labels the target variable, either A or B. The functions d_E(x_1, x_2) and d_B(x_1, x_2) are distance functions for the event-wise and batch-wise information, respectively. They satisfy the standard conditions of a distance, such as non-negativity and vanishing only for coincident arguments. Various distance functions can be used, such as the absolute or the squared difference; the appropriate choice depends on the specific physical process under study.

We illustrate the training procedure of DeeLeMa in FIG. 3. The target variable points (x_1^{#i}, x_2^{#i}) are represented within the spaces X_1 and X_2, accompanied by the scalar value of the auxiliary parameter x̄. By minimizing the loss function in Eq. (1) from the initial learning step at t = 0 to the end of training at t = T, we ensure that the spaces X_1 and X_2 come closer together. Additionally, the overall distribution of points within these spaces becomes more compact, leading to a reduction in their spread. This compactness is facilitated by the inclusion of the auxiliary parameter x̄.
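A minimal numpy version of this loss for a single target variable f, using the quadratic distance for both d_E and d_B (the distance choice and function names are ours; the text leaves the choice open):

```python
import numpy as np

def d(x1, x2):
    """Quadratic distance; the absolute difference is another valid choice."""
    return (x1 - x2) ** 2

def loss_f(m1, m2, m_bar):
    """Loss contribution of one target variable f over a batch:
    m1, m2 -- reconstructed masses m_f1, m_f2 per event,
    m_bar  -- the global auxiliary parameter for f."""
    event_wise = d(m1, m2)                    # d_E: branch symmetry per event
    batch_wise = d(m1, m_bar) + d(m2, m_bar)  # d_B: batch-wide consistency
    return np.mean(event_wise + batch_wise)

m1 = np.array([799.0, 801.0, 800.5])
m2 = np.array([800.5, 799.5, 800.0])
value = loss_f(m1, m2, 800.0)  # small when branches agree and cluster at m_bar
```

The full loss sums this contribution over f = A, B and is minimized jointly with respect to the network outputs and the auxiliary parameters.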
For a comprehensive model implementation of DeeLeMa, refer to Appendix A.

IV. TEST OF DeeLeMa PERFORMANCE
In pair production, practical experiments often encounter issues with the misidentification of branches. Termed the combinatorics problem, this complication can result in erroneous kinematic relations, leading to substantial uncertainties in the derived solutions. To quantify the extent of this contamination, we introduce the parameter E_C, defined as the fraction of incorrectly assigned events relative to the overall number of events,

E_C = (number of incorrectly assigned events) / (total number of events).

We assess the efficacy of DeeLeMa through three distinctive test runs:
• Test run (A) is conducted using a toy model featuring fixed values m_A = 1000 GeV, m_B = 800 GeV, and m_C = 700 GeV, with no combinatorial errors (E_C = 0).
• Test run (B) mirrors (A) but incorporates varying rates of combinatorial errors, specifically E_C = 0, 10%, 20%, 50%. This test aims to investigate the influence of combinatorial errors on the performance of DeeLeMa.
• Test run (C) is executed on the Standard Model t t̄ process, with t → Wb → (ℓν)b, encompassing E_C = 20% and accounting for detector smearing effects. We consider this test run to closely simulate a realistic scenario.
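To emulate test runs (B) and (C), combinatorial errors can be injected by swapping the b-type momenta between branches for a random fraction E_C of events. A sketch (function and array names are ours, not from the DeeLeMa code):

```python
import numpy as np

def contaminate(pb1, pb2, E_C, rng):
    """Swap branch assignments of the two b-type momenta in a fraction
    E_C of events, emulating the combinatorics problem."""
    swap = rng.random(len(pb1)) < E_C
    out1 = np.where(swap[:, None], pb2, pb1)
    out2 = np.where(swap[:, None], pb1, pb2)
    return out1, out2, swap.mean()

rng = np.random.default_rng(1)
pb1 = rng.normal(size=(10000, 4))  # placeholder four-momenta, branch 1
pb2 = rng.normal(size=(10000, 4))  # placeholder four-momenta, branch 2
mix1, mix2, frac = contaminate(pb1, pb2, 0.2, rng)
```

Note that the swap preserves the summed momenta event by event, so the missing transverse momentum constraint is unaffected; only the per-branch mass relations are corrupted.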
We compare the results with those obtained using existing methods: the transverse mass variable M_T2 and the on-shell constrained invariant mass variable M_2CC, which uses constraints similar to those of DeeLeMa. We use the YAM2 package [37] for the optimization needed to calculate M_2.
A. Toy model

Toy model test with no contamination (E_C = 0)
We selected narrow widths and set m_A, m_B, and m_C to 1000 GeV, 800 GeV, and 700 GeV, respectively. The correlation heatmap in FIG. 4 displays the relationship between the reconstructed momenta (horizontal axis) and the true momenta (vertical axis) for the DeeLeMa method (left) and the M^(ab)_2CC method (right) for the toy example with E_C = 0.
The upper panel of FIG. 5 shows the reconstructed mass distributions of B and A obtained with DeeLeMa for the toy example with E_C = 0. The blue dashed lines indicate the reconstructed masses m_B1,2 and m_A1,2, respectively. The red vertical lines indicate the true masses, and the black dash-dotted line shows the auxiliary mass after training. In the bottom panel, we compare the results with two existing methods based on M_T2 (gray) and M_2CC with the suitable subsystems (b) and (ab) (orange and green), respectively [15]. We can see that the reconstructed mass distributions from DeeLeMa are well centered around the true values, while the M_T2 distribution encodes the physical mass only at its endpoint, which often causes errors. The M_2CC distributions for m_A show slightly improved performances, but DeeLeMa still provides the best results.

The disparity arises from the manner in which global information is assimilated during the training phase, primarily through the auxiliary parameter x̄. Conversely, in the M_T2 or M_2 method, global information is derived solely from statistical outcomes centered around kinematic endpoints. While numerous events typically cluster around these endpoints, leading to reconstructed momenta close to the actual values, there is no subsequent optimization within the M_T2- or M_2-based reconstruction process.
Consequently, the precision is notably diminished, with the kinematic endpoints becoming less distinct, particularly when grappling with combinatorial ambiguities and detector smearing effects. This degradation in accuracy is demonstrated in the following sections.

Toy model test with contamination (E_C > 0)
To explicitly see the effect of combinatorics contamination, we conducted comprehensive test runs incorporating combinatorial errors, with a concise summary of DeeLeMa's performance presented in TABLE I. In these instances, the peak positions display a slight shift towards larger values, owing to the influence of incorrectly assigned data implying a relatively higher mass. Even under sizable combinatorial contamination, DeeLeMa exhibits sustained resilience and commendable performance, reconstructing masses within 5-10% of the true values.
Notably, for E_C ≤ 20%, the reconstructed masses lie within O(1)% of the true values, attesting to DeeLeMa's remarkable ability to mitigate combinatorial challenges. Collectively, our findings underscore DeeLeMa's reliability and robustness as a method for the precise reconstruction of masses, even under the demanding conditions prevalent in collider environments.

B. Realistic test with Standard Model t t̄
We finally present the results of our investigation of a more realistic case, top quark pair production at the LHC, where the top quarks decay semi-leptonically via t → Wb → (ℓν)b. In this case, we consider finite width effects with σ_t = 1.4915 GeV and σ_W = 2.0476 GeV, and masses m_t = 173.0 GeV and m_W = 80.4190 GeV for the top quark and the W boson, respectively. Moreover, we account for the uncertainties related to the detector resolution. We simulated detector effects by applying Gaussian smearing to the momenta; to achieve more accurate results, we encourage the use of a more realistic detector simulation. For the two b jets, we applied Gaussian smearing with jet p_T values of {10, 20, 30, 50, 100, 400, 1000} GeV and energy resolutions of {40, 28, 19, 13, 10, 6, 5}%, respectively [26,38]. We took the combinatorial ambiguity to be E_C = 20% in our simulation.
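The quoted jet-energy resolutions can be applied with a simple p_T-interpolated Gaussian smearing; the sketch below is a stand-in for a full detector simulation (the linear interpolation scheme is our choice, not specified in the text):

```python
import numpy as np

# jet pT grid (GeV) and fractional energy resolutions quoted in the text
pt_grid  = np.array([10.0, 20.0, 30.0, 50.0, 100.0, 400.0, 1000.0])
res_grid = np.array([0.40, 0.28, 0.19, 0.13, 0.10, 0.06, 0.05])

def smear_energy(E, pt, rng):
    """Gaussian-smear jet energies with a pT-dependent fractional resolution."""
    sigma = np.interp(pt, pt_grid, res_grid) * E
    return rng.normal(E, sigma)

rng = np.random.default_rng(42)
E = np.full(100_000, 100.0)          # 100 GeV b jets
E_smeared = smear_energy(E, E, rng)  # at pT = 100 GeV the resolution is 10%
```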
We present the results obtained using DeeLeMa in Fig. 6 (upper). The distributions of the reconstructed masses (m_t, m_W) show robust peaks near the true values (red vertical lines), albeit slightly widened. To compare the performance of DeeLeMa with conventional methods, we also present the results obtained using the M^(ab)_2CC and M_T2 variables (lower). DeeLeMa provides more accurate results than the conventional methods. In the conventional methods, one must read off the endpoints of the lower distributions, which can be challenging in practice due to realistic effects from finite widths, detector smearing, and combinatorial mismatches.

V. CONCLUSION
We introduce DeeLeMa, a deep learning-based approach to analyzing high-energy particle collisions with multiple invisible particles. DeeLeMa can reconstruct the event's invisible momenta and masses even when multiple invisible particles are involved. Focusing on a challenging problem with (N_vis, N_inv) = (4, 2), we demonstrate the efficiency of DeeLeMa: compared to conventional methods that rely on kinematic variables such as M_T2 or M_2, DeeLeMa delivers a significant improvement in accuracy. The reconstructed masses show sharp peaks in their distributions, and the results are robust against the combinatorial problem of misidentifying final-state particles and against detector-smearing effects. In conclusion, DeeLeMa has the potential to contribute to advances in the field as a new solid tool.


FIG. 3: Schematic representation of the role of the loss function in simultaneously bringing d_E (blue double-headed arrow) and d_B (red double-headed arrow) closer throughout the learning process from t = 0 to t = T.

FIG. 4: [Toy] The correlation heatmap of the reconstructed momenta and the true momenta from DeeLeMa (left) and M^(ab)_2CC (right) applied to the toy example with E_C = 0. Ideally, the diagonal line (red, solid) should represent perfect efficiency with p_recon. = p_true. As shown in the figure, the DeeLeMa method (left) exhibits a strong diagonal correlation pattern, indicating high accuracy in reconstructing the momenta. In contrast, the M^(ab)_2CC method (right) shows a weaker and more scattered correlation pattern, implying lower accuracy in momentum reconstruction. This demonstrates the superior performance of DeeLeMa over traditional methods.

FIG. 5: [Toy] The reconstructed mass distributions of B and A for the toy example with E_C = 0, from DeeLeMa (upper) and from M_T2 and M_2CC (bottom).

FIG. 6: [Realistic t t̄] The reconstructed mass distributions of B and A using DeeLeMa (upper), and M_T2, M_2CC and M^(ab)_2CC (bottom) for the t t̄ example.

TABLE I: The summary table for the combinatorial error E_C.