Resolving Combinatorial Ambiguities in Dilepton $t\bar t$ Event Topologies with Constrained $M_2$ Variables

We advocate the use of on-shell constrained $M_2$ variables in order to mitigate the combinatorial problem in SUSY-like events with two invisible particles at the LHC. We show that in comparison to other approaches in the literature, the constrained $M_2$ variables provide superior ansatze for the unmeasured invisible momenta and therefore can be usefully applied to discriminate combinatorial ambiguities. We illustrate our procedure with the example of dilepton $t\bar{t}$ events. We critically review the existing methods based on the Cambridge $M_{T2}$ variable and MAOS-reconstruction of invisible momenta, and show that their algorithm can be simplified without loss of sensitivity, due to a perfect correlation between events with complex solutions for the invisible momenta and events exhibiting a kinematic endpoint violation. Then we demonstrate that the efficiency for selecting the correct partition is further improved by utilizing the $M_2$ variables instead. Finally, we also consider the general case when the underlying mass spectrum is unknown, and no kinematic endpoint information is available.


Introduction
Events with missing transverse energy 1 (E T / ) are arguably the most exciting class of events at the Large Hadron Collider (LHC). They offer the tantalizing possibility of discovering the elusive dark matter -if dark matter particles were produced in the LHC collisions, they would leave the detector without a trace, and the only sign of their presence would be the imbalance in the total transverse momentum of the event. Unfortunately, events with E T / are also notoriously difficult to interpret and analyze: • Instrumental effects. Since the missing transverse momentum / P T is measured only as the recoil against all other visible objects in the event, it can be easily faked by mismeasurement and the finite detector resolution [1]. This problem becomes more severe if the signature involves QCD jets, whose energies and momenta are poorly measured in comparison to leptons and photons.
• Unknown nature of the invisible particles. A priori, we do not know the nature of the invisible particles -they could be new particles, or simply the Standard Model (SM) neutrinos [2].
• Incomplete kinematic information. We do not know how many invisible particles were present in the event to begin with [3][4][5][6][7]. We also do not know their individual momenta, and only the net sum / P T of their transverse components is available.
The first step in the analysis of any sample of E T / events is to hypothesize a certain event topology, and design suitable variables adapted to this interpretation [8]. It is already at this stage that one is facing a combinatorial problem, namely, how to associate the various reconstructed objects in the event to the elementary particles in the final state of the event topology. Only in very special cases does the problem not arise -if the event topology is very simple and/or all final state particles are distinct. In general, a typical E T / event at the LHC does suffer from a combinatorics problem, for the following two reasons: • At hadron colliders like the LHC, strong production of colored particles is the dominant production mechanism. When those colored particles decay to the invisible dark matter candidates, the color is shed in the form of QCD jets, which can be confused with jets from initial state radiation (ISR) [9][10][11][12]. In fact, the ISR combinatorics problem is very general and affects any multijet events at hadron colliders, regardless of the presence of E T / in the event.
• The lifetime of the dark matter particles is typically protected by some new symmetry. This is often chosen to be a discrete Z 2 parity, under which the SM particles are even, while the new physics particles are odd. In that case, the new particles are necessarily pair produced, so that each event contains two independent decay chains. This creates a partitioning ambiguity, since the experimenter has to decide whether to assign each reconstructed object to the first or the second decay chain [13]. Wrong assignments would tend to wash out the desired kinematic features and degrade the measurements.
In principle, the combinatorial problem can be addressed in two different ways: • Sidestep the combinatorial problem. The idea here is to design the analysis in such a way that the combinatorial problem does not become an issue. Two possibilities are: -Use global inclusive variables which do not suffer from a combinatorics problem. These variables treat the event as a whole and thus do not depend on the exact event topology, and the combinatorics problem does not arise in the first place. Some well known examples are M ef f [14,15],ŝ min [16], E T / [17], etc. The disadvantage is that such variables are suboptimal when compared to more exclusive variables which take advantage of the individual characteristics of the event topology.
-Use variables which optimize over all possible combinatorial assignments. In this case, instead of trying to figure out the correct assignment in a given event, one considers all possibilities, then chooses the one 2 which preserves the relevant useful property of the kinematic variable used in the analysis. As an example, consider an attempt to measure the upper kinematic endpoint of some relevant distribution, such as a two-body invariant mass or the Cambridge M T 2 variable [18]. One could simply compute the value of the variable under all possible assignments, then choose the smallest among them to be used in the analysis [14,[19][20][21][22][23][24][25][26][27]. 3 While this procedure is guaranteed to preserve the kinematic endpoint, it also adversely distorts the shape of the kinematic distribution in the vicinity of the endpoint, making it more difficult to observe in the presence of SM background.
• Resolve the combinatorial problem by choosing the "best" assignment event by event.
In this case one tries to design an algorithm which will single out one (or maybe several) among the many possible assignments as the most likely "correct" assignment, then use the value of the kinematic variable obtained with this specific choice. Ideally, the algorithm should return a unique selection, which would be correct 100% of the time. Unfortunately, this is rarely achievable in practice, and an important measure quantifying the success of the algorithm is the purity of the resulting sample, i.e., the fraction of events in which the combinatorics was successfully resolved. In principle, there can be different approaches to designing such an algorithm, from the use of a single exclusive variable to a multivariate technique like a neural network analysis [28]. For example, depending on the process at hand, one can attempt to tag ISR jets by a suitable combination of cuts on the jet rapidity and transverse momentum [29] or on the invariant mass and M T 2 [30]. The partitioning problem into two decay chains is usually addressed by the so-called "hemisphere" algorithm, developed originally within CMS [31] and later adopted in many phenomenological studies [32][33][34]. There have been attempts to further improve on the hemisphere algorithm by suitable cuts on the invariant mass and either the jet p T [35] or M T 2 [36], by excluding certain reconstructed objects from the clustering algorithm [21,37], or by recursive jigsaw reconstruction [38]. In general, methods which invoke fewer assumptions, are robust and model independent, but lead to rather vague conclusions, while methods with more assumptions give better results, but are not generally applicable.
In the case of E T / events, the combinatorics problem is exacerbated by the fact that the momenta of the invisible particles are unknown. If the decay chains are sufficiently long, so that there are enough kinematic constraints, one can attempt to compute the individual invisible particle momenta on an event per event basis [39][40][41][42][43]. Unfortunately, this procedure itself suffers from a combinatorics problem, which only becomes worse as the decay chains get longer (as required for the method to work). For shorter decay chains, like the ones considered in this paper, the method does not apply.
Since the invisible momenta cannot be reconstructed exactly, the next best thing to do is to use some sort of an approximation for them [44]. Again, different approaches are possible. For example, one could use a matrix element method (MEM) to select the most likely values of the invisible momenta. However, the MEM itself suffers from combinatorics, and is rather model dependent since it requires us to fully specify the underlying physics. A better approach would be to rely only on kinematics and obtain the invisible momenta by optimizing a suitable kinematic function. But what constitutes a good target function for such optimization? Initially, the focus was placed on transverse mass variables like M T 2 [18,45] and its variants [46][47][48][49]. While transverse quantities are Lorentz invariant under longitudinal boosts, they only provide an ansatz for the transverse components of the individual invisible momenta, and one still needs to provide a supplementary procedure for calculating the longitudinal components of the invisible momenta. One such complementary technique is the MAOS 4 reconstruction [50], where one imposes an additional on-shell kinematic constraint which can be solved for the longitudinal momentum component of each invisible particle. It has been shown that the MAOS approach provides a reasonably good approximation to the true values of the invisible momenta, and can be usefully applied for mass and spin measurements [50][51][52]. The MAOS technique was then used to design a novel algorithm [53] for resolving the combinatorial ambiguity in dilepton tt events, further expanding on the ideas from Refs. [35,36]. The algorithm aims to resolve the two-fold 5 ambiguity in selecting the correct lepton-jet pairing and involves the following three steps: • Step I. Following the proposal of Ref. [36], some number of wrong lepton-jet combinations can be eliminated if they violate the expected endpoints in the distributions of the invariant mass m b and M T 2 .
• Step II. Utilizing the ansatz found in Step I for the transverse components of the invisible momenta, attempt a MAOS reconstruction of the longitudinal components in two cases: 1. using the known value of the top quark mass m t ; 2. using the known value of the W -boson mass m W .
Eliminate additional wrong combinations if the solutions for the longitudinal momenta in either case turn out to be complex. • Step III. In this final step, one uses the reconstructed masses for the W -boson (in case II.1) and the top quark (in case II.2) in conjunction with M T 2 to decide which of the two lepton-jet pairings is the likelier one.
While this algorithm was originally designed to handle the two-fold ambiguity in tt events, where the mass spectrum is known, with suitable modifications it can also be applied to new physics searches, as advertised in Ref. [53]. For instance, in the MAOS reconstruction Step II, instead of using the fixed values of the known masses m t and m W , one could use the measured endpoints in the respective M T 2 subsystems [46]. Recently it has been pointed out that the M T 2 approach has a (3 + 1)-dimensional analogue in terms of a general class of on-shell constrained invariant mass variables M 2 [8,54,55]. Compared to M T 2 , the M 2 variables have several advantages: • Being defined in (3+1) dimensions, they allow us to easily and directly enforce all relevant on-shell constraints in a given event topology [8,56].
• Unlike the case of M T 2 , the optimization procedure required to compute the value of M 2 automatically provides an ansatz for both the transverse and the longitudinal components of the invisible momenta. In this sense, once one commits to using M 2 variables instead of M T 2 , the MAOS reconstruction step for finding the longitudinal momentum components is unnecessary.
• The maximally constrained M 2 variable can be expected to provide the best possible ansatz for the individual invisible momenta, since it takes into account all relevant kinematic constraints in a given event topology [44].
The main goal of this paper is to utilize these advantages of the M 2 variables and design an improved algorithm for resolving the combinatorial ambiguity in SUSY-like events with two invisible particles at the LHC. As our benchmark, we shall use the current state of the art algorithm which was proposed and tested for dilepton tt events in Ref. [53]. Correspondingly, in section 2 we shall first give a brief review of the relevant background information regarding the kinematics of the dilepton tt event topology. Then in section 3 we shall carefully define the different options for kinematic reconstruction of the invisible momenta [44]. We shall see that in principle there can be different ways of applying the ideas of MAOS reconstruction, M 2 -assisted reconstruction, or some combination of both. In section 3 we shall also compare the accuracy of several representative methods for invisible momentum reconstruction.
The next three sections will be devoted to the issue of resolving the combinatorial ambiguity. First in section 4 we critically review each of the three steps of the current state of the art method based on the Cambridge M T 2 variable and MAOS-reconstruction of invisible momenta [36,53]. Our goal will be to improve the algorithm in two aspects: • Better performance. By considering various modifications, e.g., utilizing the alternative set of M 2 variables, or alternative implementations of the MAOS method itself, we shall attempt to improve the efficiency 6 of the algorithm in selecting the correct partition in dilepton tt events.
• Simplicity. At the same time, we shall keep an eye on the relative performance of each algorithm component, and if we find components which underperform, we shall eliminate them from consideration, thus simplifying the algorithm. For example, in section 4.2 we shall demonstrate that Step II can be safely disregarded since it is fully correlated with Step I and does not give anything new.
Then in section 5 we consider several new ideas which go beyond the three steps of the current algorithm. In section 5.1 we consider expanding the set of variables used in Step I from two to three, since the dilepton tt event topology allows not just two, but three independent kinematic endpoints [46,57]. Then in section 5.2 we discuss a special class of maximally constrained M 2 variables where the knowledge of the top and W -boson masses can be taken into account already during the optimization stage 7 , thus further improving the ansatz for the transverse invisible momenta. In section 5.3 we study the potential benefit from using a global inclusive variable such as √ŝ or an angular variable such as the scattering angle of the parents in the center-of-mass frame. Finally, in section 6 we treat the general case when the underlying mass spectrum is unknown, and no kinematic endpoint information is available. We consider a simplified version of the algorithm which is suitably adapted to this scenario, and investigate its performance in the general new physics mass parameter space. We discuss future extensions of this work and summarize in section 7.

Dilepton tt kinematics and mass-constraining variables
In this section we shall introduce the basic notation and review the relevant class of massconstraining variables which will be used later to obtain suitable ansatze for the invisible momenta. For the most part, we shall stick to the notation and terminology of Refs. [44,55]. Following [36,53], we focus primarily on the "dilepton tt" event topology depicted in Fig. 1. This choice is motivated by several factors: • As far as the combinatorial problem is concerned, this is the simplest example which is not trivial -if we were to consider a single-step two-body decay on each side, there would be no combinatorial issue to begin with, and if we were to consider longer decay chains, the problem would become more difficult.
• This event topology is realized in the SM production of tt events, providing a useful toy playground for testing new ideas for studying new physics [57][58][59][60].
• Several new physics models can lead to this event topology, including stop-pair production in supersymmetry [61] and pair-production of leptoquarkinos [62].
Thus the general event topology considered in this paper is the pair-production of two identical parent particles A i , followed by a 2-step 2-body decay for each one (see Fig. 1): (2.1) 7 Note that this is impossible in the case of purely transverse variables like MT 2. Figure 1. The event topology considered in this paper, together with the three possible subsystems. The blue dotted, the green dot-dashed, and the black solid boxes indicate the subsystems (a), (b), and (ab), respectively. The figure is taken from Ref. [55].
In principle, A i , B i and C i should be thought of as some unknown BSM particles, while a i and b i are SM particles whose four-momenta are measured. The particles C i are invisible in the detector, and their momenta q i are constrained only by the / P T measurement and their (a priori unknown) masses,m C i , with q 2 i =m 2 C i . The 2-step 2-body event topology of Fig. 1 allows for three different subsystems, as indicated by the colored rectangular boxes [46]. Each subsystem is labelled by the visible particles in it, and defined by a choice of parent and daughter particles, leaving the third type of particles as "relatives": in subsystem (ab) the parents are A i , the daughters are C i and the relatives are B i ; in subsystem (a) the parents are A i , the daughters are B i and the relatives are C i , while in subsystem (b) the parents are B i , the daughters are C i and the relatives are A i . The mass-constraining kinematic variables defined below can be applied to any of the three subsystems, thus each variable has three different versions, depending on the chosen subsystem. For simplicity, in what follows we shall assume that the event topology of Fig. 1 is symmetric, i.e., A 1 = A 2 , B 1 = B 2 , and C 1 = C 2 (see [47,48] for generalizing to the asymmetric case).
We first consider the traditional transverse variable M T 2 [18]. Let the two transverse masses of the parent particles be M T P i ( q iT ,m), where q iT is the transverse momentum of C i andm is a test mass for the daughter particles, which ism C i for the case of subsystems (ab) and (b) andm B i for the case of subsystem (a). The kinematic variable M T 2 is now defined as the absolute minimum of the larger of these two transverse masses, with respect to all possible partitions of the individual invisible transverse momenta q iT , Alternatively, one could apply the same procedure to the actual parent masses, M P i , and define the (3+1)-dimensional analogue of Eq. (2.2) as where now the minimization is performed over the 3-component momentum vectors q 1 and q 2 [8]. As shown in Refs. [8,55,56], at this point the two definitions (2.2) and (2.3) are equivalent, in the sense that the resulting two variables, M T 2 and M 2 , will have the same numerical value. The case when M 2 begins to differ from M T 2 is when we start to apply additional kinematic constraints beyond the / P T condition q 1T + q 2T = / P T . Then the M 2 variable can be further refined and one can obtain non-trivial variations [55]: Here M P i (M R i ) is the reconstructed mass of the parent (relative) particle in the i-th decay chain during the associated minimization procedure and a subscript "C" indicates that an equal mass constraint is applied for the two parents (when "C" is in the first position) or for the relatives (when "C" is in the second position). A subscript "X" simply means that no such constraint is applied. In any given subsystem, the variables (2.2-2.7) are related event-by-event in the following way [55] Until now, we have treated the event topology of Fig. 1 in very general terms. In particular, we have not made any assumptions about the nature of the visible particles a i and b i . If they are all indistinguishable, e.g., jets from gluino pair-production events, pp →gg → jjqq → jjjj + E T / , the resulting combinatorial issues are rather severe, and one should perhaps first focus on testing the hypothesis for the event topology [22]. Here we would like to start with a more tractable problem, where some of the final state particles are distinguishable. Keeping in mind the dilepton tt example and the analogous BSM signatures, we shall take particles a i to be b-jets, and particles b i to be leptons, i.e., a 1 = b, a 2 =b, b 1 = + and b 2 = − , where = {e, µ} and b is the bottom quark. Since the charge of the b-jet is difficult to determine, there is a two-fold partitioning ambiguity: the correct partition is while the wrong partition is In the rest of this paper, we shall be concerned with designing algorithms which would preferentially select the correct pairing (2.9) over the wrong one (2.10). For this purpose, we shall mostly utilize the Cambridge M T 2 variable (2.2) and the constrained M 2CC variable (2.7). Each of these two variables can be applied to one of the three possible tt subsystems, (b ), ( ) and (b). Notice, however, that in the "smaller" subsystems (b) and ( ), the two partitions (2.9) and (2. Recently, Ref. [44] introduced another interesting variation of the M 2CC variable, which takes advantage of the potentially known mass for a relative particle. For example, if the mass of the B i particles is known, we can enforce it as an additional constraint during the minimization in the (ab) subsystem. Specifying to the tt case, where A i are the top quarks t i and B i are the W -bosons W i , we can write where m W is the experimentally measured W -boson mass. Similarly, if we take the mass m t of the top quarks to be known, there is a new variable in the ( ) subsystem:

Reconstruction schemes for invisible momenta
All of the kinematic variables introduced in the previous section are defined in terms of an optimization procedure over all possible values of the individual invisible momenta. The procedure then singles out one particular choice of the invisible momenta, which is used to calculate the corresponding variable. We can also use this choice as a useful ansatz for the invisible momenta, and then apply standard analysis techniques as if the momenta of the invisible particles were known [44,50].
The two main goals of this section are: • to list systematically the different ways in which the variables from the previous section can be used (sometimes in combination) to obtain an ansatz for the invisible momenta (see Table 1); • to compare the accuracy of several representative schemes for invisible momentum reconstruction (see Figs. 2-4).
The ansatz for the invisible momenta is generally obtained in two steps 8 : 1. Fixing the transverse components q iT of the invisible momenta. In principle, there are several possible options here: one can use either an M T 2 variable, or an M 2 variable, which can then be applied to any of the three possible subsystems in Fig. 1. In addition, if one wished to use the mass information for a relative particle, one could also consider the maximally constrained variables (2.11) and (2.12). The four columns of Table 1 Table 1. Various methods for reconstructing the transverse and longitudinal momenta of invisible particles in the dilepton tt event topology of Fig. 1. In all cases, one must specify a test mass for the lightest particle (the neutrino), then superscripts (b ) and ( ) are used to denote respectively the (ab) and (b) subsystems of Fig. 1 (or alternatively, the subsystems (2, 2, 0) and (2, 1, 0) in the notation of Ref. [46]). The methods in the yellow (orange) cells will be investigated in detail in Table 9 (Table 11) below.
imposed on each side of the event. Following the notation of [44], we shall make the distinction between cases where the resonance is a parent particle (MAOS1) and a relative particle (MAOS4). In the classic MAOS reconstruction, the transverse invisible components are obtained from M T 2 , but this can be done for one of several possible subsystems, so we need to implement some notation to indicate which subsystem was used. For example, the abbreviation MAOS1(b ;m t ) in Table 1 implies that the transverse invisible momenta were obtained from M (b ) T 2 , while the longitudinal invisible momenta were computed from the on-shell conditions for the parent particles (thus MAOS1) with mass m t . Similarly, the abbreviation MAOS4( ;m t ) indicates the use of M ( ) T 2 for fixing the transverse invisible momenta, then applying on-shell conditions for the top quarks, which in subsystem ( ) are relative particles (thus the name MAOS4). In both MAOS1 and MAOS4, the longitudinal momenta are obtained up to a four-fold ambiguity, as one has to solve a quadratic equation for each decay side.
• Classic MAOS without mass information (MAOS2 and MAOS3). There are two other MAOS schemes, which are applicable in the absence of any mass information about the parent or relative particles [51,[63][64][65]. In MAOS2 one forces each parent mass to be equal to the computed M T 2 value, i.e., M P i ( q i ) = M T 2 , i = 1, 2, while in MAOS3 one demands that the parent mass be equal to the corresponding transverse parent mass obtained during the M T 2 calculation: Once again, each of these two MAOS schemes can be applied to any of the three possible subsystems [44]. Furthermore, in the previous step 1 we could in principle use a different subsystem for the determination of the transverse invisible components, therefore now we need two subsystem labels to completely define the procedure. We shall employ the notation where the first subsystem label refers to the determination of the transverse invisible momenta, while the second subsystem label refers to the computation of the respective longitudinal components. For example, the abbreviation MAOS3( ;b ) implies that the transverse components were obtained from M ( ) T 2 , and then the longitudinal components were calculated from the MAOS3 condition for the parents in the (b ) subsystem, i.e., the top quarks: The MAOS3 procedure always results in an unique ansatz, while MAOS2 is unique only for balanced events, i.e., events with M T P 1 = M T P 2 ; for unbalanced events, MAOS2 gives exactly two solutions [55].
• M 2 -assisted invisible momentum reconstruction. Another alternative is to use an M 2 variable -recall that the M 2 optimization procedure provides an ansatz for the full 3-vectors q i of the invisible momenta. As indicated in Table 1, such M 2 -Assisted reconstructions will be denoted with M 2 A and will carry a corresponding subsystem label as well.
• Hybrid methods. The remaining methods in Table 1 are hybrid in the sense that they rely on a constrained M 2CC variable for obtaining the transverse components of the invisible momenta and on one of the MAOS methods for the determination of the longitudinal components. We shall call such methods CMAOS for "constrained" MAOS. The rationale for considering these methods is that, as we shall see below, the constrained M 2CC variables often provide superior ansatze for the transverse invisible momenta. Once again, one can "mix and match" the subsystems, which necessitates the use of two subsystem arguments for the CMAOS procedures listed in Table 1.
Note that Table 1 does not include all logical possibilities -for example, in order to keep the table compact, we did not list the the maximally constrained variables (2.11) and (2.12), which represent another M 2 A option for simultaneously computing the transverse and longitudinal invisible components. One should also distinguish between methods which use additional mass inputs (m t or m W ) and methods which do not -in what follows, we shall be careful to compare the performance of those two categories of methods separately. For example, the methods in the yellow-shaded cells of Table 1 require an additional mass input and as such they will be discussed and contrasted in Table 9 of section 4.3. On the other hand, the orange-shaded cells of Table 1 highlight a few representative methods which do not require additional mass inputs -those methods will be compared separately in Table 11 of section 4.3. Note that the M 2 A methods from Table 1 do not use extra mass information.
Having defined the different momentum reconstruction schemes, we are now in position to compare their performance. Following [50,52], we shall first ask, how close each scheme gets to reproducing the actual values for the invisible momenta. Fig. 2 shows a comparison of the true values q true of the transverse components (left panel) and the longitudinal components (middle and right panels) of the invisible momenta to the corresponding reconstructed values q obtained with different methods from Table 1. The left panel in Fig. 2 contains the combined distributions of the transverse momentum differences ∆q x ≡ q x,true −q x and ∆q y ≡ q y,true −q y resulting from four different transverse momentum reconstruction schemes: M 2CW again work best, while among the more conservative methods (right panel), M 2 A appears to outperform MAOS2 and MAOS3 (see also [44]). Fig. 2 by showing the correlations between ∆q z and ∆q x (left panels) and between the difference in magnitudes | q true | − | q | and the direction mismatch ∆R( q true , q ) ≡ (∆η) 2 + (∆ϕ) 2 (right panels). Figs. 3 and 4 reveal that in general, the transverse components of the invisible momenta are reconstructed more accurately than the longitudinal components, and that having ad- ditional mass information at one's disposal definitely helps. The right panels of Fig. 4 also show that for those methods, it is more likely to underestimate (than to overestimate) the magnitude of the invisible momentum -this is easy to understand for the case of events in which the two transverse invisible momenta partially cancel each other out in the / P T sum.

Figs. 3 and 4 provide a more detailed view of the results from
In conclusion of this section, we note that it is known that the performance of the methods with respect to invisible momentum reconstruction can be further improved by selecting only events near the kinematic endpoint of the respective invariant mass variable from which the ansatz originated [44,50]. However, this benefit comes with a significant loss in statistics, and we shall not pursue this idea further here.

Critical review of the standard method
In this section, we analyze the standard method outlined in Refs. [36,53] for resolving the combinatorics problem in dilepton tt events. The method involves three steps, which were briefly reviewed in the Introduction, and will be now examined in detail in the following three subsections. For our numerical studies, we generate a partonic tt dilepton sample with 50k events, using the MadGraph5 aMC@NLO framework at the LHC with √ s = 14 TeV center of mass energy and the default set of parton distribution functions [66]. The masses of the top quark and the W -boson are set to 173 GeV and 80.419 GeV, respectively, and we also take into account the proper finite widths -as we shall see below, this leads to the presence of events for which the top quarks and/or the W -bosons can be significantly off-shell. In order to reduce the background, we apply the same basic cuts as those used in Ref. [53]. This leaves us with 18,456 events after cuts, for a cut efficiency of 37%. The different versions of the M T 2 and M 2 kinematic variables will be computed with the OPTIMASS package [67].

4.1
Step T 2 and m b cuts The first step of the algorithm relies on the fact that in the event topology of Fig. 1, there exist several invariant mass variables, whose distributions exhibit an upper kinematic endpoint. If we choose the correct partition (2.9), all of these endpoints should be satisfied (barring offshell effects). On the other hand, the wrong partition (2.10) may lead to one (or more) endpoint violations. The art of designing a good method for resolving the combinatorics lies in choosing the optimal invariant mass variables which will maximize the number of events for which the wrong partition (2.10) results in endpoint violations.
In principle, there are two types of invariant mass variables which can have kinematic endpoints: • Using visible particles from the same decay chain. One can study the invariant mass of a collection of visible particles emerging from the same decay chain. For a long decay chain, there are many possible combinations [25], but for a short decay chain like the one in Fig. 1, the choice is unique -we can only form the two-body invariant mass of the b-jet and the lepton on each side. This gives us two values, m b + and mb − , each of which should obey the kinematic endpoint m max b , as illustrated in the left panel of Fig. 5. Following Refs. [36,53], we shall apply the stronger condition that the larger of these two values should also obey the upper kinematic endpoint:  • Using visible particles from both decay chains. The other possibility is to use the measured momenta of visible particles from both decay chains in order to construct invariant mass variables which also exhibit upper kinematic endpoints [8]. The prototypical example of such a variable is the Cambridge variable M T 2 (see the middle panel in Fig. 5), but there are other possibilities as well, e.g., M CT [68,69], M CT 2 [70], and more recently, M 2CC [55] (see the right panel in Fig. 5). Following Refs. [36,53], we shall continue to consider M T 2 , but we shall also entertain the possibility of using M 2CC instead. For the correct partition, the distributions of M T 2 and M 2CC have common kinematic endpoints, and so the values of M T 2 and M 2CC obey the hierarchy where the endpoint values correspond to using the true 9 value of the neutrino mass m ν = 0. More importantly, for the wrong partition, the shapes of the M T 2 and M 2CC distributions are different (compare the blue dotted lines in the middle and right panels of Fig. 5), which will affect the efficiency for selecting the correct partition. Due to the general property (2.8), the wrong partition will still preserve the hierarchy M T 2 M 2CC , and therefore, the chances of endpoint violations will be increased if we were to use M 2CC in place of M T 2 [61].

Quadrant
Quadrant for P W for P C  I  II  III IV   I  II  III  IV   Table 2. Resolving the combinatorial ambiguity at Step I. Each event is tagged with two quadrant numbers, one for each partition P i . The quadrant number for the correct (wrong) combination is given in the 'row' ('column') label of the 4 by 4 matrix above. The green, red, and white fields indicate correctly resolved, wrongly resolved, and unresolved cases, respectively.
The distributions in Fig. 5 clearly motivate the use of the variables m b and M 2CC instead) to resolve the two-fold combinatorial ambiguity. 10 This idea is implemented as Step I of the algorithm, by requiring that the invariant mass variables computed with a given partition P i , (i = 1, 2), obey the two kinematic endpoints (4.1) and (4.2). If one of the partitions obeys both endpoints, while the other does not, the former (latter) is declared to be the correct (wrong) partition P C (P W ).
In order to quantify the discussion in the rest of paper, we introduce a simple Cartesian coordinate system designed to keep track of the kinematic endpoint violations (see Fig. 6). The x and y variables will be chosen so that their values are positive (negative) in the absence (presence) of a kinematic endpoint violation. To this end, we shall consider the difference between the value of the upper kinematic endpoint and the value of the variable itselfthis difference is expected to be positive for the correct partition P C , and conversely, if the difference is negative, it is likely that we have chosen the wrong partition P W . Thus in Fig. 6 we choose the x-axis to be m t − M are the invariant masses of the two b-lepton pairs in a given partition. As usual, the plane in Fig. 6 is divided into four quadrants, labelled I, II, III and IV. With this setup, one would expect that the correct partition P C will be registered in the first quadrant I, while the wrong partition P W can end up anywhere, including quadrants II, III and IV, which would indicate some sort of an endpoint violation.
These expectations are confirmed in Fig. 7, which shows scatter plots in the plane of Fig. 6 Figure 7. Scatter plots in the plane of Fig. 6 for the correct partition P C (left panels, red points) and the wrong partition P W (right panels, blue points). In the top row the x-axis is chosen to be for events with the correct partition P C (left panels, red points) and the wrong partition P W (right panels, blue points). We see that the correct partition mostly populates quadrant I, although there is some leakage into the other three quadrants due to off-shell effects. On the other hand, the wrong partition cases are significantly spread out, and the majority of the events live outside quadrant I. The effect is even more pronounced if we trade M 2CC as our x-axis variable (see the plots in the bottom row of Fig. 7). We are now in position to define the action of Step I of the algorithm. For each event, there are two possible partitions, P C from (2.9) and P W from (2.10). Since we do not know which is which, from now on we shall denote them with P k , (k = 1, 2). (It does not matter which partition is labelled first and which is labelled second.) Each partition P k will produce a point in one of the four quadrants within the plane of Fig. 6. We will then resolve the partitioning ambiguity according to Table 3. Whenever one of the two partitions falls in quadrant I while the other does not, then the partition in quadrant I will be declared to be the correct one (P C ). If both partitions fall in the same quadrant, then the event remains "unresolved" at Step I and we need to wait for the next steps of the algorithm. The situation becomes more complicated if both partitions fall outside quadrant I, and within different quadrants. In that case, we shall make the distinction between quadrant III, where both Quadrant Quadrant for P 2 for P 1  I  II  III  IV   I unresolved  Table 3, each event will fall into one of two categories: resolved, for which one of the partitions P k has been declared to be correct, and unresolved, for which no decision has been reached at that stage. Furthermore, the resolved events will not always be identified correctly -on occasion, the algorithm will misidentify the wrong partition P W as being the correct one. In order to better understand the power of each method below, we shall find it convenient to quote our results in the form of 4 × 4 tables like Table 2, where we separately keep track of the quadrant for the correct partition P C (indicated by the row label) and the quadrant for the wrong partition P W (indicated by the column label). Each event will belong to one of the 16 boxes of Table 2, and we will be interested in the number of events N (I,J) within each box, where the "quadrant indices" I (for the correct partition) and J (for the wrong partition) take values in the set {I, II, III, IV }. The action by Table 3 then causes all events within the green-shaded boxes of Table 2 to be correctly resolved, the events within the red-shaded boxes of Table 2 to be wrongly resolved, while the events within the unshaded boxes of Table 2 to remain unresolved. Then, for any given event sample, the total number N C of correctly resolved events will be given by the total number of events within the green-shaded boxes of Table 2: Similarly, the total number of wrongly resolved events N W will be equal to the total number of events within the red-shaded boxes of Table 2: Finally, the total number of unresolved events N U is the sum of all events within the unshaded (white) boxes of Table 2:  The algorithms below will be applied so that once an event is resolved, it does not get reclassified at a later stage, i.e., subsequent steps of the algorithm only affect the remaining  I  II  III  IV   I  6194  910  9793  1020  II  22  31  106  5  III  29  10  191  4  IV  41  3  91  6 Quadrant counts based on M  I  II  III  IV   I  4494 2317 10222  217  II  168  178  480  5  III  37  26  251  1  IV  17  3  38  2   Table 4. unresolved events. Obviously, at different steps of the algorithms, the number of correctly resolved events (N C ), wrongly resolved events (N W ) and unresolved events (N U ) will vary, but those three numbers will always add up to the total number of events N T in the sample: (4.7) In order to compare different algorithms, we define the expected efficiency (sometimes called purity) as For the purposes of calculating the efficiency, we shall assume that any unresolved events are eventually decided with a coin flip (50% efficiency).
Our results for N (I,J) are shown in Table 4, where the quadrants from and m b (right 4 × 4 table). As expected, the most populated entries are found in the first rows, which confirms that in the case of the correct partition P C , endpoint violations are relatively rare 11 . We also find a handful of events in the off-diagonal boxes of the first column -for those events, off-shell effects caused the correct partition P C to violate one or both of the kinematic endpoints, while the wrong partition P W accidentally happened to satisfy both kinematic endpoints. Such events are problematic since they will be wrongly identified -one should keep in mind that only the symmetric combination of events N (I,J) + N (J,I) is experimentally observable, since a priori we do not know which is the correct partition. Nevertheless, we observe that in the cases when one of the partitions ends up in quadrant I while the other does not, the large majority of the events will be correctly identified, since  Another piece of good news is that whenever one of the partitions violates exactly one endpoint, while the other violates both, it is much more likely that the former (latter) is the correct (wrong) partition: Combining the relations (4.9) and (4.10) and using the definitions (4.4) and (4.5), we conclude that N C N W , and therefore Step I of the method works relatively well 12 . With our definition for the efficiency (4.8), the classic method based on M  Table 4, we see that for the standard method, the efficiency is hurt by the relatively large fraction of events which remain unresolved at this stage -the unshaded boxes contain a total of N U = 6430 events, or about 35% of the sample. The situation improves somewhat if we use the M

Step II: The presence of complex solutions
In the second step of the method, one attempts to reconstruct the longitudinal momenta of the invisible particles, by enforcing on-shell conditions for a parent particle (MAOS1) or for a relative particle (MAOS4). Since the on-shell conditions result in quadratic equations, the solutions are not guaranteed to be real. The idea of Step II is to compare the two possible partitions P k in terms of the number of complex solution pairs C for the longitudinal momenta. Since there is a separate calculation for each decay chain, there are three possible outcomes: • C = 0. Both decay chains result in real solutions.
• C = 1. Exactly one decay chain gives a pair of complex solutions, while the other decay chain has real solutions.
• C = 2. Both decay chains result in complex solutions.
For the purposes of applying Step II of the method, there is no need to distinguish between the cases of C = 1 and C = 2, since the important point is simply that C > 0. The action of Step II is the following: if one of the partitions P k gives C = 0, while the other has C > 0, then the former (latter) partition is declared to be the correct (wrong) one. I   II  III  IV  0  1  2  0  1  2  0  1  2  0  1 2  C  I  6194  0  0 Table 6. The same as Table 5, but for the case of MAOS4(b ;m W ), i.e., the transverse momenta of the neutrinos are still obtained from the minimization of M As already discussed in section 3, there are several ways to implement the MAOS idea and compute longitudinal invisible momenta. Tables 5 and 6 show results for the cases of MAOS1(b ;m t ) and MAOS4(b ;m W ), respectively 13 . Similarly to Table 4, each table is  organized  Let us first focus on Table 5, which shows several interesting trends. First, recall the motivation behind Step II -one was hoping to find that the correct partition P C would always give real solutions (C = 0), while the wrong partition P W would always lead to complex solutions (C > 0). Table 5 reveals that this expectation is indeed true, but only for certain quadrant pairs: (I, II), (I, III), (IV, II) and (IV, III). For those cases, Step II would be able to perfectly resolve the combinatorial ambiguity, and it seems that the method works as designed. Unfortunately, three out of these four quadrant pairs are already shaded in green (see Table 2), which means that those events were already perfectly resolved by Step I, thus the additional benefit from Step II for those three quadrant pairs is exactly zero. As for the fourth quadrant pair, (IV, II), it is very sparsely populated, and furthermore, any benefit there would be offset by the negative effects from the symmetric case of (II, IV ), where the results are contrary to our expectations above -now it is the correct partition P C which leads to complex solutions 14 . (The same phenomenon is observed for the other three symmetric pairs as well -see the red-shaded boxes corresponding to (II, I), (III, I), and (III, IV ), where it is again the correct partition P C which has C > 0.) Thus we conclude that for the events which were already resolved at Step I (the greenshaded and red-shaded cells in Table 5), Step II does not bring anything new -its results are either fully correlated with Step I (as for the quadrant pairs (I, II), (I, III) and (IV, III) and their red-shaded symmetric partners on the other side of the diagonal), or inconclusive, since the two partitions behave identically (e.g., for quadrant pairs (I, IV ) and (II, III) and their partners). Therefore, we need to concentrate on the unresolved events in the unshaded cells in Table 5, since those were precisely the events which Step II was meant to address. Unfortunately, we observe that in the unshaded cells along the diagonal in Table 5, the two partitions lead to the same result and cannot be discriminated, while the remaining two cells (II, IV ) and (IV, II) were already discussed above -their statistics is too low, and they tend to cancel each other out, thus they will not appreciably affect the overall efficiency.
Based on the results from Table 5, we conclude that if one were to apply Step II based on the MAOS1(b ;m t ) method, which was used in Ref. [53], there would be no additional benefit beyond Step I, and therefore Step II is unnecessary and can be eliminated. However, this still leaves open the question whether some modified version of Step II can still be useful, e.g., applying a different MAOS scheme like MAOS4(b ;m W ) (see Table 6), or perhaps using one of the CMAOS schemes based on the M 2CC variables (see Tables 7 and 8). But before we discuss these options, it will be useful to understand the results from Table 5 from a physics point of view. A careful inspection of Table 5 reveals that its content can be summarized as follows: for any partition P k , quadrants I and IV produce only real solutions, while quadrants II and III lead to only complex solutions. This means that the existence of complex solutions is correlated with the x-axis variable of Fig. 6 T 2 ), and not with the y-axis variable 14 The fact that the correct partition PC may result in complex momenta should not be surprising -this can be due to finite width and off-shell effects. As a sanity check, we have verified that in the zero-width limit only the first row in Table 5 has any non-zero entries, while the second, third and fourth rows are empty.  Table 7. The same as Table 5, but for the case of CMAOS1(b ;m t ), i.e., the transverse momenta of the neutrinos are now fixed from the minimization of M    Table 8. The same as Table 6, but for the case of CMAOS4(b ;m W ), i.e., the transverse momenta of the neutrinos are now obtained from the minimization of M (m b ). This is easy to understand: m b is formed from visible particle momenta only, and is not directly related to any invisible momenta. Therefore, a violation of the m b kinematic endpoint by itself does not imply unphysical invisible momenta. On the other hand, the physical meaning of M T 2 > m t , then enforcing the on-shell condition for the top quark will necessarily result in unphysical (complex) values for the momenta. In particular, quadrants II and III, in which the M (b ) T 2 endpoint is violated by definition, will always produce complex momenta.
While the above logic helps to understand the results from Table 5, it does not carry over directly to the case of MAOS4(b ;m W ) shown in Table 6, since now we are enforcing an on-shell condition for a different (relative) particle. The trends which we previously observed in Table 5 are still noticeable, but they are not so clear cut. Nevertheless, if we focus on the unresolved events after Step I (the unshaded cells in Table 6), we again see that Step II does not do particularly great on those events. The largest effect is in the (I, I) cell, where the efficiency for resolving the correct partition is 57.4%, which is slightly better than a coin flip. Adding up the results from all previously unresolved cells, we find that if we were to perform Step II with the MAOS4(b ;m W ) version of the method instead of the MAOS1(b ;m t ) option used in Ref. [53], the overall efficiency would increase to 84.5%, which is still worse than the result (85.3%) found in section 4.1 with the improvements in Step I alone, taking advantage of M  Table 4). Tables 7 and 8 show results from two similar exercises where we use the corresponding CMAOS methods, i.e., fixing the transverse components of the invisible momenta from M 2CC somehow into the Step II algorithm would improve the performance. However, Table 7 shows that the improvement in the case of CMAOS1(b ;m t ) is very marginal -the efficiency increases from 85.3% after Step I to 85.4% after Step II. The effect is slightly better in Table 8, which uses the CMAOS4(b ;m W ) option -there the efficiency increases from 85.3% after Step I to 85.9% after Step II. However, even this increase is too small to justify the presence of Step II -as we shall see later on, there exist other, much more effective techniques. Therefore, as a final summary of this subsection, we conclude that Step II can be safely dropped altogether, since its results are largely correlated with Step I.

Step III and possible variations
In this subsection we shall discuss different possible options for the third step of the method and investigate their performance. To recap the situation: when we used M Step I, we ended up with N C = 11, 920 correctly resolved events, N W = 106 incorrectly resolved events, and N U = 6, 430 unresolved events, for an efficiency of 82% (see the top of the middle column in Table 9). If, on the other hand, we choose to use M  Table 9). Then in section 4.2 we showed that Step II does not add much and can be ignored. This brings us to Step III, whose purpose is to further classify the remaining unresolved events after Step I (6,430 and 4,933, respectively) on a statistical basis, using suitable discriminating variables.
We begin by reviewing the method suggested in Ref. [53], which introduced several kinematic variables, T i , i = 1, . . . , 4. These variables were designed so that their values tend to be larger for the case of the wrong partition, i.e.
While it is not guaranteed that (4.11) will be true in every single event, if it holds for the majority of the events, one can attempt to identify the correct partition P C by declaring [53] In the case of several good variables T i , one can generalize (4.12) by choosing the correct partition P C to be the partition P 1 (P 2 ) if the majority of the quantities ∆T i (P 2 , P 1 ) are positive (negative). In the following, we shall refer to this procedure as the "∆T i method" [53].
The T i variables considered in Ref. [53] were the following: (4.14) where P k is one of two partitions and the index j labels the two decay chains in Fig. 1. The variable m reco t (m reco W ) is the reconstructed mass of the top quark (the W -boson) with a MAOS-type method which uses the W -boson mass (the top mass) as an input. Since the longitudinal invisible momenta are obtained from a quadratic equation, in general there are two solutions, labelled by α = ±, corresponding to the two signs in front of the discriminant. Thus in each event one can obtain four reconstructed top quark masses, m reco t (j, α), and four reconstructed W -boson masses, m reco W (j, α). The idea behind the T 3 and T 4 variables in (4.16) and (4.17) is to compare those reconstructed values to the true values m t and m W , respectively. For the correct partition P C , on average one might expect to find the reconstructed values closer to the true ones, in agreement with (4.11).
In Table 9 we test several options for discrimination variables which can be applied at Step III. Our benchmark is the ∆T i method of Ref. [53], which made use of only three variables, T 2 , T 3 and T 4 , since the fourth one, T 1 , was found to be significantly correlated with T 2 . For consistency, whenever the quadrants from Step I are defined in terms of M (right column in Table 9), we shall replace (4.15) with T 2 = M (b ) 2CC . Table 9 reports results from both versions of the ∆T i method -we see that the overall efficiency can be further improved to 87.1% and 89%, respectively. The observed improvement at Step III is due to correctly categorizing (at the rate of about 2:1) the majority of the remaining unresolved events -see Table 10, which gives the breakdown among the individual "unresolved" cases 2CC (right). The resulting cumulative efficiencies are 87.1% and 89%, as shown in Table 9. from Table 3. In spite of this progress, we also notice that a certain number of events (1,412 and 651, correspondingly) still remain unresolved. At first glance, this seems odd, since the ∆T i method uses an odd number of variables, so for each event, there should be a clear winner between the two candidate partitions P 1 and P 2 . However, recall that the longitudinal momentum reconstruction sometimes results in complex solutions, in which case the corresponding variable T 3 or T 4 is undefined. 15 Thus the remaining unresolved events after Step III are those where only two of the three T i variables were calculated, and each preferred a different partition P k . Table 10 demonstrates that Step III was relatively successful. Nevertheless, in the remainder of this section we shall investigate whether further improvements at the level of Step III are still possible. Let us begin by studying the benefit from each individual variable, T 2 , T 3 and T 4 , used in the ∆T i algorithm. Following Ref. [53], in Fig. 8 we show distributions of the "ordered" differences for the three variables T 2 (left panels), T 3 (middle panels) and T 4 (right panels), where we use MC truth information to make sure that we subtract the variables in the order indicated in (4.18). For the plots in the top (bottom) row of Fig. 8, the transverse invisible momenta were obtained with the help of the M 2CC ) variable. Plotting in terms of the ordered difference (4.18) is very useful, since it allows us to see how often the expected relationship (4.11) holds: the difference (4.18) is positive (negative) if (4.11) is satisfied (violated). Thus, by applying the prescription (4.12) for a given variable T i , we shall correctly resolve all events with positive values of ∆T i (P W , P C ) (the shaded portions of the distributions in Fig. 8), and we shall wrongly resolve the events with negative values of ∆T i (P W , P C ) (the unshaded portions of the distributions in Fig. 8). By comparing the areas of the shaded and unshaded 2CC . In the middle and right panels we only plot events with real solutions for the longitudinal momenta. The shaded (unshaded) portions of the histograms represent events which will be correctly (incorrectly) resolved with that particular ∆T i variable alone.
portions of each distribution, we can judge the discrimination power of each variable. For example, the left panels in Fig. 8 show that when considering the whole event sample (red histograms), T 2 appears to be a good variable, since as many as 79% of the events have positive values of ∆T 2 (P W , P C ) [53]. Unfortunately, we find that this conclusion is invalidated after the application of Step I -for the remaining unresolved events after Step I (the blue histograms in Fig. 8), it is actually more likely to find a negative value of ∆T 2 (P W , P C ) instead, thus obtaining the wrong answer for P C . This observation reveals that the results from Step I and Step III are also somewhat correlated -the events which are easy to analyze with T 2 at Step III would already be correctly resolved at Step I. Using ∆T 2 alone (without ∆T 3 or ∆T 4 ) in fact lowers the cumulative efficiency, as shown in Table 9. This motivates us to drop ∆T 2 from further consideration and repeat Step III without ∆T 2 , i.e., with only ∆T 3 and ∆T 4 . As shown in Table 9, this leads to a slight improvement of the cumulative efficiency, to 88.7% and 89.5%, respectively, indicating that T 3 and T 4 retain some discrimination power even after Step I (this can also be deduced from the blue histograms in the middle and right panels of Fig. 8).
Given the large variety of MAOS and CMAOS methods described in section 3 (see Table 1), next we check if there exist alternative versions of the T 3 and T 4 variables which are better suited for our purpose. In the remainder of Table 9 we study the effect on the cumulative efficiency if the reconstruction is performed with one of the eight yellow-shaded methods from Table 1. Taking one variable at a time, we apply Step III as in (4.12) and quote the resulting cumulative efficiency in the last 8 rows of Table 9. The results are rather illuminating, and indicate that "not mixing" the subsystems provides the best option for constructing a useful T i variable. For example, when the transverse invisible components are fixed with the help of the (b ) subsystem, in which the top quark is the parent particle, then we are better off applying the on-shell condition on the same particle in order to reconstruct the longitudinal invisible momenta. Similarly, if the transverse momenta are obtained from the ( ) subsystem, in which the W -boson is the parent particle, then it is preferable to apply the on-shell condition on the W -boson as well. For both MAOS and CMAOS reconstructions, Table 9 shows that the highest efficiencies are obtained in the case of (b , m t ) and ( , m W ) type variables. It is interesting to note that the performance of the modified method above (which used only ∆T 3 and ∆T 4 ) can be matched and even slightly exceeded by using a single additional variable at Step III, provided that we pick the right one: the MAOS1(b ;m t ) option gives 88.8% (compare to 88.7% before), while CMAOS1(b ;m t ) yields 89.6% (compare to 89.5%).
Another interesting point is that in all cases, the cumulative efficiencies in the right column of Table 9 are higher than those in the middle column, thus reinforcing the idea of using the constrained M 2CC variables. In section 4.1 we already established that it is beneficial to redefine Step I in terms of M 2CC variables, and now we see that this advantage is retained after Step III as well.
Finally, one may wonder if there is any additional benefit in combining at Step III two or more of the variables considered in Table 9. We test this idea with the following exercise. Let us revisit Step III, again dropping the T 2 variable (4.15) from consideration, while for the definition of T 3 and T 4 let us choose the two best performing CMAOS variables from the right column of Table 9, namely m reco t with CMAOS1( ;m W ) (87.9%) and m reco W with CMAOS1(b ;m t ) (89.6%). In this modified scheme, we obtain a cumulative efficiency of ε = 89.2% with N C = 2, 483 correctly identified, N W = 1, 023 wrongly identified and N U = 1, 427 unresolved events. Similarly, choosing the two best MAOS options in the middle column of Table 9, namely m reco t with MAOS1( ;m W ) (87.4%) and m reco W with MAOS1(b ;m t ) (88.8%), we find a final efficiency of ε = 88.3% (with N C = 3, 493 correctly identified, N W = 1, 176 wrongly identified and N U = 1, 761 unresolved events). In both exercises, the final efficiency is slightly worse than what would be obtained with the single best variable alone, although the number of unresolved events decreased. These two exercises indicate that there exist nontrivial correlations between the different variables and an improvement of the efficiency is not guaranteed by simply merging or combining different methods. This is one of the reasons why we kept the results for the different methods separate in Table 9.
Up to this point, in Steps II and III we have used reconstruction methods which require  Table 11. The same as Table 9, but for a few representative methods from Table 1 which do not use mass information. the knowledge of a particle mass (m t or m W ). However, when the method is being applied in studies of new physics, such information may not be immediately available. Therefore, it is prudent to consider modifications of Steps II and III, where one uses methods from Table 1 which do not rely on any mass information. To be specific, we focus on the four methods listed in the orange-shaded cells of Table 1 and show the corresponding results in Table 11, which is the direct analogue of Table 9. Once again, it turns out that Step II is unnecessary, albeit for a different reason -this time in all cases the solutions for the invisible momenta are found to be real and therefore invisible momentum reconstruction is always possible for both partitions. It also follows that there will be no unresolved events after Step III, since the relevant kinematic variables can always be computed and compared for the two partitions.
The results in Table 11 help identify the most promising variables for Step III -m reco W with MAOS3(b ;b ) for the middle column (ε = 85.6%) and m reco t with M 2 A( ) for the right column (ε = 88.6%). In both cases, the use of a single variable at Step III leads to an improvement in the efficiency found after Step I of more than 3%.
In conclusion of this section, we summarize our main findings from the review of the method of Refs. [36,53].

The efficiency after
Step I is increased if we define the quadrants of Fig. 6 2.
Step II does not lead to any appreciable effect after Step I, and can be safely omitted from the algorithm.

The use of the variable T 2 in
Step III is counterproductive, thus T 2 can be safely dropped from consideration.

The use of a single optimal variable at
Step III (as opposed to a combination of variables) is generally sufficient to produce the desired result.

A few ideas for further improvement
In the previous section, we considered the partitioning method as defined in Refs. [36,53] and found that with a few slight tweaks the efficiency (4.8) can reach over 89% (88%) with (without) mass information. In this section we shall consider a few more serious departures from the original algorithm, which can potentially further increase the efficiency. Some of the changes are simply quantitative (as in section 5.1, where we increase the number of variables used at Step I), others are qualitative (as in sections 5.2 and 5.3).

Generalizing the quadrant counts
Recall that the main idea at Step I was to use two kinematic variables, in this case m b and M (b ) T 2 , which have clear kinematic endpoints for the case of the correct partition P C . The resulting efficiency after Step I was 82%, and when we replaced M 2CC , the efficiency increased to 85.3%. But one should be able to do even better at Step I. The main point is that in the event topology of Fig. 1 there are not two, but three independent kinematic endpoints (they allow for a complete measurement of the mass spectrum [46,61]). Therefore, one can expect that the addition of a third variable at Step I, i.e., generalizing the plane of Fig. 6 to a three-dimensional parameter space divided into eight octants, would further improve the performance of Step I.
In order to test this idea, we need to pick a suitable third variable to go along with our original two. We focus on the M T 2 and M 2CC variables in the remaining two subsystems, (b) and ( ), and show their distributions in Fig. 9. The plots in the bottom row indicate that the variables M We can now generalize our previous discussion of the quadrant counts in Fig. 6 by populating our events in the three-dimensional space . (5.1) As before, we expect that for the correct partition P C , the events will be populating predominantly the first octant, where (sign(x), sign(y), sign(z)) = (+, +, +), while for the wrong partition P W , the events will be more randomly distributed throughout the eight octants. These expectations are tested in Table 12, which generalizes the right table in Table 4 by additionally incorporating the subsystem variable M ( ) 2CC . In Table 12, each of the eight octants of the parameter space (5.1) is labelled by its corresponding sign signature (sign(x), sign(y), sign(z)), and rows (columns) correspond to the correct (wrong) partition. Just like Table 4 Table 13. Event counts summarizing the number of endpoint violations in Table 12. Unresolved events are in the diagonal entries, while events above (below) the diagonal are correctly (wrongly) resolved. The resulting final efficiency after Step I is 86.8%. Table 12 reveals that the endpoint violations are more likely to occur in the case of the wrong partition. We note that the entries in the (−, +, −) row and column are identically zerothis is due to the fact that the values of m We are now in position to evaluate the effect of Step I in the presence of the additional variable -we simply compare the number of endpoint violations (0, 1, 2 or 3) for each partition P k , and declare the winner P C to be the case with fewer endpoint violations, as illustrated in Table 13. The events for which both partitions give the same number of endpoint violations, remain unresolved -adding up the diagonal (unshaded) cells of Table 13, we find N U = 4, 082. The events above (below) the diagonal cells are correctly (wrongly) resolved, giving N C = 13, 985 and N W = 389, for a final efficiency of 86.8%. 16 This result should be contrasted with the 85.3% efficiency found in the right column of Table 9. Thus the benefit from adding a third variable to Step I can be quantified as an extra 1.5% in the efficiency. 16 We have checked that adding a fourth variable, M   In this section we shall have in mind situations where full mass information is available, such as polarization studies in dilepton top events. Given the known values of m t and m W , the relevant question is whether they have been optimally utilized in the algorithm. With the MAOS method, this mass information is not used at all during the reconstruction of the transverse invisible components. One way to incorporate the mass information from the very beginning is to consider the additionally constrained variables M (b ) 2CW and M ( ) 2Ct defined in (2.11) and (2.12), respectively [44]. We have already seen that those variables show the best performance in terms of reconstructing the neutrino momenta (see Figs. 2 and 3). This motivates us to perform Step I as in the previous section 5.1, but in terms of the alternative parameter space . It should be noted that in certain cases OPTIMASS is unable to find a viable solution, since all constraints cannot be simultaneously satisfied. This can happen, e.g., when we consider the wrong partition of an event, or if some particles are produced off-shell. For the purposes of tabulating the results in Tables 14 and 15, such cases are assigned a "minus" sign. Table 15 shows that when Step I is performed in terms of the alternative variables (5.2), the efficiency of Step I alone is as high as 88.1%. This is comparable to the results with several versions of the full algorithm (including Step III) which were considered in the previous section. Thus we conclude that in cases where the masses of the intermediate particles are known, it is best to perform Step I in terms of the parameters (5.2) which use the variables M In this section, we shall use the fact that once we choose an ansatz for the invisible momenta via one of the methods from Table 1, the full event kinematics is completely fixed as well. This means that when it comes to discriminating the unresolved events after Step I, we are not limited to only invariant mass variables, but we have the full set of kinematics tools at our disposal. In particular, we can study angular variables, as well as global inclusive variables, whose definition does not rely on partitioning the event.
An example of the latter type of variables is the total invariant mass in the event, √ŝ , whereŝ It has been shown that a preselection cut on √ŝ improves the efficiency at the cost of lowering the statistics [53]. This suggests that √ŝ can potentially be a useful variable for categorizing the unresolved events after Step I. The idea is tested in the left panels of Fig. 10. The upper left panel shows √ŝ distributions obtained with the M 2 A(b ) method from Table 1, for the N U = 4, 933 unresolved events arising after Step I when it is done in terms of M (b ) 2CC and m b (see the right columns in Tables 9 and 11). The green dotted (blue dashed) line shows the case of the correct (wrong) partition. For reference, the solid red line gives the true √ŝ distribution. As previously observed in Ref. [16], the reconstructed √ŝ distribution peaks at threshold (2m t ). We also notice that the distribution for the wrong partition P W is slightly harder, which suggests to investigate the ordered difference in analogy to (4.18). The distribution of the variable (5.4) is shown in the lower left panel of Fig. 10. If we attempt to resolve events by applying the condition (4.12) to the variable √ŝ , we find N C = 2, 926 correctly resolved events (the shaded portion of the distribution in the lower left panel of Fig. 10) and N W = 2, 007 wrongly resolved events (the unshaded portion of the distribution). The overall efficiency is then increased from 85.3% after Step I to 87.8% (see Table 16).
An alternative handle to sort the unresolved events is provided by the angular distribution of the parent particles at production. For concreteness, in the upper right panel of Fig. 10 we compare the distributions of the scattering angle θ of the top quarks in their center of mass frame, for the case of the correct partition (green dotted line), the wrong partition (blue dashed line) and the MC truth (solid red line). We notice that in reality, the top quarks are produced predominantly in the forward direction, due to the presence of t-channel and u-channel diagrams. When the momenta are reconstructed using the correct partition, this tendency is retained, while the distribution obtained with the wrong partition is mostly flat in cos θ. This motivates us to consider the corresponding ordered difference ∆(−| cos θ|)(P W , P C ) ≡ (−| cos θ|)(P W ) − (−| cos θ|)(P C ) (5.5) whose distribution is shown in the lower right panel of Fig. 10. As before, events with positive 17 values of the ordered difference (5.5) will be correctly resolved, and they represent the shaded portion of the distribution. As summarized in 2CC and m b . The two partitions are compared based on the resulting values of √ŝ , of (−| cos θ|), or probabilistically based on the templates in Fig. 11. Figure 11. The two-dimensional templates P C (left) and P W (right) in the (cos θ, √ŝ ) plane.
(4.12) to the cos θ variable, we obtain N C = 2, 836 (N W = 2, 097) correctly (wrongly) resolved events and an overall efficiency of 87.3%. Table 16 demonstrates that both variables cos θ and √ŝ are useful in categorizing the unresolved events, if applied separately. We now check whether they can be combined into a single method leading to an even higher efficiency. Since it is not possible to derive analytical expressions for the expected 2D distribution in (cos θ, √ŝ ), we shall use a template method. Fig. 11 shows the relevant two-dimensional distributions using only unresolved events. In the left (right) plot the invisible momenta were reconstructed with the correct (wrong) partition. These two plots define two probability distributions, P C and P W . Each of the two possible partitions P k in an event comes with its own values of cos θ and √ŝ , say cos θ k and √ŝ k . Since we do not know whether P 1 corresponds to P C or P W , we try it both ways, and select As shown in Table 16, the prescription (5.6) results in N C = 3, 199 correctly resolved events and N W = 1, 720 wrongly resolved events, 18 and the overall efficiency is increased to 89.2%, which ranks among the best results we have found so far at the end of Step I. 18 The NU = 14 unresolved events seen in Table 16 are due to our finite binning -for those events, the two points (cos θ k , √ŝ k ), k = 1, 2, happened to fall within the same bin, which resulted in a tie. Since the 6 Finding the correct partition without any mass or endpoint information The potential improvements considered in the previous section relied on some prior knowledge about the masses of the particles involved in the decay chains. However, in new physics applications of our event topology from Fig. 1, such information may be difficult to get at first. This is why in this section we shall consider scenarios where no mass information is available, i.e., the masses of the particles A i , B i and C i are a priori unknown, and furthermore, kinematic endpoint measurements are also unavailable or have very large uncertainties to be useful.
Let us now revisit the method under those assumptions. Steps I and II do require mass information (for defining the quadrants of Fig. 6 and for longitudinal momentum reconstruction, respectively) and therefore cannot be used. Similarly, two of the variables from Step III, namely ∆T 3 and ∆T 4 , also need mass inputs for their calculation. For the moment, this leaves us with only ∆T 1 and ∆T 2 at our disposal. Since for the correct partition the values of T 1 and T 2 are in principle limited from above by a kinematic endpoint, we can still expect (4.11) to be mostly true. In fact, it is known that with the simple prescription (4.12) using ∆T 1 (∆T 2 ) alone, the efficiency is 80% (79%) over all events [53]. As before, we can combine the two variables ∆T 1 and ∆T 2 and assign the correct partition P C to be the one chosen by both variables. In order to estimate the resulting efficiency, in the first row of Table 17 we list the number of events with a given sign signature (sign(∆T 1 (P W , P C )), sign(∆T 2 (P W , P C ))). The events with signature (+, +) (green-shaded cells) will be correctly identified, the events with signature (−, −) (red-shaded cells) will be wrongly identified, while the events with signatures (+, −) or (−, +) (unshaded cells) will remain unresolved at this point. The resulting efficiency of this combined ∆T 1 ⊕ ∆T 2 method is 81.8%.
We emphasize that the ∆T 1 ⊕∆T 2 method does not use any mass information: we simply compare the two possible values for T 1 (P k ) (as well as the two possible values for T 2 (P k )) for k = 1 and k = 2, and choose the larger (the smaller) to indicate the wrong (correct) partition. However, a potential problem with the method is that the variables T 1 and T 2 were found to be correlated [53]. This motivates us to look for an alternative set of variables. Since we saw previously that M The efficiency of the resulting ∆T 1 ⊕ ∆T 5 method is 83.6%, as shown in the second row of Table 17. One can go one step further and add a third variable to the mix, e.g., M The resulting combined method ∆T 1 ⊕ ∆T 5 ⊕ ∆T 6 involves an odd number of variables, thus each event will be resolved based on the sign signature. This is illustrated in Table 18, which templates of Fig. 11 are built from Monte Carlo, in principle one can use more statistics for their generation, and correspondingly smaller bin sizes, which will make such ties increasingly rare.
Until now, we have tested the efficiencies of the methods with a SM sample of dilepton tt events, i.e., the masses of the particles A i , B i and C i were respectively the top mass m t , the W -boson mass m W , and the neutrino mass m ν . One may wonder how sensitive the results are to the choice of a benchmark study point. This issue is investigated in Fig. 12, where we fix the mass of particle A to m A = 500 GeV, and then freely vary the other two masses m B and m C . The left panel in Fig. 12 shows the efficiency of the ∆T 1 ⊕ ∆T 5 ⊕ ∆T 6 method considered above. The efficiency varies noticeably throughout the mass parameter space, and seems to be correlated mostly with m B and less with m C . The highest efficiency is obtained in the region where the spectrum becomes relatively degenerate -in that case the visible decay products a i and b i are highly correlated with the direction of the parent particle A i .
For completeness, in Fig. 12 we also show results for the standard hemisphere method [31][32][33][34] when applied to our event topology. In the hemisphere method, one clusters the visible particles into two groups trying to keep the invariant mass of each cluster to a minimum. It is not difficult to see that in our language this is nothing but the ∆T 1 method. The corresponding efficiency is shown in the middle panel of Fig. 12 and it exhibits the same qualitative behavior. The right panel of Fig. 12 compares the two methods by plotting the fractional difference of their efficiencies. We see that throughout most of the parameter space, the efficiency of the ∆T 1 ⊕ ∆T 5 ⊕ ∆T 6 method is higher by 2-5%.

Summary and outlook
The combinatorial problem is a very important issue in experimental particle physics. Identifying the correct event topology on an event by event basis is a key element of many analyses which attempt to measure particle properties such as spin, couplings, CP quantum numbers, etc. A successful method which can avoid combinatorial ambiguities, especially in jetty events, is bound to improve the sensitivity of new physics searches as well.
In this paper, we revisited some of the existing methods [36,53] for resolving the combinatorial ambiguity in the dilepton tt event topology of T 2 and b) if the quadrants are generalized to "octants" as discussed in section 5.1.

2.
Step II does not lead to any appreciable effect after Step I, and can be safely omitted from the algorithm.

The use of the variable T 2 in
Step III is counterproductive, and T 2 can also be dropped from consideration.

The use of a single optimal variable at
Step III (as opposed to a combination of variables) is generally sufficient to produce the desired result.
5. The efficiency is also increased if the available mass information is incorporated as early as possible, e.g., by utilizing the variables M 6. We investigated further improvements of the algorithm, by invoking other types of variables, including a global inclusive variable like √ŝ and an angular variable like cos θ, see section 5.3. 7. In section 6 we discussed a more general approach which does not rely on any mass information.
One should keep in mind that the efficiency can always be further improved at the cost of statistics. For instance, a cut on √ŝ min reduces the number of signal events, but the resulting efficiency can be increased beyond 90% [53].
Our results are directly applicable to any studies of final states containing bbW + W − . In searches for new physics, dilepton tt would be the dominant background and our results should help in reducing it and increasing the sensitivity. In addition, there are several interesting physics scenarios where a similar combinatorial problem plagues the signal itself: • resonant di-Higgs production in the bbW + W − channel [71]; • direct CP measurement of the Higgs-top coupling [72]; • constraining new resonant physics with top spin polarization information [73]; • triple Higgs boson production [74]; • multi-boson production processes such as W ± W ∓ H or W ± W ∓ HH [75]; • studies of anomalous triple and quartic gauge coupling such as W + W − γ, W + W − Z, γγW + W − and γZW + W − [76][77][78].
Looking ahead, there are several directions in which the study presented here can be evolved.
1. Following the previous literature, in this initial investigation we considered a relatively simple situation, where there were only two possible alternatives, and we had to pick one of them. As we increase the number of indistinguishable objects in the final state, things get much more complicated. We intend to tackle this more difficult problem in the very near future.
2. As the number of jets in the signature is increased, it becomes important to cross-check the parton-level results with more detailed simulations including detector effects, initial and final state radiation, etc.
3. The kinematic variables which we used here were designed for event topologies with 2 missing particles. It would be interesting to generalize our analysis to event topologies with more than 2 missing particles, where one would have to use a different set of M 2 variables suitably adapted for that case.