Interaction networks for the identification of boosted $H\to b\overline{b}$ decays

We develop a jet identification algorithm based on an interaction network, designed to identify high-momentum Higgs bosons decaying to bottom quark-antiquark pairs and distinguish them from ordinary jets originating from the hadronization of quarks and gluons. The algorithm's inputs are features of the reconstructed charged particles in a jet and the secondary vertices associated with them. Describing the jet shower as a combination of particle-to-particle and particle-to-vertex interactions, the model is trained to learn a jet representation on which the classification problem is optimized. The algorithm is trained on simulated samples of realistic LHC collisions, released by the CMS collaboration on the CERN Open Data Portal. The interaction network achieves a drastic improvement in the identification performance with respect to state-of-the-art algorithms.


Introduction
Jets are collimated showers of particles resulting from the hadronization of quarks and gluons produced at particle colliders. Each shower, consisting of quarks and gluons emitted by the primary particle, results in an approximately cone-shaped spray of hadrons, which are then observed in particle detectors. Jet identification, or tagging, algorithms are designed to identify the nature of the primary particle that initiates a shower by studying the collective features of the hadrons inside the jet.
Traditionally, jet tagging was limited to light-flavor quarks (q), gluons (g), or b quarks. At the CERN Large Hadron Collider (LHC), jet tagging becomes a much more complex task, with new jet topologies becoming accessible (see Fig. 1). Due to the large center-of-mass energy available in LHC collisions, heavy particles, such as W, Z, or Higgs (H) bosons or top (t) quarks, may be produced with high transverse momentum (p T). These particles can decay to all-quark final states. Due to the large p T of the original particle, these quarks are produced within a small solid angle. The overlapping showers produced by these quarks may be reconstructed as a single massive jet. The identification of jets from heavy resonances relies on jet substructure techniques, designed to highlight the presence of clusters of particles, or prongs, inside the jet. An extensive review of these techniques is provided in Ref. [1]. Additional discrimination is provided by the reconstructed jet mass, usually computed after a jet grooming algorithm. A review of the techniques used to reconstruct jets and their substructure at the LHC experiments can be found in Ref. [2]. The jet mass plays a special role in physics analyses exploiting jet substructure, as described for instance in Ref. [3]. The jet mass distribution is typically used to separate jets from boosted heavy particles, characterized by a peaking distribution, from the smoothly falling background due to ordinary quark and gluon jets. For certain applications, it is desirable to avoid any distortion of the jet mass distribution when applying a jet-tagging selection.
Due to its lifetime, the presence of a b hadron inside of a jet results in a clean experimental signature: a secondary vertex (SV), displaced from the primary vertex (PV). Modern particle detectors are equipped with a vertex detector and can accurately determine SV positions and their separation from the PV, even in a dense environment like a high-p T jet. This feature is particularly important for tagging a Higgs boson decaying to a bottom quark-antiquark pair (H → bb) because all of the jet constituents originate from two displaced vertices.
Recently, several approaches based on deep learning have been proposed to optimize jet tagging algorithms (see Sec. 2), both using expert features with dense layers or raw data representations (e.g., images or lists of particle properties) with more complex architectures. For instance, the CMS and ATLAS collaborations have investigated the optimal way to combine substructure, tracking, and vertexing information to enhance the tagging efficiency for high-p T H → bb decays [4][5][6][7]. This is an important task in particle physics because measurements of high-p T H → bb decays may help resolve the loop induced and tree-level contributions to the gluon fusion process [8,9] and provide an alternative approach to study the top quark Yukawa coupling in addition to the ttH process [10,11].
In this work, we propose to identify H → bb jets with an interaction network (IN). In Ref. [12], INs were introduced to describe complex physical systems and predict their evolution after a certain amount of time. This was achieved by constructing graph networks to learn the interactions between the physical objects, represented as the nodes of the graph. Although there is no direct analogy between a set of physical objects evolving in time and the particle constituents of a jet, we showed in Ref. [13] that the IN architecture outperforms other deep neural networks (dense, convolutional, and recurrent networks) for a jet-substructure classification task. In this paper, we extend this result to the case of H → bb tagging. In particular, we investigate the use of INs to learn a collective representation of the tracking, vertexing, and substructure properties of the jet and employ this optimized representation to enhance the tagging efficiency. By placing charged particles and secondary vertices on a graph, the network can learn a representation of each particle-to-particle and particle-to-vertex interaction, and exploit this information to categorize a given jet as signal (H → bb) or background (QCD).
The study is carried out using a sample of fully-simulated LHC collision events, released by the CMS collaboration on the CERN Open Data portal [14]. Previously, many machine learning studies were limited to generator-level physics with simple detector emulation. The released CMS full-simulation samples allow for a more in-depth and realistic study of the efficacy of machine learning methods in high-energy physics experiments. We compare the performance to a state-of-the-art H → bb tagging algorithm in CMS, the deep double-b (DDB) tagger [5].
The IN tagger only relies on information related to charged particles, which (unlike neutral particles) can be tracked back to their point of origin: the PV of the high-p T collision, any SV generated in the collision, or additional PVs originated by simultaneous proton-proton collisions (pileup). This choice makes the algorithm particularly robust against the large pileup contamination expected in future LHC runs since this contamination can be removed via so-called charged hadron subtraction (CHS) [15]. On the other hand, we consider an extended representation of each charged particle, with 22 additional features with respect to the DDB tagger (as discussed in Section 3). As a result, we obtain a sizable improvement in tagging performance despite ignoring neutral particles. This paper is structured as follows: we discuss related works in Section 2. Section 3 gives a brief description of the data sets used. Sections 4 and 5 describe the IN architecture and the algorithms used to decorrelate its score from the jet mass distribution, respectively. Section 6 describes the baseline DDB algorithm. Results are presented in Section 7. Conclusions are given in Section 8.

Related work
Deep learning has recently found a great deal of success in particle physics [1,16]. Deep Neural Networks (DNNs) are artificial neural networks with multiple feed-forward hidden layers, each of which takes input features and produces a more abstract and composite representation as an output. Networks of this kind have produced advances in many fields, including computer vision and natural language processing. Driving the innovation in these fields are increasingly complex architectures that are well-suited to a particular domain, including convolutional neural networks (CNNs) [17][18][19], recurrent neural networks (RNNs) [20,21], long short-term memory units (LSTMs) [22], and gated recurrent units (GRUs) [23]. Jet tagging is one of the most popular LHC-related tasks to which deep learning solutions have been applied. Several classification algorithms have been studied in the context of jet tagging at the LHC [24][25][26][27][28][29] using CNNs, or physics-inspired DNN models [30][31][32]. Recurrent and recursive layers have been used to define jet classifiers starting from a list of reconstructed particle momenta [33][34][35]. Recently, these different approaches, applied to the specific case of top quark (t) jet identification, have been compared [36]. Unsupervised methods have also been proposed, mainly to tag top jets or jets coming from postulated new particles [37][38][39].
Graph networks have very recently been used for jet tagging, matching the performances of other deep learning approaches [40,41,13], for event classification [42,43], charged particle tracking in a silicon detector [44], pileup subtraction at the LHC [45], and particle reconstruction in irregular calorimeters [46,13,41] as well as in the IceCube experiment [43].
Particles, distributed sensor networks and power grids are examples of problems that involve multiple entities with complex interactions. Graphs provide a natural representation for encoding such relational information. Traditional machine learning methods use feature engineering to learn from graphs, which is slow and costly. Graph representation learning, including graph convolution networks [47][48][49][50] and graph generative models [51,52], leverages deep learning graph representation to learn directly from structured data. In contrast to existing deep learning methods, graph representation learning can (1) handle irregular grids with non-Euclidean geometry [53], (2) encode physics knowledge via graph construction [54], and (3) introduce relational inductive bias into data-driven learning systems [55]. Convolutional neural networks are powerful classifiers that work extremely well for images [56,57], where data are represented on a pixel grid. However, in many scientific problems the data itself is not Euclidean. Geometric deep learning algorithms, such as graph neural networks [58,59], that are invariant to the underlying grid structure, emerge as a more optimal choice for such data.

Data samples
The CMS Open Data are available from the CERN Open Data Portal [14], including releases of 2010, 2011, and 2012 CMS collision data as well as 2011, 2012, and 2016 CMS simulated data.
Samples of H → bb jets are available from simulated events containing Randall-Sundrum gravitons [60] decaying to two Higgs bosons, which subsequently decay to bb pairs. The event generation was done with MADGRAPH5_aMCATNLO 2.2.2 at leading order, with graviton masses ranging between 0.6 and 4.5 TeV. The main source of background originates from multijet events. The background data set was generated with pythia 8.205 [61] in different bins of the average p T of the final-state partons (p̂ T). The parton showering and hadronization were performed with pythia 8.205 [61], using the CMS underlying event tune CUETP8M1 [62] and the NNPDF 2.3 [63] parton distribution functions. Pileup interactions are modelled by overlaying each simulated event with additional minimum bias collisions, also generated with pythia 8.205.
The outcome of the default CMS reconstruction workflow is provided in the Open Data release [64]. In particular, particle candidates are reconstructed using the particle-flow (PF) algorithm [65]. Charged particles from pileup interactions are removed using the CHS algorithm. Jets are clustered from the remaining reconstructed particles using the anti-k T algorithm [66,67] with a jet-size parameter R = 0.8 (AK8 jets). The standard CMS jet energy corrections are applied to the jets. In order to remove soft, wide-angle radiation from the jet, the soft-drop (SD) algorithm [68,8] is applied, with angular exponent β = 0, soft cutoff threshold z cut = 0.1, and characteristic radius R 0 = 0.8 [69]. The soft-drop mass (m SD) is then computed from the four-momenta of the remaining constituents.
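As an illustration of the grooming step, the following minimal sketch evaluates the standard soft-drop condition for a single two-subjet splitting, min(p T1, p T2) / (p T1 + p T2) > z cut (∆R 12 / R 0)^β, with the parameter values quoted above as defaults. The function name and the toy momenta are hypothetical, and the sketch does not reproduce the declustering loop of the CMS reconstruction.

```python
def passes_soft_drop(pt1, pt2, delta_r, z_cut=0.1, beta=0.0, r0=0.8):
    """Check the soft-drop condition for one splitting into two subjets.

    pt1, pt2 : transverse momenta of the two subjets (same units)
    delta_r  : angular distance between the two subjets
    """
    # Momentum fraction carried by the softer subjet
    z = min(pt1, pt2) / (pt1 + pt2)
    # For beta = 0 the angular factor equals 1, so only z is compared to z_cut
    return z > z_cut * (delta_r / r0) ** beta


# Toy splittings (values are illustrative only)
balanced = passes_soft_drop(200.0, 150.0, delta_r=0.4)  # z is about 0.43
soft = passes_soft_drop(340.0, 10.0, delta_r=0.4)       # z is about 0.03
```

With β = 0 the condition does not depend on the opening angle, so a splitting is kept only if the softer branch carries more than 10% of the summed p T.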
A signal H → bb jet is defined as a jet geometrically matched to the generator-level Higgs boson and both b quark daughters. Jets from QCD multijet events are used to define a sample of fake H → bb candidates.
The data set is reduced by requiring the AK8 jets to have 300 < p T < 2400 GeV, |η| < 2.4, and 40 < m SD < 200 GeV. Charged particles are required to have p T > 0.95 GeV and reconstructed secondary vertices (SVs) are associated with the AK8 jet using ∆R = √(∆φ² + ∆η²) < 0.8. The data set is divided into blocks of features, referring to different objects. Different blocks are used as input by the models described in the rest of the paper.
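The angular matching can be sketched as follows. This is a minimal illustration, not the CMS implementation: the helper names are hypothetical, and the azimuthal difference is wrapped into (−π, π] so that directions on either side of the φ = ±π boundary are treated as close.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval (-pi, pi]."""
    dphi = phi1 - phi2
    while dphi > math.pi:
        dphi -= 2.0 * math.pi
    while dphi <= -math.pi:
        dphi += 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(delta_phi^2 + delta_eta^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def is_associated(jet_eta, jet_phi, sv_eta, sv_phi, max_delta_r=0.8):
    """True if a secondary vertex lies within max_delta_r of the jet axis."""
    return delta_r(jet_eta, jet_phi, sv_eta, sv_phi) < max_delta_r


# Two directions on opposite sides of the phi boundary are in fact nearby
near_boundary = delta_r(0.0, 3.1, 0.0, -3.1)
```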
The IN uses 30 features related to charged particles (see Tab. 2 in App. B). The IN also uses 14 SV features listed in Tab. 3. The DDB tagger [5] uses a subset of the above features (8 features for each particle and 2 features for each SV), chosen to minimize the correlation with the jet mass. In addition, the DDB tagger uses 27 high-level features (HLF) listed in Tab. 4 and first used in a previous version of the algorithm, described in Ref. [4]. For both the IN and the DDB tagger, charged particles (SVs) are sorted in descending order of the 2D impact parameter significance (2D flight distance significance) and only the first 60 (5) are considered.
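The ordering and truncation described above can be sketched as below. The dictionary keys (`ip2d_sig`, `pt`) are hypothetical placeholders rather than the feature names of the data set, and zero-padding of jets with fewer objects than the maximum is an assumption that the text does not state.

```python
def select_leading(objects, key, max_count):
    """Sort objects by descending `key` and keep the first `max_count`."""
    return sorted(objects, key=lambda obj: obj[key], reverse=True)[:max_count]


def feature_matrix(objects, feature_names, max_count):
    """Build a (features x objects) matrix, zero-padded to `max_count` columns.

    Zero-padding is an assumption made for this sketch.
    """
    rows = [[obj[name] for name in feature_names] for obj in objects[:max_count]]
    rows += [[0.0] * len(feature_names) for _ in range(max_count - len(rows))]
    # Transpose so that each column corresponds to one particle or vertex
    return [list(column) for column in zip(*rows)]


# Toy charged particles with hypothetical feature names
tracks = [
    {"pt": 5.0, "ip2d_sig": 1.2},
    {"pt": 20.0, "ip2d_sig": 7.5},
    {"pt": 9.0, "ip2d_sig": 3.1},
]
leading = select_leading(tracks, key="ip2d_sig", max_count=2)
X = feature_matrix(leading, ["pt", "ip2d_sig"], max_count=4)
```

In the setup of the paper, `max_count` would be 60 for charged particles and 5 for secondary vertices.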

The interaction network model
The IN is based on two input collections comprising $N_p$ particles, each represented by a feature vector of length $P$, and $N_v$ vertices, each represented by a feature vector of length $S$. The input consists of an ensemble of $X$ and $Y$ matrices, with sizes $P \times N_p$ and $S \times N_v$, respectively.
A particle graph $G_p$ is constructed by connecting each particle to each other particle through $N_{pp} = N_p (N_p - 1)$ directed edges. Similarly, a particle-vertex graph $G_{pv}$ is constructed by connecting each particle to each vertex through $N_{vp} = N_p N_v$ undirected edges. This is pictorially represented in Fig. 2 for the case of three particles and two vertices. As shown in the figure, the graph nodes and edges are arbitrarily enumerated. The result of the graph processing is independent of the labeling order, as described below.
For the graph $G_p$, a receiving matrix ($R_R$) and a sending matrix ($R_S$) are defined, both of size $N_p \times N_{pp}$. The element $(R_R)_{ij}$ is set to 1 when the $i$th particle receives the $j$th edge and is 0 otherwise. Similarly, the element $(R_S)_{ij}$ is set to 1 when the $i$th particle sends the $j$th edge and is 0 otherwise. For the second graph, the corresponding adjacency matrices $R_K$ (of size $N_p \times N_{vp}$) and $R_V$ (of size $N_v \times N_{vp}$) are defined. In the example of Fig. 2, with three particles and two vertices, $R_R$, $R_S$, and $R_K$ are $3 \times 6$ matrices and $R_V$ is a $2 \times 6$ matrix, and each column of each matrix contains a single nonzero element.

The data flow of our IN model is pictorially represented in Fig. 3. The input processing starts by creating the $2P \times N_{pp}$ particle-particle interaction matrix $B_{pp}$ and the $(P + S) \times N_{vp}$ particle-vertex interaction matrix $B_{vp}$, defined as:

$$B_{pp} = \begin{pmatrix} X \cdot R_R \\ X \cdot R_S \end{pmatrix}, \qquad B_{vp} = \begin{pmatrix} X \cdot R_K \\ Y \cdot R_V \end{pmatrix},$$

where $\cdot$ indicates the ordinary matrix product. Each column of $B_{pp}$ consists of the $2P$ features of the sending and receiving nodes of each particle-particle interaction, while each column of $B_{vp}$ consists of the $P + S$ features of each particle-vertex one.
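The four adjacency matrices can be built programmatically. The sketch below uses one possible edge enumeration, which is an assumption and need not coincide with the labeling of Fig. 2; as noted in the text, the final result does not depend on this choice. The function names are hypothetical.

```python
def build_particle_graph(n_p):
    """Receiving and sending matrices (n_p x N_pp) of the fully connected particle graph."""
    # One directed edge (receiver, sender) for every ordered pair of distinct particles
    edges = [(r, s) for r in range(n_p) for s in range(n_p) if r != s]
    R_R = [[1 if r == i else 0 for r, _ in edges] for i in range(n_p)]
    R_S = [[1 if s == i else 0 for _, s in edges] for i in range(n_p)]
    return R_R, R_S


def build_particle_vertex_graph(n_p, n_v):
    """Adjacency matrices R_K (n_p x N_vp) and R_V (n_v x N_vp)."""
    # One edge for every (particle, vertex) pair
    edges = [(p, v) for p in range(n_p) for v in range(n_v)]
    R_K = [[1 if p == i else 0 for p, _ in edges] for i in range(n_p)]
    R_V = [[1 if v == k else 0 for _, v in edges] for k in range(n_v)]
    return R_K, R_V


# Three particles and two vertices, as in the example discussed in the text
R_R, R_S = build_particle_graph(3)
R_K, R_V = build_particle_vertex_graph(3, 2)
```

Each column of every matrix contains exactly one nonzero element, since each edge has a single receiver, a single sender, a single particle, and a single vertex.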
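Processing each column of $B_{pp}$ by the function $f_R^{pp}$, one builds an internal representation of the particle-particle interaction with a function $f_R^{pp}: \mathbb{R}^{2P} \to \mathbb{R}^{D_E}$, where $D_E$ is the size of the internal representation. This results in an effect matrix $E_{pp}$ with dimensions $D_E \times N_{pp}$. Similarly, processing each column of $B_{vp}$ with a function $f_R^{vp}: \mathbb{R}^{P+S} \to \mathbb{R}^{D_E}$ yields the effect matrix $E_{vp}$ with dimensions $D_E \times N_{vp}$. We then propagate the particle-particle interactions back to the particles receiving them, by building

$$\overline{E}_{pp} = E_{pp} R_R^\top,$$

with dimensions $D_E \times N_p$. Similarly, we build

$$\overline{E}_{vp} = E_{vp} R_K^\top,$$

with dimensions $D_E \times N_p$, which collects the information of the particle-vertex interactions for each particle and across all of the vertices.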
The next step consists of building the $C$ matrix, with dimensions $(P + 2D_E) \times N_p$, by combining the input information for each particle ($X$) with the learned representation of the particle-particle ($\overline{E}_{pp}$) and particle-vertex ($\overline{E}_{vp}$) interactions, each aggregated over the edges attached to a particle:

$$C = \begin{pmatrix} X \\ \overline{E}_{pp} \\ \overline{E}_{vp} \end{pmatrix}.$$

The final aggregator combines the input and interaction information to build the postinteraction representation of the graph, summarized by the matrix $O$, with dimensions $D_O \times N_p$, obtained by applying a function $f_O: \mathbb{R}^{P + 2D_E} \to \mathbb{R}^{D_O}$. As is done for $f_R^{pp}$ and $f_R^{vp}$, $f_O$ is applied to each column of $C$. We stress the fact that the by-column processing applied by the $f_R^{pp}$, $f_R^{vp}$, and $f_O$ functions and the sum across interactions performed in defining the $\overline{E}_{pp}$ and $\overline{E}_{vp}$ matrices are essential ingredients to make the outcome of the IN tagger independent of the order used to label the $N_p$ input particles and $N_v$ input vertices. In other words, while the representations of the $R_R$, $R_S$, $R_K$, and $R_V$ matrices depend on the adopted labeling convention, the final representation of each particle does not.
The learned representation of the post-interaction graph, represented by the elements of the $O$ matrix, can be used to solve the specific task at hand. Depending on the task, the final function that computes the classifier output may be chosen to preserve the permutation invariance of the input particles and vertices. In this case, we first sum along each row (corresponding to a sum over particles) of $O$ to produce a feature vector $\overline{O}$ with length $D_O$ for the jet as a whole. This is passed to a function $\phi_C: \mathbb{R}^{D_O} \to \mathbb{R}^N$, which produces the output of the classifier.
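The full data flow can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the trained model: the functions `f_pp`, `f_vp`, and `f_o` below are fixed hand-written maps standing in for the learned networks, a softmax stands in for the classifier, the dimensions are tiny, and at least two particles and one vertex are assumed. Its purpose is to show that the output does not change when particles and vertices are relabeled.

```python
import math


def matmul(A, B):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def apply_columns(f, M):
    """Apply f to every column of M and stack the outputs as columns."""
    return transpose([f(list(col)) for col in zip(*M)])


def softmax(v):
    exps = [math.exp(x - max(v)) for x in v]
    return [x / sum(exps) for x in exps]


def interaction_network(X, Y, f_pp, f_vp, f_o, phi_c):
    n_p, n_v = len(X[0]), len(Y[0])
    pp_edges = [(r, s) for r in range(n_p) for s in range(n_p) if r != s]
    vp_edges = [(p, v) for p in range(n_p) for v in range(n_v)]
    R_R = [[float(r == i) for r, _ in pp_edges] for i in range(n_p)]
    R_S = [[float(s == i) for _, s in pp_edges] for i in range(n_p)]
    R_K = [[float(p == i) for p, _ in vp_edges] for i in range(n_p)]
    R_V = [[float(v == k) for _, v in vp_edges] for k in range(n_v)]
    # Interaction matrices: list concatenation stacks the blocks vertically
    B_pp = matmul(X, R_R) + matmul(X, R_S)        # 2P x N_pp
    B_vp = matmul(X, R_K) + matmul(Y, R_V)        # (P + S) x N_vp
    E_pp = apply_columns(f_pp, B_pp)              # D_E x N_pp
    E_vp = apply_columns(f_vp, B_vp)              # D_E x N_vp
    # Sum the effects over the edges attached to each particle
    E_pp_bar = matmul(E_pp, transpose(R_R))       # D_E x N_p
    E_vp_bar = matmul(E_vp, transpose(R_K))       # D_E x N_p
    C = X + E_pp_bar + E_vp_bar                   # (P + 2 D_E) x N_p
    O = apply_columns(f_o, C)                     # D_O x N_p
    O_bar = [sum(row) for row in O]               # sum over particles
    return phi_c(O_bar)


# Toy stand-ins for the learned functions (D_E = D_O = 2)
f_pp = lambda c: [math.tanh(sum(c)), math.tanh(c[0] * c[-1])]
f_vp = lambda c: [math.tanh(sum(c)), math.tanh(c[0] - c[-1])]
f_o = lambda c: [math.tanh(sum(c)), math.tanh(c[0] * c[1])]

X = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]]   # P = 2 features, N_p = 3 particles
Y = [[0.2, 0.9], [-0.4, 1.1]]              # S = 2 features, N_v = 2 vertices
scores = interaction_network(X, Y, f_pp, f_vp, f_o, softmax)

# Relabel the particles and vertices: the classifier output should not change
X_perm = [[row[i] for i in (2, 0, 1)] for row in X]
Y_perm = [[row[i] for i in (1, 0)] for row in Y]
scores_perm = interaction_network(X_perm, Y_perm, f_pp, f_vp, f_o, softmax)
```

The two outputs agree up to floating-point rounding, because every step either acts column by column or sums over edges or particles.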
The training of the IN is performed with the CMS open data simulation with 2016 conditions. The input data set consists of 3.9 million H → bb jets and 1.9 million inclusive QCD jets, split into training, validation, and test samples with proportions of 80%, 10%, and 10%, respectively.
We use PyTorch [70] to implement and train the classifier on one GTX 1080 GPU. The model is implemented with each of $f_R^{pp}$ and $f_R^{vp}$ expressed as a sequence of 3 dense layers of sizes (60, 30, 20) with a ReLU activation function after each layer. The function $f_O$ is a dense sequence of sizes (60, 30, 24) in a similar fashion. We use up to $N_p = 60$ charged particles and $N_v = 5$ secondary vertices as inputs to the IN tagger. We train the model using the Adam optimizer [71] with an initial learning rate of $10^{-4}$ and a minibatch size of 128 for up to 100 epochs, enforcing early stopping [72] on the validation loss with a patience of 5 epochs.

Figure 3: Illustration of the IN classifier. The particle feature matrix $X$ is multiplied by the receiving and sending matrices $R_R$ and $R_S$ to build the particle-particle interaction feature matrix $B_{pp}$. Similarly, the particle feature matrix $X$ and the vertex feature matrix $Y$ are multiplied by the adjacency matrices $R_K$ and $R_V$, respectively, to build the particle-vertex interaction feature matrix $B_{vp}$. These pairs are then processed by the interaction functions $f_R^{pp}$ and $f_R^{vp}$, and the post-interaction function $f_O$, which are expressed as neural networks and learned in the training process. This procedure creates a learned representation of each particle's post-interaction features, given by $N_p$ vectors of size $D_O$. The $N_p$ vectors are summed, giving $D_O$ features for the entire jet, which is given as input to a classifier $\phi_C$, also represented by a neural network. More details on the various steps are given in the text.
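The sizes quoted above fix the dimensions of the graph and of the three networks. The following arithmetic sketch makes them explicit under two assumptions that are not stated in the text: that the last entry of each size tuple is the output dimension (so that D E = 20 and D O = 24), and that every dense layer has a bias term. The classifier φ C is not included because its layer sizes are not given here.

```python
def dense_parameters(n_inputs, layer_sizes):
    """Number of weights and biases in a stack of fully connected layers."""
    total, previous = 0, n_inputs
    for size in layer_sizes:
        total += previous * size + size  # weight matrix plus bias vector
        previous = size
    return total


P, S = 30, 14          # particle and vertex feature counts
N_p, N_v = 60, 5       # maximum numbers of particles and vertices
D_E, D_O = 20, 24      # assumed sizes of the learned representations

n_pp_edges = N_p * (N_p - 1)   # particle-particle edges
n_vp_edges = N_p * N_v         # particle-vertex edges

params_f_pp = dense_parameters(2 * P, (60, 30, D_E))
params_f_vp = dense_parameters(P + S, (60, 30, D_E))
params_f_o = dense_parameters(P + 2 * D_E, (60, 30, D_O))
```

Under these assumptions each jet is described by 3540 particle-particle edges and 300 particle-vertex edges, while the three networks together hold fewer than 20 000 trainable parameters, since the same small network is reused for every edge.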
As a baseline, we minimize the categorical cross-entropy loss for the classification task, $L_C$, and we let the network exploit all the discriminating information in the data set.

Decorrelation with the jet mass
Many possible applications of a jet tagging algorithm would require the final score to be uncorrelated from the jet mass, so that a selection based on the tagger score does not change the jet mass distribution. This is particularly relevant for the background distribution, but is required to some extent also for the signal one. Several techniques exist to deliver a tagger with minimal effects on the jet mass distribution. For taggers based on high-level features, one could remove those features more correlated to the jet mass or divide those correlated features by the jet mass. For taggers based on a more raw representation of the jet (as in our case), one could perform an adversarial training [73][74][75]. One could also reweight or remove background events such that the background m SD distribution is indistinguishable from the signal m SD distribution. Finally, one could also define a mass-dependent threshold based on simulation as in the "designing decorrelated taggers" (DDT) procedure proposed in Ref. [76]. We present results for the latter three approaches.

Adversarial training
A secondary adversary network is constructed that consists of three hidden layers each with 64 nodes. The adversary is trained simultaneously with the classifier (interaction network) using the summed post-interaction feature vector $\overline{O}$ as its input. From this input, the adversary is trained to predict a one-hot encoding of the pivot feature m SD, which we aim to decorrelate from the classifier output. The chosen one-hot encoding corresponds to 40 m SD bins from 40 to 200 GeV. The training begins by initializing the weights from the best classifier training. The adversary is then pre-trained for 10 epochs using the Adam algorithm with an initial learning rate of $10^{-4}$. During each epoch, the classifier is first trained by minimizing the total loss

$$L = L_C - \lambda L_{\mathrm{adversary}}.$$

Subsequently, the adversary is trained by minimizing $L_{\mathrm{adversary}}$ using only the background QCD samples. To balance tagging performance and m SD correlation, λ = 10 was chosen.
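The adversary's target can be illustrated with a short sketch of the one-hot encoding: 40 bins between 40 and 200 GeV correspond to a bin width of 4 GeV. The function name is hypothetical, and clamping values on or outside the upper edge into the last bin is an assumption of this sketch.

```python
def msd_one_hot(m_sd, low=40.0, high=200.0, n_bins=40):
    """One-hot encoding of the soft-drop mass (in GeV) in uniform bins."""
    width = (high - low) / n_bins
    index = int((m_sd - low) // width)
    # Assumption: values on or beyond the edges are clamped into the outer bins
    index = min(max(index, 0), n_bins - 1)
    one_hot = [0] * n_bins
    one_hot[index] = 1
    return one_hot


# A jet with m_SD = 125 GeV falls in the bin covering 124 to 128 GeV
target = msd_one_hot(125.0)
target_bin = target.index(1)
```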

Sample reweighting
While adversarial training requires a complicated tuning process, sample reweighting is a simpler way to achieve the same goal. Individual QCD events are weighted in the loss function based on their mass bin so as to match the signal jet mass distribution of the training sample. Given a background event in a certain mass bin, with the number of background and signal events in that bin denoted as N bin bkg and N bin sig, respectively, the event is weighted by w bin = N bin sig / N bin bkg.
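A minimal sketch of the weight computation is given below. The binning (40 uniform bins between 40 and 200 GeV) is an assumption borrowed from the adversarial setup, since the text does not specify the bins used for reweighting, and the function names are hypothetical.

```python
from collections import Counter


def mass_bin(m_sd, low=40.0, high=200.0, n_bins=40):
    """Index of the uniform m_SD bin, clamped to the valid range."""
    width = (high - low) / n_bins
    return min(max(int((m_sd - low) // width), 0), n_bins - 1)


def background_weights(signal_masses, background_masses):
    """Per-event weights w_bin = N_sig / N_bkg for the background sample."""
    n_sig = Counter(mass_bin(m) for m in signal_masses)
    n_bkg = Counter(mass_bin(m) for m in background_masses)
    return [n_sig[mass_bin(m)] / n_bkg[mass_bin(m)] for m in background_masses]


# Toy masses in GeV: the signal peaks differently from the background
signal = [50.0, 50.0, 125.0]
background = [50.0, 125.0, 125.0, 125.0, 125.0]
weights = background_weights(signal, background)
```

After weighting, the background yield in each bin equals the signal yield in that bin (2 events near 50 GeV and 1 event near 125 GeV in this toy example), so the two mass distributions coincide.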

Designing decorrelated taggers
Following the DDT procedure [76], the tagger threshold for a given false positive rate (FPR), or "working point", is determined as a function of m SD. By creating an m SD-dependent tagger threshold, the background jet m SD distribution for events passing and failing this threshold can be made identical. In practice, this is done by considering the distribution of the network score vs. the jet p T and m SD for the training dataset. A quantile regression was used to find the threshold on the network score as a function of p T and m SD that would correspond to a fixed quantile (the chosen 1 − FPR value). By construction, this procedure results in near-perfect mass decorrelation.
In our case, a gradient boosted regressor [77,78] with the following parameters was used:
• α-quantile = 1 − FPR,
• number of estimators of 250,
• minimum number of samples at a leaf node of 3,
• minimum number of samples to split an internal node of 3,
• maximum depth of 5,
• validation set of 20%,
• early stopping with tolerance = 5.
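The idea behind the mass-dependent threshold can be illustrated with a deliberately simplified stand-in: instead of the gradient boosted quantile regression in (p T, m SD) described above, the sketch below takes the empirical (1 − FPR) quantile of the background score in coarse m SD bins only. The function names, the bin count, and the toy data are all hypothetical.

```python
import random


def mass_bin(m_sd, low, high, n_bins):
    width = (high - low) / n_bins
    return min(max(int((m_sd - low) // width), 0), n_bins - 1)


def ddt_thresholds(scores, masses, fpr, low=40.0, high=200.0, n_bins=8):
    """Per-bin score threshold at the (1 - FPR) quantile of the background."""
    per_bin = [[] for _ in range(n_bins)]
    for score, m_sd in zip(scores, masses):
        per_bin[mass_bin(m_sd, low, high, n_bins)].append(score)
    thresholds = []
    for values in per_bin:
        ordered = sorted(values)
        index = min(int((1.0 - fpr) * len(ordered)), len(ordered) - 1)
        # Empty bins get an infinite threshold (an arbitrary choice for this sketch)
        thresholds.append(ordered[index] if ordered else float("inf"))
    return thresholds


# Toy background whose score is strongly correlated with the mass
rng = random.Random(0)
masses = [rng.uniform(40.0, 200.0) for _ in range(8000)]
scores = [m / 400.0 + rng.gauss(0.0, 0.1) for m in masses]
thresholds = ddt_thresholds(scores, masses, fpr=0.05)

# Fraction of background jets passing the mass-dependent threshold in each bin
pass_fraction = []
for b in range(8):
    in_bin = [s for s, m in zip(scores, masses) if mass_bin(m, 40.0, 200.0, 8) == b]
    pass_fraction.append(sum(s > thresholds[b] for s in in_bin) / len(in_bin))
```

Although the toy score rises with the mass, the fraction of background jets passing the selection is close to the chosen FPR in every bin, which is what keeps the background mass shape unchanged.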

Deep double-b tagger model
The DDB tagger is the deep neural network algorithm currently in use by the CMS collaboration to identify H → bb jets. Since this tagger is trained on a dataset similar to the one considered for this study, we adopt it as a proxy for a typical state-of-the-art algorithm. The DDB model is based on the 27 HLFs used in Ref. [4], as well as 8 particle-specific features of up to 60 charged particles, and 2 properties of up to 5 SVs associated with the jet (see App. B). Each block of inputs is treated as a one-dimensional list, with batch normalization [79] applied directly to the input layers. For each collection of charged particles and SVs, separate 1D convolutional layers [80], with a kernel size of 1, are trained: 2 hidden layers with 32 filters each and ReLU [81] activation. The filters act on each particle or vertex individually. The compressed and transformed outputs are then separately fed into two gated recurrent units (GRUs) with 50 output nodes each and ReLU activation. The outputs of the GRUs are concatenated with the HLFs and then processed by a dense layer with 100 nodes and ReLU activation, and another final dense layer with 2 output nodes with softmax activation. Dropout [82] (with a rate of 10%) is used in each layer to prevent overfitting.
The data set used for training consists of CMS simulation of H → bb signal and background jets corresponding to 2017 data-taking conditions. The training was performed with Keras [83] over 100 epochs with a batch size of 4096 using the Adam optimizer [71]. In order to decorrelate the tagger output from the jet mass, the network was trained for an additional 20 epochs with a custom penalty term in the loss function that penalizes mass sculpting. Namely, the Kullback-Leibler (KL) divergence is computed between two m SD distributions: one weighted by the network's output probability for signal and the other weighted by the network's output probability for background. The KL divergence is computed for the distributions for true signal events P sig (m SD) and true background events P B (m SD), separately. The total loss function is then the classification loss plus the KL penalty terms scaled by a coefficient λ, where λ = 2 was chosen.

Figure 5: An illustration of the "sculpting" of the background jet mass distribution (left) and the signal jet mass distribution (right) after applying a threshold on the tagger score corresponding to a 1% false positive rate for several different algorithms. The unmodified interaction network is highly correlated with the jet mass, but after applying the methods described in the text, the correlation is reduced for the background while the peak of the signal distribution is still retained.
The DDB model was trained using CMS simulated events with 2017 running conditions. We verified that a similar tagging performance is obtained when the training procedure is repeated on the 2016 data set considered for this study.

Results
As shown in Fig. 4, the IN provides an improved performance with respect to the DDB tagger. At 1% FPR, the IN tagger outperforms the DDB tagger by 40% in true positive rate (TPR). Likewise, at 80% TPR, the IN tagger yields a factor of 4 smaller false positive rate (FPR) than the DDB tagger. Fig. 5 shows an illustration of how the signal and background jet mass distributions change after applying a threshold on the tagger score for different decorrelation procedures. Following Ref. [84], we quantify the impacts of these algorithms on the mass decorrelation by computing the Jensen-Shannon (JS) divergence:

$$D_{\mathrm{JS}}(P \,\|\, Q) = \frac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \frac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M),$$

where $D_{\mathrm{KL}}$ is the KL divergence and $M = \frac{1}{2}(P + Q)$ is the average of the normalized m SD distributions of the background jets passing ($P$) and failing ($Q$) a given tagger score threshold. As shown in Fig. 6, the DDT procedure provides the best decorrelation of the IN tagger followed by the reweighted training and the adversarial training, respectively.
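For binned mass distributions, the JS divergence can be computed as in the sketch below. The natural logarithm is used here, which is an assumption since the text does not specify the base, and the toy histograms are illustrative only.

```python
import math


def normalize(histogram):
    total = sum(histogram)
    return [x / total for x in histogram]


def kl_divergence(p, q):
    """KL divergence between two normalized histograms (natural logarithm)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(passing, failing):
    """JS divergence between the mass histograms of passing and failing jets."""
    p, q = normalize(passing), normalize(failing)
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Identical shapes give zero divergence, whatever the overall normalization
no_sculpting = js_divergence([10, 20, 30], [100, 200, 300])
# Fully separated shapes give the maximum value, log(2)
full_sculpting = js_divergence([1, 0], [0, 1])
```

A value of zero corresponds to identical mass shapes for passing and failing background jets, that is, to no sculpting, while the value is bounded from above by log 2.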
After applying the mass decorrelation techniques, the performance worsens slightly but still significantly outperforms the DDB tagger, as shown in Fig. 4. At 1% FPR, the DDT-decorrelated IN tagger has a TPR of 76% compared to the decorrelated DDB tagger with a 48% TPR, corresponding to an improvement of 55%. Table 1 summarizes different performance metrics for the four considered models.
In addition, we show the performance of the proposed algorithm as a function of the number of primary vertices in the event (see Fig. 7), which scales linearly with the number of pileup collisions. Using only charged particles and secondary vertices as input, the IN is robust against an increasing number of pileup interactions.

Conclusions
We presented a novel jet-tagging technique using a graph representation of the jet's constituents and secondary vertices based on an interaction network to identify H → bb jets in LHC collisions. This model can operate on a variable number of jet constituents and secondary vertices and does not depend on the ordering schemes of these objects. The interaction network was trained on a simulated dataset released by the CMS collaboration on the CERN Open Data Portal. A significant performance improvement is observed with respect to the corresponding Deep Neural Network currently used in the CMS collaboration (the DDB tagger). By design, our interaction network offers a more flexible representation of jet data and robustness against the noise generated by pileup collisions. The algorithm implementation and its training code are available at Ref. [85].
Together with the best trained model, we presented additional models, obtained by applying different techniques to decorrelate the network score from the jet mass distribution. This was done to minimize the selection bias of the classifier output towards any values of the jet mass, which would make this algorithm suitable for physics analyses relying on the jet mass as a discriminating variable. As expected, the three decorrelation procedures result in a reduction of the H → bb identification performance. Nevertheless, the three decorrelated models outperform the best reference DDB model.
Once applied to a full data analysis, our tagging algorithm could contribute a substantial improvement to the experimental precision. Our results motivate further exploration of applications based on interaction networks (and graph neural networks in general) for object tagging and other similar tasks in experimental high energy physics.

A Model implemented in TensorFlow
In order to integrate the interaction network algorithm into experimental workflows, it is often necessary to provide the algorithm [...] was converted to [...]

Variable | Description
[...] | p T of the charged particle divided by the p T of the AK8 jet
track_erel | Energy of the charged particle divided by the energy of the AK8 jet
track_phirel | ∆φ between the charged particle and the AK8 jet axis
track_etarel | ∆η between the charged particle and the AK8 jet axis
track_deltaR | ∆R between the charged particle and the AK8 jet axis
track_drminsv | ∆R between the associated SVs and the charged particle
track_drsubjet1 | ∆R between the charged particle and the first soft drop subjet
track_drsubjet2 | ∆R between the charged particle and the second soft drop subjet
track_dz | Longitudinal impact parameter of the track, defined as the distance of closest approach of the track trajectory to the PV projected on to the z direction
track_dzsig | Longitudinal [...]

Variable | Description
sv_ptrel | p T of the SV divided by the p T of the AK8 jet
sv_erel | Energy of the SV divided by the energy of the AK8 jet
sv_phirel | ∆φ between the SV and the AK8 jet axis
sv_etarel | ∆η between the SV and the AK8 jet axis
sv_deltaR | ∆R between the SV and the AK8 jet axis
sv_pt | p T of the SV
sv_mass | Mass of the SV
sv_ntracks | Number of tracks associated with the SV
sv_normchi2 | Normalized χ² of the SV fit
sv_costhetasvpv | cos θ between the SV and the PV
sv_dxy | Transverse (2D) flight distance of the SV
sv_dxysig | Transverse (2D) flight distance significance of the SV
sv_d3d | 3D flight distance of the SV
sv_d3dsig | 3D flight distance significance of the SV

Table 3: Secondary vertex features. The interaction network uses all of the features, while the DDB algorithm uses the subset of features indicated in bold.

Variable | Description
fj_jetNTracks | Number of tracks associated with the AK8 jet
fj_nSV | Number of SVs associated with the AK8 jet (∆R < 0.7)
fj_tau0_trackEtaRel_0 | Smallest track ∆η relative to the jet axis, associated to the first N-subjettiness axis
fj_tau0_trackEtaRel_1 | Second smallest track ∆η relative to the jet axis, associated to the first N-subjettiness axis
fj_tau0_trackEtaRel_2 | Third smallest track ∆η relative to the jet axis, associated to the first N-subjettiness axis
fj_tau1_trackEtaRel_0 | Smallest track ∆η relative to the jet axis, associated to the second N-subjettiness axis
fj_tau1_trackEtaRel_1 | Second smallest track ∆η relative to the jet axis, associated to the second N-subjettiness axis
fj_tau1_trackEtaRel_2 | Third smallest track ∆η relative to the jet axis, associated to the second N-subjettiness axis
fj_tau_flightDistance2dSig_0 | Transverse (2D) flight distance significance between the PV and the SV with the smallest uncertainty on the 3D flight distance associated to the first N-subjettiness axis
fj_tau_flightDistance2dSig_1 | Transverse (2D) flight distance significance between the PV and the SV with the smallest uncertainty on the 3D flight distance associated to the second N-subjettiness axis
fj_tau_vertexDeltaR_0 | ∆R between the first N-subjettiness axis and SV direction
fj_tau_vertexEnergyRatio_0 | SV energy ratio for the first N-subjettiness axis, defined as the total energy of all SVs associated with the first N-subjettiness axis divided by the total energy of all the tracks associated with the AK8 jet that are consistent with the PV
fj_tau_vertexEnergyRatio_1 | SV energy ratio for the second N-subjettiness axis
fj_tau_vertexMass_0 | SV mass for the first N-subjettiness axis, defined as the invariant mass of all tracks from SVs associated with the first N-subjettiness axis
fj_tau_vertexMass_1 | SV mass for the second N-subjettiness axis
fj_trackSip2dSigAboveBottom_0 | Track 2D signed impact parameter significance of the first track lifting the combined invariant mass of the tracks above the b hadron threshold mass (5.2 GeV)
fj_trackSip2dSigAboveBottom_1 | Track 2D signed impact parameter significance of the second track lifting the combined invariant mass of the tracks above the b hadron threshold mass (5.2 GeV)
fj_trackSip2dSigAboveCharm_0 | Track 2D signed impact parameter significance of the first track lifting the combined invariant mass of the tracks above the c hadron threshold mass (1.5 GeV)
fj_trackSipdSig_0 | Largest track 3D signed impact parameter significance
fj_trackSipdSig_1 | Second largest track 3D signed impact parameter significance
fj_trackSipdSig_2 | Third largest track 3D signed impact parameter significance
fj_trackSipdSig_3 | Fourth largest track 3D signed impact parameter significance
fj_trackSipdSig_0_0 | Largest track 3D signed impact parameter significance associated to the first N-subjettiness axis
fj_trackSipdSig_0_1 | Second largest track 3D signed impact parameter significance associated to the first N-subjettiness axis
fj_trackSipdSig_1_0 | Largest track 3D signed impact parameter significance associated to the second N-subjettiness axis
fj_trackSipdSig_1_1 | Second largest track 3D signed impact parameter significance associated to the second N-subjettiness axis
fj_z_ratio | z ratio variable as defined in Ref. [4]