Quark jet rates and quark/gluon discrimination in multi-jet final states

We estimate the number of quark jets in QCD multi-jet final states at hadron colliders. To this end, we extend the calculation of jet rates to quark jet rates. Using this calculation, we estimate semi-analytically how much quark/gluon discrimination improves the signal-to-background ratio for a signal that predicts many quark jets. We introduce a variable related to jet flavors in multi-jet final states and propose a data-driven method that uses this variable to reduce the systematic uncertainties of analysis results. As in the semi-analytical estimate, the improvement of the signal-to-background ratio obtained with this variable is also estimated in a Monte-Carlo analysis.


Introduction
So far, no clear sign of physics beyond the standard model has been found at the LHC. We should maximize the discovery potential for new physics at the LHC by using the information in final states more precisely. In a conventional analysis, we categorize events by inclusive variables such as the number of jets and the scalar sum of the transverse momenta of jets, and then define signal regions using exclusive variables such as the transverse momenta of objects and the distances between objects. Jet substructure gives access to more specific features of events, and the related studies have developed dramatically in the last ten years [1][2][3][4][5][6].
One of the differences between boosted jet tagging and quark/gluon discrimination is the size of the jet radius used in the analysis. The multi-prong structure of boosted jets is formed by decays that stem from the electroweak interaction, and a large jet radius is generally required to capture most of the decay products. QCD jets have a 1-prong structure: a core parton carries most of the jet energy, and the core is dressed with soft gluons radiated from it. The main difference between quark jets and gluon jets stems from the different color factors for gluon radiation. Because of this difference, gluon jets emit more partons and radiate more widely. Neglecting the logarithmic running of the strong coupling and the masses of the active quarks, QCD radiation is scale invariant. That is, if one zooms in on a QCD jet, one finds a repeated self-similar pattern of jets within jets within jets, reminiscent of fractals. The difference between quark and gluon jets therefore persists even in the neighbourhood of the jet core, so quark/gluon discrimination works even if the jet radius is small.
Because of these properties, quark/gluon discrimination is most useful in multi-jet final states. If a signal has n more quark jets than the background, we naively expect that quark/gluon tagging increases the signal-to-background ratio by a factor $(\epsilon_q/\epsilon_g)^n$, where $\epsilon_q$ and $\epsilon_g$ are the quark and gluon jet efficiencies, and $\epsilon_q/\epsilon_g > 1$ under the assumption that the quarks come from the signal.
Many new physics models predict multi-jet final states. For example, heavy colored resonances, like gluinos or squarks in SUSY, can emit many partons via their cascade decays, and there are several studies of these at the LHC [73][74][75][76][77][78][79]. Another example is the search for low-scale gravity, which addresses the hierarchy problem associated with the gap between the electroweak and Planck scales. These models can predict microscopic black holes or highly excited string states at the TeV scale. Such objects produce a large number of energetic particles, which are mostly quarks and gluons, and are constrained experimentally [81][82][83][84][85]. Moreover, the multi-jet final state is a good probe of higher dimensional operators that are induced by new color interactions at a high energy scale. There are two types of dimension-6 pure QCD operators, $g_s f^{abc} G_\mu^{a\nu} G_\nu^{b\lambda} G_\lambda^{c\mu}$ and $g_s^2 (\bar q \gamma_\mu T^a q)(\bar q \gamma^\mu T^a q)$, and the triple gluon field strength operator in particular is strongly enhanced at high energy and in large jet multiplicity regions [86,87]. This operator also predicts a specific quark/gluon jet fraction: the $G^3$ operator produces leading and sub-leading gluon jets, whereas the leading and sub-leading jets tend to be valence quark jets in the standard model backgrounds.
We extend the calculation of jet rates to quark jet rates, and estimate the quark/gluon jet fractions in QCD multi-jet final states. We also consider a data-driven analysis for new physics searches in multi-jet final states. In the analysis, we introduce a variable d defined for events having more than n jets. It is built from quantities $Q_i$ (> 0) assigned to the i-th jet, which become larger when the jet substructure looks more like that of a quark jet. Thus d gets larger if an event has more quark jets. In a conventional analysis of multi-jet final states, we fit the distribution of an inclusive variable, such as the scalar sum of the transverse momenta $H_T$, in a control region, and predict the number of background events in a signal region using the fit function. We additionally fit the fraction of events remaining after a d-cut in each $H_T$ bin, as will be shown in Fig. 7, and show a large improvement over the conventional analysis from this new information.
This paper is organized as follows. In Section 2, we calculate quark jet rates at hadron colliders based on the generating functional method. In Section 3, we estimate how many quark jets are contained in the QCD multi-jet background using the formulae derived in Section 2. The improvement of the signal-to-background ratio obtained with quark/gluon discrimination in multi-jet final states is also estimated semi-analytically. In Section 4, the improvement obtained with the variable d is estimated in a Monte-Carlo analysis. In Section 5, we summarize our results and conclude.

Quark jet rates in multi-jet final states
We first estimate how many quark jets are contained in the QCD multi-jet background at hadron colliders. This estimate is useful for assessing the impact of quark/gluon discrimination on new physics searches, and helps to understand the results of the analysis. With unlimited computing resources, we could add any number of additional partons to parton showers using matching schemes [88][89][90][91][92] in the simulation of multi-jet final states. In this calculation, we use the generating functional method [93][94][95][96] based on the DGLAP equations, and all leading logarithmic (LL) terms and a part of the next-to-leading logarithmic (NLL) terms are taken into account. Matrix element corrections for additional partons are absent from the calculation, and their effects are examined in Appendix A.2.

Generating functionals
We introduce a generating functional for a final state parton i, which is usually defined in terms of the jet rates $R^{(i,\mathrm{out})}_n$, where p and t are the transverse momentum and the energy scale of the parton i. The jet rate $R^{(i,\mathrm{out})}_n$ is the probability that the parton i forms an n-jet configuration through final state radiation [97][98][99][100][101]. We extend the jet rates to quark jet rates $R^{(i,\mathrm{out})}_{n,m}$ and redefine the functional as
$$\Phi_i(u, v) = \sum_{n=1}^{\infty} \sum_{m=0}^{n} u^n v^m R^{(i,\mathrm{out})}_{n,m},$$
where $R^{(i,\mathrm{out})}_{n,m}$ is the probability that the final state i forms an n-jet configuration in which m quark jets are contained (footnote i). The number of jets n starts from 1, since the final state itself becomes a jet even if it does not emit any resolved emission. We can recover the quark jet rates by differentiating the functional at u = 0 and v = 0 as
$$R^{(i,\mathrm{out})}_{n,m} = \frac{1}{n!\,m!} \left. \frac{\partial^n}{\partial u^n} \frac{\partial^m}{\partial v^m} \Phi_i \right|_{u=0,\,v=0}.$$
Similarly, we introduce a generating functional $\Psi_i$ for an initial state parton i, where the jet rate $R^{(i,\mathrm{in})}_n$ is the probability that the initial parton i emits n resolved emissions, and the quark jet rate $R^{(i,\mathrm{in})}_{n,m}$ is the probability that i emits n jets in which m quark jets are contained. The argument x is the energy fraction of i, so the parton carries the energy $x p_{\mathrm{beam}}$, where $p_{\mathrm{beam}}$ is the beam energy. The number of jets n starts from 0, since the initial state does not generate any jet if it does not emit any resolved emission.

Footnote i: You may prefer a definition of the functional such that $\tilde\Phi(u_g, u_q) = \sum_{n_g} \sum_{n_q} u_g^{n_g} u_q^{n_q} \tilde R_{n_g,n_q}$, where $n_q$ and $n_g$ are the numbers of quark and gluon jets, and $\tilde R_{n_g,n_q}$ is the probability that an event has $n_q$ quark jets and $n_g$ gluon jets. The two functionals are simply related by $\Phi(u, v) = \tilde\Phi(u, uv)|_{n_q = n - n_g,\ n_g = m}$.

Figure 1: Schematic illustration of initial and final state radiations. The central blob shows a hard process, $p$ is the transverse momentum of a final state, $z$ is the energy fraction of the final state, and $x$ and $x'$ are the momentum fractions of an initial state and its parent parton.

A generating functional for a hard process is given by the product of the functionals for its initial and final states. For example, a generating functional with initial states $i_1$, $i_2$ and final states $f_1$, $f_2$ is given as $\Phi = \Psi_{i_1} \Psi_{i_2} \Phi_{f_1} \Phi_{f_2}$, and we can derive the quark jet rates for the hard process by the differentiation in Eq. (4).
For brevity, we omit the arguments u and v in the generating functionals below.

Evolution equations
We derive evolution equations for the generating functionals of final and initial states, starting with the final state. When no resolved emission happens, namely for n = 1, the jet rates for quarks (i = q) and gluons (i = g) are given by the Sudakov form factors $\tilde\Delta_i(p, t)$, which give the probability that no resolved emission happens between the starting scale t and a minimum resolved scale. We define the form factors in Sec. 2.1.2. A quark that does not emit any resolved emission forms one quark jet, so we need v in front of $\tilde\Delta_q$. When at least one resolved emission happens, namely for n > 1, the jet rates are expressed in terms of the splitting kernels $\tilde\Gamma_i(z, t) = \alpha_s(z, t) P_i(z)/\pi$, where $\alpha_s(z, t)$ and $P_i(z)$ are the running strong coupling and the splitting functions. The ratio of Sudakov form factors gives the probability that the parton i does not emit any resolved emission between the scales t and t'. From these jet rates and the definition of the generating functional, we obtain the evolution equations for the generating functionals of final states, Eqs. (9) and (10) [19,96], where $\Phi_q$ and $\Phi_g$ are the generating functionals for quarks and gluons in final states. For brevity we define the logarithms $\kappa$ and $\lambda$, where $t_0$ is a given minimal scale, and rewrite Eqs. (9) and (10) in terms of these variables.

We next derive the generating functionals for initial states. For an initial state i, the DGLAP equation gives the normalized change of the parton density for i, in other words, the probability that i emits an initial state radiation between t and t'. Here $x$ and $x'$ are the momentum fractions of i and its parent parton as shown in Fig. 1, $f_i$ is the parton distribution function (PDF) for i, and $z = (x' - x)/x'$. The probability that i does not emit any initial state radiation is given by $\tilde\Pi_i$, the Sudakov form factor for initial states. From the jet rates for n > 0, where at least one resolved emission happens, and the definitions of the generating functionals, we obtain the evolution equation for the functional of initial states, Eq. (20). The logarithms $\kappa$ and $\lambda$ are modified for initial states. In terms of the modified variables, Eq. (20) for quarks and gluons becomes Eqs. (22) and (23), where we neglect the scale dependence of the ratio of PDFs since the effect is negligible (footnote ii).

Footnote ii: We fix the factorization scale for the PDF ratios to the hard scale, namely t' = t, in the numerical calculations below.

The splitting kernels are summarised as follows. We use the relative transverse momentum $k_t$ as the scale of the strong coupling, and expand the coupling at a minimal $k_t$ with the 1-loop beta function. We use the emission angle of the initial and final state radiations as the evolution scale. The variable R is the jet radius and $p_0$ is the minimum resolved transverse momentum, which corresponds to the minimum $p_T$ cut for jets. The coefficients are $a_{q,g} = 2 C_{F,A} \alpha_s/\pi$ and $a_{qq} = n_f T_R \alpha_s/\pi$ for final states, and we remove the factor 2 in $a_g$ for initial states because the soft singularity for $z \to 1$ in the gluon splitting function $P_g(z)$ is suppressed by the gluon PDF $f_g(x')$. The symbol $n_f$ is the number of active flavors, which we set to 5 in the numerical calculations below. The splitting kernel $\Gamma_i(z)$ is obtained by removing the running of $\alpha_s$ from $\tilde\Gamma_i(z, t')$.
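The emission coefficients quoted above can be evaluated directly. The following is a minimal sketch, assuming a fixed coupling for the coefficients and the standard one-loop running formula for the coupling (written in its resummed form, which may differ from the expanded form used in the text). The function names and the reference values used in the usage note are illustrative assumptions, not taken from the paper.

```python
import math

# SU(3) colour factors
C_F = 4.0 / 3.0
C_A = 3.0
T_R = 0.5

def emission_coefficients(alpha_s, n_f=5, initial_state=False):
    """Return (a_q, a_g, a_qq) for a fixed coupling alpha_s.

    a_{q,g} = 2 C_{F,A} alpha_s / pi and a_qq = n_f T_R alpha_s / pi.
    For initial states the factor 2 in a_g is removed, as stated in the text.
    """
    a_q = 2.0 * C_F * alpha_s / math.pi
    a_g = 2.0 * C_A * alpha_s / math.pi
    if initial_state:
        a_g /= 2.0
    a_qq = n_f * T_R * alpha_s / math.pi
    return a_q, a_g, a_qq

def alpha_s_one_loop(kt, alpha_s_ref, kt_ref, n_f=5):
    """Standard one-loop running of alpha_s from the scale kt_ref to kt (same units)."""
    b0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
    return alpha_s_ref / (1.0 + 2.0 * b0 * alpha_s_ref * math.log(kt / kt_ref))
```

For example, `emission_coefficients(0.118)` gives a ratio $a_g/a_q = C_A/C_F = 9/4$, and `alpha_s_one_loop(1000.0, 0.118, 91.19)` is smaller than the assumed reference value 0.118, as expected from asymptotic freedom.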

Sudakov form factors
The Sudakov form factors for final states are defined in terms of the splitting kernels $\tilde\Gamma_i$. Neglecting the running of $\alpha_s$, that is, using $\Gamma_i$, we obtain the form factors $\Delta_q$, $\Delta_g$ and $\Delta_{qq}$. We can see the structure of leading (or double) logarithms (LL) in $\Delta_q$ and $\Delta_g$, and of single logarithms in $\Delta_{qq}$.
For initial states, the Sudakov factors $\tilde\Pi_i$ are defined correspondingly. Neglecting the running of $\alpha_s$ and using $\Gamma_i$, we obtain $\Pi_i$. We define a functionalized $\tilde\kappa$ with a function f, denoted $\kappa^{(n)}_f$. For an identity function I, we find the simple relation $\kappa^{(n)}_I = \tilde\kappa^n$. The two coefficients are defined with sums in which q runs over all active quarks and anti-quarks. Fig. 2 shows the x-dependence of the coefficients in Eqs. (43)-(45). The vertical axis shows $x p_{\mathrm{beam}}$, which is the energy of an initial state. Valence quarks become dominant at large x; therefore, $c^{(n)}_{Q/g}$ becomes larger and $c^{(n)}_{g/u}$ becomes smaller as $x p_{\mathrm{beam}}$ increases. We adopt the CTEQ6L1 PDF [103] in the calculations with the help of a PDF parser package, ManeParse 2.0 [104].

Primary structure of functionals
Since the largest contribution to the t' integration in Eqs. (9) and (10) comes from the region $t' \sim t$, we use the approximations $\Phi_q(\lambda') \sim \Phi_q(t_0) = uv$ and $\Phi_g(\lambda') \sim \Phi_g(t_0) = u$ to see the primary structure of the functionals [102] (footnote iii). In these approximations, subsequent emissions from a low-scale parton are prohibited. We correct for the absence of subsequent emissions in the next section. Neglecting the running of $\alpha_s$, namely using $\tilde\Gamma \simeq \Gamma$, we obtain the functionals $\Phi^{(\mathrm{LL}+qq)}_q$ and $\Phi^{(\mathrm{LL}+qq)}_g$. The terms $\Delta^{1-u}_{q,g}$ ($\propto \sum_n u^n \cdot \bar\alpha_s^n L^{2n}/n!$), which come from the integration of the leading splitting kernels $\Gamma_{q,g}$, increase the number of gluon jets with the double logarithmic coefficients $\bar\alpha_s^n L^{2n}$, where L is proportional to $\kappa$ or $\lambda$. The term $\Delta^{1-uv^2}_{qq}$ ($\propto \sum_n u^n v^{2n} \cdot \bar\alpha_s^n L^n/n!$) contains v, so it increases the number of quark jets with the single logarithmic coefficients $\bar\alpha_s^n L^n$. Since the gluon jet term has the stronger logarithmic enhancement, the increase of gluon jets is larger than that of quark jets. The functional $\Phi^{(\mathrm{LL}+qq)}_q$ does not contain the $\Delta_{qq}$ term, so the number of quark jets does not increase in this approximation.

Footnote iii: In these approximations, the functional ratio in Eq. (14) takes the form $\Phi_q(\kappa, \lambda')/\Phi_g(\kappa, \lambda') \sim uv/u$, which causes unphysical terms $u^n v^m$ ($n < m$). We remove such terms by hand, which causes a unitarity violation, namely $\Phi|_{u=1,v=1} \neq 1$. However, the violation is very small, so we keep using the approximations. The unitarity violation rate is $1 - \Phi|_{u=1,v=1} \simeq (0.1$-$0.4)\%$ for the numerical results in this paper.

Regarding the evolution equations for initial states in Eqs. (22) and (23), we adopt the approximation $\Psi_i(\lambda') \sim \Psi_i(t_0) = 1$, as in the case of final states. The primary structure of the functionals is built from the Sudakov form factors $\Pi_{i,1}$ and $\Pi_{i,2}$. Since the leading Sudakov factor $\Pi^{1-u}_{i,1}$ does not contain v, it does not increase the number of quark jets. On the other hand, the sub-leading Sudakov factor $\Pi^{1-uv}_{i,2}$ increases the number of quark jets since it contains v. In Fig. 2, we notice that the coefficient $\kappa^{(1)}_f$ is generally larger than $c^{(1)}_{i/j}$, since the former is given by the integral of the splitting kernels that have the soft-singular terms. Therefore the increase of gluon jets is generally larger than that of quark jets, as in the case of final state radiation. However, quark jets are often created from a gluon initial state at high energy, since $c^{(1)}_{Q/g}$ gets bigger as the hard scale increases.
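The exponentiated Sudakov factors discussed in this section have a simple probabilistic reading. Writing $\Delta = e^{-\mu}$, the identity $\Delta^{1-u} = e^{-\mu} \sum_n (\mu u)^n / n!$ shows that the coefficient of $u^n$ is a Poisson probability for n extra emissions with mean $\mu = -\ln\Delta$. The sketch below only illustrates this identity; the function name and the numerical value of $\Delta$ are assumptions chosen for illustration.

```python
import math

def emission_probabilities(delta, n_max):
    """Coefficients of u^n (n = 0..n_max) in the expansion of delta**(1 - u).

    delta is a Sudakov factor with 0 < delta <= 1; the coefficients form a
    Poisson distribution with mean mu = -ln(delta).
    """
    mu = -math.log(delta)
    return [math.exp(-mu) * mu ** n / math.factorial(n) for n in range(n_max + 1)]

# Example with an assumed Sudakov factor of 0.5
probs = emission_probabilities(0.5, 40)
no_emission = probs[0]      # equals the Sudakov factor itself
total = sum(probs)          # tends to 1, i.e. the functional at u = 1
```

Setting u = 1 sums the coefficients to one, which is the unitarity property referred to in footnote iii.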

Corrections by subsequent emissions
We add a correction to the generating functionals evaluated in the last section. In the above approximation, emissions from soft gluons emitted by a parent parton are neglected. This means that a quark in the final state cannot produce more than one quark jet, due to the absence of $g \to q\bar q$. We modify the approximation applied to the soft-gluon generating functional to $\Phi_g(\kappa', \lambda') \simeq \Phi^{(\mathrm{LL}+qq)}_g(\kappa', \lambda')$. We also modify the approximation applied to the ratio of functionals. The ratio has the argument $\kappa$, which can be large for high energy partons. In that case, the Sudakov factors in the ratio are exponentially suppressed and the accuracy of the approximation used in the last section becomes worse, so we should modify this too. With the new approximations, the functionals acquire additional exponential factors. The factor $\exp(S_i)$ stems from the modification of the approximation for the soft-gluon generating functional. In other words, this term arises from activating subsequent emissions from soft gluons. The factor $\exp(S')$ stems from the modification of the approximation for the functional ratio. Their full formulae are shown in Appendix A.1.
Leading terms for the exponents are as follows. In Eq. (58), the double logarithmic term $-u \ln \Delta_i$ comes from the emission $i \to i + g$. The double and single logarithms in the square brackets come from the subsequent emissions, and their fractional factors arise from the integrals over the ordering variables $\kappa'$ and/or $\lambda'$. In Eq. (59), w/2 comes from the correction to the functional ratio.
For initial states, we adopt the same approximations. For an analytic function $G = \sum_n c_n \kappa^n$, we define a functionalized one, $G[f]$. We get $G[I] = G$ for an identity function I. The full formula of $S_i$ is shown in Appendix A.1. Regarding the leading terms of $S_q$ and $S_g$, $-uv \ln \Pi_{i,2}$ comes from the sub-leading splitting kernels for $g \to q\bar q$ and $g \to gq$ in Eqs. (22) and (23). The factors $\pm w/2$ that follow come from the corrections to the functional ratios. Their signs are opposite since the numerators and denominators of the functional ratios are interchanged.

α s running correction
In this section, we include the running of $\alpha_s$ in the generating functionals by using the splitting kernels $\tilde\Gamma_i$. For final states, the full formulae for the resulting exponential factors are shown in Appendix A.1. The two exponents $\tilde S_i$ and $\tilde S'$ are the $\alpha_s$ corrections to $S_i$ and $S'$, and $e^{T_i}$ and $e^{T'}$ are the corrections to $\Delta_i$ and $\Delta_{qq}$.
The $\alpha_s$ corrections for the generating functionals of initial states are expressed with $\tilde S_i[f_{i/i}]$ and $T_i[f_{i/i}]$, the functionalized $\tilde S_i$ and $T_i$. The full formulae for the exponents $\tilde S_i$ and $T_i$ are shown in Appendix A.1, and their leading terms are as follows: S g ∼ uv ln Π g,2 · w · a c (1)

Numerical results

Number of quark jets
We evaluate the quark jet rate for a given Born configuration, $i_1 i_2 \to f_1 f_2$. The generating functional of the configuration is defined as $\Phi = \Psi_{i_1} \Psi_{i_2} \Phi_{f_1} \Phi_{f_2}$. We assume that the two final states scatter in the central region, which tends to occur at high energy, and set $\tilde p_T = p_{f_1} = p_{f_2} = x_1 p_{\mathrm{beam}} = x_2 p_{\mathrm{beam}}$, where the proton beam energy is $p_{\mathrm{beam}} = 7$ TeV. The starting scale is set to the maximal one allowed kinematically, namely $t_{\max} = \sqrt{2}$.
We calculate the number of quark jets for events that contain $N_{\mathrm{jets}}$ jets. The expected value of this number is obtained from the jet rates and the quark jet rates for $i_1 i_2 \to f_1 f_2$. In Fig. 3, the results for $gg \to gg$ (left), $gu \to gu$ (center) and $uu \to uu$ (right) are shown. In the calculation, $p_0 = 50$ GeV and $R = 0.4$ are used. We set the parton transverse momentum, or energy, to $\tilde p_T = 1$ TeV. The blue, green and red curves are analytical results using the functionals labeled by (LL + qq), (LL + qq + sub) and (LL + qq + sub + $\delta\alpha_s$). The black curves show Monte-Carlo predictions given by Herwig++ [105]. Hadronization is turned off and the generated partons are clustered with the anti-$k_T$ algorithm [106]. For each jet, if the sum of the electric charges of the partons contained in the jet is zero, we label the jet as a gluon jet, otherwise as a quark jet.

Table 1: The coefficients A, B and C and the initial number of quarks $m_0$ for the configurations $gg \to gg$, $gu \to gu$ and $uu \to uu$.
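The parton-level flavor assignment described above (zero net electric charge means gluon jet, otherwise quark jet) can be sketched as follows. This is an illustrative implementation, assuming that jet constituents are given as PDG particle codes of quarks and gluons only, as is the case when hadronization is turned off; the function names are hypothetical.

```python
# Electric charge in units of e/3, keyed by |PDG id| (d, u, s, c, b, t).
QUARK_CHARGE_X3 = {1: -1, 2: 2, 3: -1, 4: 2, 5: -1, 6: 2}
GLUON_ID = 21

def parton_charge_x3(pdg_id):
    """Charge of a quark, anti-quark or gluon in units of e/3."""
    if pdg_id == GLUON_ID:
        return 0
    charge = QUARK_CHARGE_X3[abs(pdg_id)]  # KeyError for unsupported partons
    return charge if pdg_id > 0 else -charge

def label_jet(constituent_pdg_ids):
    """Label a jet 'gluon' if the summed parton charge is zero, else 'quark'."""
    total = sum(parton_charge_x3(p) for p in constituent_pdg_ids)
    return "gluon" if total == 0 else "quark"
```

With this rule a jet containing a $q\bar q$ pair of the same flavor from $g \to q\bar q$ is labeled as a gluon jet, for example `label_jet([1, -1, 21])`, while `label_jet([2, 21, 21])` is labeled as a quark jet. Integer charges in units of e/3 avoid floating point comparisons with zero.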
The primary structure of the functionals has a simple form with three coefficients A, B and C, from which the expected value is obtained, where $m_0$ is the number of quarks in the final states. As the coefficients B and C, which are related to v, increase, the number of quark jets increases. The three coefficients and the initial number of quarks for the three configurations $gg \to gg$, $gu \to gu$ and $uu \to uu$ are shown in Table 1. When we neglect subsequent emissions, the increase of quark jets for $uu \to uu$ is tiny, because it is caused only by B, and this coefficient is much smaller than the other coefficients, as shown in Fig. 2. The increase of quark jets for $uu \to uu$ mainly stems from $\exp(S_q)$, which is related to subsequent emissions. The lowest order at which v appears there is $O(u^4 v^4)$, so the number of quark jets begins to increase clearly from $N_{\mathrm{jets}} = 4$. Auxiliary plots, in which only initial or only final state radiation is taken into account, are shown in Appendix A.3.
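The expected number of quark jets at fixed jet multiplicity used in this section can be illustrated with a small helper. The sketch assumes that the expectation is the rate-weighted mean of m at fixed n, that is $\sum_m m R_{n,m} / \sum_m R_{n,m}$; the dictionary layout and the toy rates in the usage note are hypothetical and are not results of the paper.

```python
def expected_quark_jets(rates, n_jets):
    """Mean number of quark jets among events with exactly n_jets jets.

    rates maps (n, m) -> R_{n,m}, the probability of n jets with m quark jets.
    """
    selected = [(m, r) for (n, m), r in rates.items() if n == n_jets]
    norm = sum(r for _, r in selected)
    if norm <= 0.0:
        raise ValueError("no rate available for n_jets = %d" % n_jets)
    return sum(m * r for m, r in selected) / norm
```

For instance, with the toy rates `{(2, 0): 0.6, (3, 0): 0.2, (3, 2): 0.1}`, the call `expected_quark_jets(rates, 3)` returns $2 \times 0.1 / 0.3 \approx 0.67$.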

Expected improvement by the quark/gluon discrimination
The coefficient A, which governs the increase of gluon jets, is generally larger than B and C at high energy since it has the logarithmic enhancement $\kappa$, and the color factors in A are larger than those in B and C for each matrix element configuration. This means that the QCD multi-jet background is mostly composed of gluon jets, with a few quark jets that stem from the valence quarks. We can therefore distinguish QCD backgrounds from a signal that predicts many quark jets by using quark/gluon discrimination.
We estimate how much the signal-to-background ratio (S/B) improves for such signals with quark/gluon tagging. Many studies show plots of quark jet efficiencies ($\epsilon_q$) versus gluon jet efficiencies ($\epsilon_g$) obtained with jet substructure variables and Monte-Carlo generators. Assuming that a signal and the background have $N^S_q$ and $N^B_q$ quark jets, we expect S/B to increase by a factor $(\epsilon_q/\epsilon_g)^{N^S_q - N^B_q}$ at the signal efficiency $\epsilon_q^{N^S_q}$, where the efficiency ratio is greater than 1 under the assumption that quark jets are the signal. We consider a toy signal in which all jets are quark jets, namely $N^S_q = N_{\mathrm{jets}}$. Such a signal is discussed in detail in the next section. We can calculate the number of quark jets in the background, $N^B_q$, using Eq. (80), and obtain the efficiencies from the $\epsilon_q$ versus $\epsilon_g$ curves.
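The estimate above can be written as two small functions: one converts a target signal efficiency $\epsilon_q^{N^S_q}$ into the per-jet quark efficiency, and one evaluates the gain $(\epsilon_q/\epsilon_g)^{N^S_q - N^B_q}$. This is a minimal sketch; the function names and the numbers in the usage note are illustrative assumptions, and in practice $\epsilon_g$ would be read off an $\epsilon_q$ versus $\epsilon_g$ curve.

```python
def per_jet_quark_efficiency(signal_efficiency, n_signal_quark_jets):
    """Per-jet eps_q such that eps_q ** N_q^S equals the target signal efficiency."""
    return signal_efficiency ** (1.0 / n_signal_quark_jets)

def sb_gain(eps_q, eps_g, n_signal_quark_jets, n_background_quark_jets):
    """Expected S/B gain (eps_q / eps_g) ** (N_q^S - N_q^B).

    n_background_quark_jets may be a non-integer expected value.
    """
    return (eps_q / eps_g) ** (n_signal_quark_jets - n_background_quark_jets)
```

For example, an assumed working point $\epsilon_q = 0.8$, $\epsilon_g = 0.4$ with $N^S_q = 6$ and $N^B_q = 2$ gives a gain of $2^4 = 16$, and a total signal efficiency of 0.4 with eight quark jets corresponds to a per-jet efficiency of $0.4^{1/8} \approx 0.89$.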
In Fig. 4, the left plot shows the dependence of the gluon efficiency on the jet $p_T$ for several quark efficiencies. We use Vincia [107][108][109][110][111][112] in the calculation (footnote iv). As the discrimination variable we use the output of a BDT trained on four variables: the number of charged tracks, the energy correlation functions [16] with $\beta = 0.2$ and 1.0, and the $p_T$-normalized jet mass ($m_{\mathrm{jet}}/p_T$). Only charged-track information is used to calculate the BDT inputs.

Footnote iv: One problem in quark/gluon discrimination is the Monte-Carlo uncertainty of the predictions. It is known that experimental data on certain observables related to quark/gluon tagging lie between the predictions of the two MC generators Pythia and Herwig [54,56,57]. Although Vincia's results are close to those of Pythia, they lie between the predictions of the two MC generators. The uncertainties are discussed in Refs. [5,113,114].

The right plot in Fig. 4 shows the dependence of the improvement factor $\epsilon_S/\epsilon_B = (\epsilon_q/\epsilon_g)^{N^S_q - N^B_q}$ on the new physics scale $\Lambda_{\mathrm{new}}$ for each $N_{\mathrm{jets}}$ category. We assume that new physics arises at the invariant mass of the initial partons $\sqrt{\hat s} = \Lambda_{\mathrm{new}}$. The jet $p_T$ used in the estimation of the efficiencies is set to $\Lambda_{\mathrm{new}}/N_{\mathrm{jets}}$. We define the generating functional at the scale $\Lambda_{\mathrm{new}}$ for proton collisions using the proton PDFs $f_i$ for the partons i, where we set the transverse momentum in the functional to $\tilde p_T = \Lambda_{\mathrm{new}}/2$ and the factorization scale to $\mu_F = \tilde p_T$. The signal efficiency is fixed at 0.4. We can see that the efficiency ratio, which is proportional to S/B, increases with the number of jets, since the difference in the number of quark jets between the signal and the background, namely $N^S_q - N^B_q$, gets larger. The ratio also improves as the new physics scale gets larger, because the discrimination power of quark/gluon separation increases with the jet $p_T$. This effect is clear in the large $N_{\mathrm{jets}}$ categories. On the other hand, the probability that valence quarks are in the final states becomes larger as $\Lambda_{\mathrm{new}}$ increases. This reduces the difference in the number of quark jets between the signal and the background, and makes the ratio decrease. This effect is important in the small $N_{\mathrm{jets}}$ categories.

BSM searches in multi-jet final states
Data-driven methods are often used in the analysis of multi-jet final states. A typical example is the micro black hole search [73][74][75][76][77][78][79]. In such an analysis, phase space is divided by a variable related to the hard scale, e.g., the scalar sum of the jet transverse momenta, $H_T$, or the scalar sum of the masses of large-R jets, $M_J$ [115][116][117]. The distribution of the variable is fitted in a low-energy region of phase space, referred to as the control region (CR), and the number of background events in a high-energy region, referred to as the signal region (SR), is estimated with the fit function. An excess over the estimated background is regarded as a sign of new physics.
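The control-region fit and signal-region extrapolation described above can be sketched with a toy fit function. The example assumes an exponentially falling $H_T$ spectrum, $\ln N = a + b H_T$, fitted by unweighted least squares; this functional form, the function names and the bin values are illustrative assumptions, and actual analyses typically use other fit functions and account for statistical weights.

```python
import math

def fit_log_linear(ht_centers, counts):
    """Unweighted least-squares fit of ln(N) = a + b * HT; returns (a, b)."""
    xs = list(ht_centers)
    ys = [math.log(c) for c in counts]  # requires strictly positive counts
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_counts(a, b, ht_centers):
    """Background prediction N = exp(a + b * HT) at the given bin centers."""
    return [math.exp(a + b * x) for x in ht_centers]

# Toy usage: fit in the CR (HT bin centers in TeV), extrapolate to an SR bin
cr_ht = [1.0, 1.5, 2.0, 2.5]
cr_counts = [math.exp(10.0 - 2.0 * x) for x in cr_ht]
a, b = fit_log_linear(cr_ht, cr_counts)
sr_prediction = predict_counts(a, b, [4.0])[0]
```

An observed SR count well above `sr_prediction` would then be the kind of excess referred to in the text.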
One of the problems of such analyses is that they oversimplify high jet multiplicity events. In the analysis explained above, only one inclusive variable is used (footnote v). To incorporate the flavor information of multi-jet final states in the analysis, we introduce a variable d (footnote vi) built from $Q_i$ (> 0), which expresses a kind of quark-jet-ness of the i-th jet. If the substructure of the i-th jet looks more like that of a quark jet than that of a gluon jet, $Q_i$ takes a larger value. In this paper, we use as $Q_i$ the BDT output of Section 3.2, which is trained so that quark and gluon jets are assigned to 1 and 0. The variable d takes a large value for events that contain many quark jets.

Footnote v: The sum of fat jet masses $M_J$ also contains some information on exclusive variables such as the jet $p_T$ and the distance between sub-jets.

Footnote vi: We found that the performance of the discrimination between the signal and background discussed below can increase slightly with a more complicated definition of d. The optimization of the variable d is beyond the scope of this paper.

We consider the following toy-signal topologies. The pair production of a hypothetical heavy resonance X has the initial state gg or $u\bar u$ in proton collisions, and X decays into $n_X$ quarks. For example, the pair production of gluinos and squarks in SUSY with R-parity violation has the same decay topology. We generate the hard processes using Madgraph5 [80] with the CTEQ6L PDF, and let X decay uniformly in phase space. When $n_X$ is odd or even, X is assigned to the color octet or triplet, respectively, and the color indices of X are connected to those of the quarks in the large-$N_c$ limit. We use Vincia for the parton showering and the hadronization.
For the simulation of QCD multi-jet background, we use Vincia with the default setting.
The generated signal and background events are clustered with the anti-$k_t$ algorithm with the jet radius set to R = 0.4. As selection cuts, a minimum transverse momentum ($p_T > 50$ GeV) and a rapidity cut ($|\eta| < 2.8$) are imposed on all jets. The center-of-mass energy of the collision is set to $\sqrt{s} = 14$ TeV.
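The jet selection and the inclusive variable $H_T$ used in this section can be sketched as follows. The representation of a jet as a `(pt, eta)` tuple and the function names are assumptions made for illustration; the cut values are those quoted above.

```python
def select_jets(jets, pt_min=50.0, eta_max=2.8):
    """Keep jets with pT > pt_min (GeV) and |eta| < eta_max.

    jets is a list of (pt, eta) tuples.
    """
    return [(pt, eta) for pt, eta in jets if pt > pt_min and abs(eta) < eta_max]

def scalar_ht(jets):
    """Scalar sum of the transverse momenta of the given jets (GeV)."""
    return sum(pt for pt, _ in jets)

# Toy event: the second jet fails the pT cut, the third fails the eta cut
event = [(900.0, 0.1), (40.0, 0.0), (300.0, -3.0), (60.0, 2.7)]
selected = select_jets(event)
n_jets = len(selected)
ht = scalar_ht(selected)
```

Event categories such as $H_T > 2$ TeV and $N_{\mathrm{jets}} \geq 8$ are then simple conditions on `ht` and `n_jets`.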
In Fig. 5, the black and red curves show the distributions of $Q_i$ for the background and the signal, where $H_T > 2$ TeV and $N_{\mathrm{jets}} \geq 8$ are imposed. For the background, the $Q_i$ tend to be distributed close to 0 since gluon jets are dominant in QCD multi-jet final states. However, $Q_1$ has a clear peak near 1 due to valence quark jets. For the signal, we set the mass of X to $M_X = 2$ TeV and $n_X = 3$. The signal has six quarks in the hard process, so the $Q_i$ are distributed close to 1 up to the 6th jet. The 7th and 8th jets tend to stem from QCD radiation, so the differences between the signal and the background become small.
In Fig. 6, the distributions of d for the QCD multi-jets (left) and the signal (right) are shown for each $N_{\mathrm{jets}}$ category. The signal is set to $M_X = 2$ TeV and $n_X = 5$. We can remove the background by imposing the cut $d > d_{\mathrm{cut}}$, since the signal is distributed in the large-d region.
In Fig. 7, the fractions of events remaining after the d-cut are shown for each $H_T$ bin. The left and right plots are the results for the QCD multi-jets and the signal at $N_{\mathrm{jets}} \geq 6$. The signal parameters are $M_X = 2$ TeV and $n_X = 5$. The background decreases rapidly as $d_{\mathrm{cut}}$ is raised. We want to know the number of background events after the d-cut in the high energy region. Due to the complexity of large jet multiplicity events, this number should be estimated by a data-driven method. The dotted curves in Fig. 7 show an example of interpolation curves fitted to data in a control region, e.g., $H_T < 4$ TeV in the figure.
In a practical analysis, we can obtain the fraction of background in the signal regions from such interpolation curves, and can set an upper bound on the cross-section after the d-cut in QCD multi-jet final states.
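The d-cut procedure can be illustrated without reference to the specific definition of d. The sketch below, under the assumption that per-event values of d are available as plain lists, picks the cut that keeps a target fraction of signal events and evaluates the fraction of events remaining above a cut. The function names and toy values are hypothetical.

```python
def d_cut_for_signal_efficiency(signal_d, target_efficiency):
    """Cut value such that about target_efficiency of signal events have d > d_cut."""
    if not 0.0 < target_efficiency < 1.0:
        raise ValueError("target_efficiency must lie strictly between 0 and 1")
    ordered = sorted(signal_d, reverse=True)
    n_keep = int(target_efficiency * len(ordered))
    # Events strictly above the (n_keep + 1)-th largest value pass the cut.
    return ordered[n_keep]

def remaining_rate(d_values, d_cut):
    """Fraction of events with d > d_cut."""
    return sum(1 for d in d_values if d > d_cut) / len(d_values)

# Toy samples of d for signal and background
signal = [i / 10 for i in range(1, 11)]
background = [0.05, 0.1, 0.2, 0.3, 0.65]
d_cut = d_cut_for_signal_efficiency(signal, 0.4)
eff_signal = remaining_rate(signal, d_cut)
eff_background = remaining_rate(background, d_cut)
```

Evaluating `remaining_rate` for the background separately in each $H_T$ bin gives the kind of curves that are fitted in the control region and extrapolated to the signal region.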
We estimate how much S/B improves by applying quark/gluon discrimination in multi-jet final states. The ratio factorizes into a ratio of cross-sections and a ratio of efficiencies, where $\sigma_X$ is the cross-section for the signal (X = S) or the background (X = B) after imposing the condition in the brackets. The impact of quark/gluon discrimination is contained in the second factor, the efficiency ratio. We employ the cut $H_T > 1.8 M_X$, which approximately maximizes the significance of the signal when systematic uncertainties and the d-cut are neglected.
In Fig. 8, the M X -dependence of the efficiency ratio is shown, where the corresponding dependence of S / √ B is also shown on the right axis. We can see how the ratio changes as the lower bound of N jets increases from 3 to 10, and as n X increases from 2 (left-most) to 5 (right-most). We choose the d cut that gives the signal efficiency ε S = 0.4. These are the results for the gg initial state.

Figure 8: M X -dependence of the efficiency ratio. The lower bound of N jets increases from 3 to 10, and n X from 2 (left-most) to 5 (right-most).

The ratio clearly increases as the lower bound of N jets rises up to 2n X , since 2n X quarks are contained in the hard process of the signal. A quark emitted from X can be softer than partons arising from initial and/or final state radiation. In that case, the quark from X can be the (2n X + 1)-th jet, so we see some improvement in S/B even if the lower bound of N jets is greater than 2n X .

We can understand the M X dependence of the improvement factor from Section 3.2, where two competing effects appear. On the one hand, the ratio improves as the masses get larger because the discrimination power of the quark/gluon separation increases with the jet p T . This effect is clear in the large N jets categories. On the other hand, the probability that valence quarks are in the final state becomes larger as the masses increase. This reduces the difference in the number of quark jets between the signal and the background, and makes the ratio decrease. This effect is important in the small N jets categories. We also see good agreement between the right-most panel and the semi-analytic result in Fig. 4. In both cases, the signals are strongly quark-jet dominated.

Fig. 9 shows the same as Fig. 8, but for the uū initial state. The Born configuration with the gg initial state tends to emit valence quark jets at hard energy, but it also emits more gluon jets than the uū initial state does, since the color factor for g → gg is larger than that for q → qg by a factor of 9/4. Consequently, initial state radiation from gg reduces the quark jet fraction, and the efficiency ratios for uū are slightly better than those for gg.
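The factor 9/4 quoted for the color factors is the ratio of the SU(3) Casimirs C A = N c and C F = (N c ² − 1)/(2N c ) for N c = 3. A minimal check with exact rational arithmetic is sketched below; the function name is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction


def casimir_ratio(n_colors: int = 3) -> Fraction:
    """Return C_A / C_F for SU(n_colors), using exact rational arithmetic."""
    c_a = Fraction(n_colors)                          # adjoint (gluon) Casimir
    c_f = Fraction(n_colors**2 - 1, 2 * n_colors)     # fundamental (quark) Casimir
    return c_a / c_f


print(casimir_ratio(3))  # 9/4
```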

Conclusions
Quark/gluon discrimination is most useful in searches for new physics that predicts quark and gluon jet fractions different from those of the QCD background. To understand the jet flavor structure of QCD multi-jet final states at hadron colliders, we have introduced the quark jet rates R n,m , the probability that a parton or a matrix element produces n jets of which m are quark jets. We have calculated generating functionals, which contain the quark jet rates as coefficients, for initial and final states by using the QCD resummation technique.
Exponential structures of the functionals are evaluated, and the quark jet rates R n,m are obtained from the expansion coefficients. The increase of gluon jets mainly arises from leading logarithmic terms in the coefficients, while that of quark jets comes from next-to-leading logarithmic terms. More details of the logarithmic structure are also shown. To quantify the rate at which quark jets increase, we have shown the expected number of quark jets in each N jets category for the matrix element configurations gg → gg, gu → gu and uu → uu. For example, when we set the jet radius, the jet p T cut and the parton p T cut to R = 0.4, p 0 = 50 GeV and p T = 1 TeV, the number of quark jets increases by about 0.25, 0.18 and 0.12 for the three configurations, respectively, each time the number of jets increases by 1. We have also checked the consistency between the analytical results and Monte-Carlo predictions.
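To make the step from quark jet rates to an expected number of quark jets concrete, the sketch below assumes that the expectation in a given N jets category is the rate-weighted mean, Σ m R n,m / Σ R n,m , with both sums running over m at fixed n. The function name and the toy rates are hypothetical and are not values from the paper.

```python
def expected_quark_jets(rates):
    """Map each jet multiplicity n to the mean number of quark jets.

    `rates` maps (n_jets, m_quark_jets) -> R_{n,m}; the rates need not be
    normalized, since each category is normalized separately.
    """
    norm = {}
    weighted = {}
    for (n, m), rate in rates.items():
        if m < 0 or m > n or rate < 0:
            raise ValueError(f"invalid entry for (n, m) = ({n}, {m})")
        norm[n] = norm.get(n, 0.0) + rate
        weighted[n] = weighted.get(n, 0.0) + m * rate
    # Skip categories whose total rate vanishes to avoid dividing by zero.
    return {n: weighted[n] / norm[n] for n in sorted(norm) if norm[n] > 0}


# Toy rates, invented for illustration only.
toy_rates = {
    (2, 0): 0.50, (2, 2): 0.10,
    (3, 0): 0.20, (3, 2): 0.08,
    (4, 0): 0.06, (4, 2): 0.04, (4, 4): 0.01,
}

means = expected_quark_jets(toy_rates)
for n_jets, mean in means.items():
    print(n_jets, round(mean, 3))
```

The difference between the means of neighbouring categories is the quantity compared across the three configurations in the text, namely the increase in quark jets per additional jet.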
Since QCD multi-jet events are basically composed of a few valence quark jets and many gluon jets, we expect a large improvement in S/B from quark/gluon discrimination for a signal that predicts many quark jets. We have estimated the improvement semi-analytically using the above results and have shown that it grows as the number of quark jets in the signal increases. For example, S/B increases by a factor of about 20 when the new physics scale is Λ new = 4 TeV and the number of quark jets is 10.
We have introduced a variable d that takes a large value for events containing many quark jets, and have suggested a data-driven analysis using the variable. Taking as a signal the pair production of a hypothetical heavy resonance X that decays into n X quarks, we have evaluated the large improvement in S/B for each mass of X, each n X and each initial state in a Monte-Carlo analysis, and have shown the usefulness of quark/gluon discrimination in multi-jet final states.
A Details on quark jet rates

A.1 Formulae
In Sec. 2.2, generating functionals which contain the effects of emissions coming only from progenitor partons are evaluated, and those for final states are given as, For partons in the initial states of a hard process, we get In Sec. 2.3, we have also considered the effects of subsequent emissions. The functionals are factorized into the primary terms and exponential terms related to this effect as, where and For q in Eq. (99), we used the approximation c qq ≃ 2/3. For initial states, the generating functionals containing the effects of subsequent emissions are where S q = −uv ln Π q,2 I 2 (−w), S g = −uv ln Π g,2 I 2 (+w).
In Sec. 2.4, the running effects of α s are also considered. In that case, the functionals for final states are written as, where S̃ −uv 2 a qq a 1 + aκ + aλ κ − 13 12 and In the calculations of S̃ and T in Eqs. (108) and (109), we use the following approximation, This approximation has good accuracy because the integrands for S̃ and T are localized around κ = κ and λ = λ. For initial states, we get where S̃ q uv ln Π q,2 aI 5 (c (1) g/q , c S̃ g uv ln Π g,2 aI 5 (c and In the calculations of S̃ q,g in Eqs. (114) and (115), we expanded D in the logarithms and took into account only the leading term, namely D ≃ a(κ + λ ).

A.2 Matrix element corrections
In Sec. 3, the number of quark jets in each N jets category is evaluated by applying parton showers to Born configurations. In that calculation, matrix element corrections are absent for more than 2 jets. In Fig. 10, we show the matrix element correction to the number of quark jets obtained with CKKW matching using Sherpa [118,119]. The black curve is the result for pp → jj + parton showers. The red and blue curves show the results when one and two additional partons, respectively, are matched to the Born configuration. We impose H T > 2 TeV and set √ s = 14 TeV.
For the black curve, N quark-jets is about 1.5 at N jets = 2, which means that the final state tends to consist of two valence quarks. In this case, the N quark-jets curve has an artificial kink at N jets = 3, as discussed in Sec. 3. We can see that the kink disappears with matching. In the red and blue curves, the matrix element corrections are included up to N jets = 3 and 4, respectively, and we find that configurations containing gluons in the final state become more frequent. As we saw in Sec. 3, the rate of increase of quark jets is larger for gluon final states than for quark final states. Therefore, the rate slightly increases after the matching.

A.3 Initial and final state radiation
Generating functionals for initial state radiation (ISR) and final state radiation (FSR) are calculated in Sec. 2. The number of quark jets is evaluated using the functionals for three Born configurations in Sec. 3. In that calculation, the contributions from ISR and FSR to the number are averaged. In Fig. 11 and Fig. 12, we show the results in which only ISR and only FSR, respectively, are taken into account. The results for gg → gg (left), gu → gu (center) and uu → uu (right) are shown. In the calculation, p 0 = 50 GeV and R = 0.4 are used, and we set p T to 1 TeV. The blue, green and red curves are the analytical calculations using the functionals labeled by (LL + qq), (LL + qq + sub) and (LL + qq + sub + δα s ). The black curves show the Monte-Carlo prediction given by Herwig++.

Figure 11: Same as Fig. 3, considering only the initial state radiation.

When we neglect subsequent emissions (LL+qq), the increase of quark jets for uu → uu in the ISR-only case is tiny, for the same reason as discussed for Fig. 3. In the FSR-only case, the generating functional does not contain v, so the number of quark jets does not increase at all. The increase of quark jets for uu → uu mainly stems from exp(S q ), which is related to subsequent emissions. The lowest order at which v appears there is O(u 4 v 4 ), and therefore the number of quark jets begins to increase clearly from N jets = 4, as discussed in Sec. 3. For the ISR-only case and gg → gg, the number of quark jets decreases considerably when we take into account the subsequent emissions. This is mainly because the coefficient of uv in Eq. (65) takes a large negative value, which stems from the improved approximation of the generating functional ratio in Eq. (23).