Future lepton collider prospects for a ubiquitous composite pseudo-scalar

Composite Higgs models feature new strong dynamics leading to the description of the Higgs boson as a bound state arising from the breaking of a global (flavour) symmetry. These models generally include light states generated by the same dynamics, the detection of which may present the first observable signs of compositeness. One such state is a pseudo-scalar boson resulting from the breaking of a $U(1)$ symmetry common to most composite setups, and whose hints are expected to be visible through low-mass resonance searches at present and future hadron and lepton colliders. In this work we study the phenomenology of this pseudo-scalar field. We show that, for a light state, bottom quark loop effects dominantly impact the production cross section and considerably modify the decay pattern. Moreover, we make a case for targeted low-mass analyses at future lepton colliders, with an emphasis on high-luminosity machines aiming to operate at low centre-of-mass energies. We present a simplified outline of a search for this light pseudo-scalar at one such machine, considering electron-positron collisions at the $Z$-pole. We focus on a signature arising from the pseudo-scalar decay into a pair of hadronic taus and a production mode in association with a pair of leptons of opposite electric charges, and compare cut-and-count methods with machine learning methods.


INTRODUCTION
One of the foremost goals of the current experimental high energy physics programme is the search for new resonances. The LHC is in a long shut-down following its 13 TeV run, preparing to operate at its 14 TeV design energy as well as for a high luminosity (HL-LHC) run. At these higher energies and luminosities, efforts will be focused predominantly on the search for resonances typically heavier than the Higgs boson. However, new physics may still be concealed at lower energies. Proposals for electron-positron colliders designed to be complementary to the LHC have been put forward, including the International Linear Collider (ILC) [1], with initial energies in the range of 250-500 GeV and ranging up to 1 TeV, the Compact Linear Collider (CLIC) [2], which could reach up to 3 TeV, as well as the Future Circular Collider (FCC-ee) [3] and Circular Electron Positron Collider (CEPC) [4], which will operate around the $Z$ pole, the $WW$ and $t\bar{t}$ thresholds, and in a Higgs factory mode. These machines, with lower centre-of-mass (c.m.) energies than the LHC, are designed to be 'factories' for resonances such as the $Z$, $W$ and Higgs bosons and the top quark, offering the possibility of precision measurements of their couplings and related Lagrangian parameters. They can also be used for targeted low-mass resonance searches, providing a window to possible Beyond the Standard Model (BSM) physics at lower energy scales through large integrated luminosities. In this article we investigate the potential of finding hints for a new light pseudo-scalar $a$ emerging from a composite Higgs model, which may be accessed at both hadron and lepton colliders.
Composite Higgs models [5][6][7] describe the Standard Model (SM) Higgs sector in terms of high-scale fundamental gauge dynamics by postulating the existence of a new strong sector. These models implement gauge and fermionic degrees of freedom, confining at low energies [8]. In the following we describe a class of composite Higgs models featuring fermionic matter [9,10], charged under a global symmetry G and governed by a hyper-colour gauge group G HC . The breaking of G then leads to the appearance of resonances that are bound states of the underlying fermions [11]. Composite Higgs models [12][13][14][15] are an attractive class of BSM theories as they provide a solution to the hierarchy problem inherent to the SM whose Higgs sector is unstable with respect to quantum corrections. If the Higgs boson is not an elementary scalar but rather a bound state of strong dynamics, these quantum corrections may only contribute up to a finite scale, hence stabilising the Higgs field dynamics [13]. Introducing this scale of compositeness is one of a few options for a natural generation of the Higgs boson mass, and offers an explanation for the scale of electroweak symmetry breaking [15].
In addition to the breaking of the electroweak symmetry [16], occurring at a scale v ∼ 246 GeV, the global symmetries of the fundamental fermion sector are broken at some condensation scale on the order of 1 TeV [17]. The possibilities for global and gauge symmetries within a composite Higgs model, though subject to some constraints, are fairly broad. As such, we allow in our analysis the group structures and symmetries within the theory to vary, considering a spectrum of twelve possible models that have recently been proposed as the most minimal options for a composite high-scale dynamics featuring solely fundamental fermions [9]. These models, which are strongly coupled in the IR, employ a minimal set of fields and depend on fully computable parameters. Fermion mass generation is achieved via partial compositeness [18]; however, given constraints on asymptotic freedom, this mechanism is limited to the generation of the mass of the top quark [19]. While the global symmetries of the fermions and the hypercolour groups vary across the models, similarities across composite Higgs models of this nature remain, particularly at the effective low energy scale. In particular, in addition to the existence of a QCD symmetry, there always exists a non-anomalous U (1) symmetry, acting on all species of underlying fermions in the theory [17,20,21]. The first evidence of new physics may therefore arise from direct searches for the additional light state a produced in association with the Higgs boson and associated with the breaking of this extra U (1). Such a pseudo-scalar state moreover features interactions with the SM gauge bosons via the Wess-Zumino-Witten anomaly [22,23], which is of particular phenomenological interest concerning its production at colliders.
In order to have evaded detection up to now, a light pseudo-scalar would need to be weakly coupled to the SM particles, carrying no colour or electric charge. Our candidate, the ubiquitous $a$, is considered here to have a mass between 10 and 100 GeV. This range warrants a particular investigation given the deficiency of LHC searches in the lower end of this mass range thus far [17]. In the considered set of models, the parameters of the theory and couplings to other states are fully specified and calculable, allowing for the construction of a general analysis targeting a new light scalar. To this aim we have formulated a new, unique implementation of the pseudo-scalar $a$ within state-of-the-art modelling tools, in order to describe simultaneously a range of composite Higgs models with a variety of group structures. We have generalised previous implementations of this class of theories [17,20], where only loops of SM top quarks contributed to the interactions of the pseudo-scalar with the SM gauge bosons, and the impact of all lighter quarks was taken to be negligible. In contrast, our work includes both top and bottom quark contributions, which are shown to provide non-negligible effects, particularly for the phenomenology of very light $a$ bosons. This is significant given the interest in low-mass resonance searches at the LHC, including di-jet [24,25], di-muon [26,27], di-photon [28,29] and di-tau [30] searches in recent years. These searches yield poor constraints in the low pseudo-scalar mass regions, but the associated results rely on older cross section predictions, ignoring bottom-quark loop effects. The regions of interest are therefore better covered than initially thought. In particular, di-tau searches constitute one of the golden channels for the considered pseudo-scalar, as the corresponding branching ratio is usually quite large [20].
The latter are generally strictly restricted to the high-mass regime, due to the presence of the large QCD background that proves to be an obstacle to low mass searches. For this reason we have instead investigated the potential of future lepton colliders where, following an overview of possibilities across a range of proposed colliders, an analysis at a low c.m. energy of the signal induced by the production of the pseudo-scalar is presented.
In this manuscript we begin, in Sec. 2, with a theoretical motivation, describing composite Higgs models which are built on a theory of fundamental fermions and outlining the specifics of the models to be studied. In Sec. 2.1 we describe the dynamics of the boson a, which is the subject of this work, and in Sec. 2.2, its anomalous couplings to the SM gauge bosons. Sec. 3 outlines a possible low mass search strategy at lepton colliders, investigating the production of a at a variety of future experiments, before constructing an analysis for a future electron-positron collider aiming at operating at the Z-pole. We in particular outline and compare the expectation of a cut and count analysis, described in Sec. 3.3, and a machine learning approach based on gradient boosted trees, described in Sec. 3.4. We summarise our work and conclude in Sec. 4.

THEORETICAL MOTIVATION
This work considers a class of models [9] with a variety of hyper-colour groups and several of the most minimal cosets $G/H$ characterising the dynamics below the confinement scale. The symmetry breaking patterns in each model are determined by the properties of the underlying fermions. For a given model with $N_f$ Dirac fermions of the same species, we may only have one of two possible global flavour symmetries $G$, namely $SU(2N_f)$ for a (pseudo-)real fermion representation, or $SU(N_f) \times SU(N_f)$ for a complex fermion representation [17]. The chiral symmetry breaking may then follow one of three patterns: $SU(2N_f) \to SO(2N_f)$ for a real representation, $SU(2N_f) \to Sp(2N_f)$ for a pseudo-real one, or $SU(N_f) \times SU(N_f) \to SU(N_f)$ in the case of a complex representation [31].
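The three breaking patterns above fix the number of pseudo-Nambu-Goldstone bosons, which equals the number of broken generators, dim G minus dim H. The following is a minimal sketch of that counting; the function names and the parametrisation by `n` (the size of the fundamental of the flavour group, i.e. $2N_f$ in the (pseudo-)real cases and $N_f$ in the complex case) are illustrative choices and not part of the models' published implementation.

```python
def dim_su(n):
    """Number of generators of SU(n)."""
    return n * n - 1


def dim_so(n):
    """Number of generators of SO(n)."""
    return n * (n - 1) // 2


def dim_sp(n):
    """Number of generators of Sp(n), with n even (Sp(4) has 10)."""
    if n % 2 != 0:
        raise ValueError("Sp(n) requires an even n")
    return n * (n + 1) // 2


def goldstone_count(representation, n):
    """Count broken generators for the three chiral breaking patterns.

    representation: 'real'        -> SU(n) -> SO(n)
                    'pseudo-real' -> SU(n) -> Sp(n)
                    'complex'     -> SU(n) x SU(n) -> SU(n)
    """
    if representation == "real":
        return dim_su(n) - dim_so(n)
    if representation == "pseudo-real":
        return dim_su(n) - dim_sp(n)
    if representation == "complex":
        return 2 * dim_su(n) - dim_su(n)
    raise ValueError(f"unknown representation type: {representation}")


if __name__ == "__main__":
    # Illustrative cosets only; they are not read off from Tab. I.
    for rep, n in [("real", 5), ("pseudo-real", 4), ("complex", 4)]:
        print(rep, n, goldstone_count(rep, n))
```

For instance, a real representation with `n = 5` gives 14 broken generators, a pseudo-real one with `n = 4` gives 5, and a complex one with `n = 4` gives 15.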
In a general composite Higgs model, the mass of the SM fermions is generated through either four-fermion interactions [32] or partial compositeness [18]. The latter presents a need for fermions in (at least) two different irreducible representations of the unbroken hyper-colour group $G_{\rm HC}$, leading to a rich spectrum within the theory [33]. All models considered in this work thus contain two species of underlying fermions which we denote $\psi$ and $\chi$ following the notation of Ref. [9], and which belong to different irreducible representations of $G_{\rm HC}$ and $G$. The first fermion $\psi$ gives rise to the Higgs boson through the breaking of the associated global symmetry $G$ into the electroweak (EW) coset $H$, and carries electroweak charges. The misalignment of the Higgs field then drives the usual electroweak symmetry breaking process, the mass of the Higgs boson being generated through some explicit breaking in the global sector [34]. The second species of fermion $\chi$ is responsible for partial compositeness, which proceeds through a mixing of the top quark with a composite operator of the same quantum numbers [13]. The fermion $\chi$ hence carries QCD colour and hypercharge quantum numbers, and the breaking of the global symmetry then generates the QCD coset. The traditionally searched-for spin-1/2 vector-like top-partners are therefore composed of fermions in two representations of the hyper-colour group, of the form $\psi\chi\chi$ or $\psi\psi\chi$, and are labelled chimera baryons.
The presence of fermions in two irreducible representations in the theory means that there will always exist two $U(1)$ axial symmetries resulting from the full symmetry breaking pattern, one combination of these being non-anomalous with respect to the confining hyper-colour group [35]. As a result, one of the numerous pseudo-Nambu-Goldstone bosons always turns out to be light, i.e. lighter than the confinement scale. This contrasts with the anomalous axial current in QCD. In a composite Higgs model we thus expect a low energy spectrum in which the Higgs boson is accompanied by exotic composite scalars and fermions, some of which are ubiquitous to all composite Higgs models and are of the $\psi\psi$, $\chi\chi$, $\psi\chi\chi$ or $\psi\psi\chi$ forms. Notably, the condensation of the underlying fermions also breaks the pervading non-anomalous $U(1)$ symmetry, leading to two massive singlet physical eigenstates that we denote $a$ and $\eta'$, where $a$ is the lightest state. There exists some non-trivial mixing between the two corresponding gauge eigenstates, which depends on the characteristics of the underlying fermionic sector. In the decoupling limit [17], the mixing angle $\alpha_{\rm dec}$ is determined by the charge $q_\psi$ ($q_\chi$) of the fermion $\psi$ ($\chi$) under the non-anomalous $U(1)$ symmetry, its multiplicity $N_\psi$ ($N_\chi$), and its decay constant $f_\psi$ ($f_\chi$).
In the decoupling limit that we consider in this work, all other states decouple so that one solely focuses on the phenomenology of the light pseudo-scalar $a$. The range of models according to the most minimal choices for the gauge structure, numbered M1-M12 [9], is presented in Tab. I, in which we summarise their properties. Each model is defined there by a hyper-colour symmetry group $G_{\rm HC}$, given together with the corresponding irreducible representations of the two species of fermions. The table also specifies the EW and QCD cosets for each model. A great advantage of these models is the computability of all low-energy parameters, which are completely determined by the underlying fermion construction. The models are further described in detail in Refs. [17,20,35].

A light U (1) pseudo-scalar
In order to study the phenomenology of the light pseudo-scalar $a$ at colliders, we have constructed a new FeynRules [36] implementation of a simplified model describing the dynamics of the $a$ state in a general way. This allows for the generation of UFO model files [37] that can be used further within the MadGraph5_aMC@NLO (MG5_aMC) framework [38] for the calculation of predictions at colliders. We extend the SM by a composite pseudo-scalar $a$ that exhibits small couplings to the SM fermions, gauge bosons, and the Higgs boson, and is a singlet under the SM symmetries. The pseudo-scalar is modelled as having a mass of less than 100 GeV, and we consider a parametrisation in which the couplings and mass are independent. In practice, we augment the SM Lagrangian with the effective Lagrangian $\mathcal{L}_a$ [20] of Eq. (2.2), where $M_a$ is the mass of the pseudo-scalar and the sum indicates a sum over all SM fermion fields $\Psi_f$ with masses $m_f$. The $C_f$ and $\kappa_V$ (with $V = g, W, B$) parameters are model-specific and control the couplings of $a$ to fermions and gauge bosons respectively. In our notation, $g_s$, $g$ and $g'$ refer to the strong, weak and hypercharge coupling constants, and $G_{\mu\nu}$, $W_{\mu\nu}$ and $B_{\mu\nu}$ ($\tilde{G}_{\mu\nu}$, $\tilde{W}_{\mu\nu}$ and $\tilde{B}_{\mu\nu}$) stand for the associated (dual) field strength tensors.

TABLE I. Properties of the models M1-M12 [17,20,35]. Column headings: model, $G_{\rm HC}$, EW and QCD coset, $\psi$, $\chi$, $q_\chi/q_\psi$. The first column contains the model naming convention, and the second indicates the confining hyper-colour gauge group, followed by the EW and QCD cosets (third column). We then provide the irreducible representations of the fermions $\psi$ (fourth column) and $\chi$ (fifth column) under the coset choice. The final column includes the ratio of the fermion charges under the non-anomalous $U(1)$ symmetry.
The decay constant $f_a$ of the pseudo-scalar $a$, which drives the strength of the pseudo-scalar couplings to the SM particles, is set to 1 TeV in this analysis, as previous studies [20] show that the lower bound on $f_a$ is always below 1 TeV for the models under consideration. This description being effective, we recall that we can only rely upon it for energies or momenta below a cut-off scale $\Lambda \sim 4\pi f_a$.

Pseudo-scalar couplings to gauge bosons
Leading-order couplings of the form $aVV'$, where $V$ and $V'$ stand for gauge bosons which may or may not be different, proceed via the Wess-Zumino-Witten anomaly and are depicted through an effective vertex in the Lagrangian of Eq. (2.2). However, an SM component where the vertex is constructed from loops of SM fermions, an example of which is shown in Fig. 1 (left), is generally significant and should be included. In order to access each gauge-pseudoscalar vertex, we rewrite the gauge-boson interaction part $\mathcal{L}_V$ of the Lagrangian of Eq. (2.2) in terms of the physical gauge bosons, with $c_W$ and $s_W$ denoting the cosine and sine of the EW mixing angle and $e$ the electromagnetic coupling constant. The anomalous couplings of the physical gauge bosons follow from this rotation, and contributions originating from the SM fermion loops should additionally be included for all existing interactions ($gga$, $\gamma\gamma a$, $ZZa$, $W^+W^-a$ and $Z\gamma a$). It is, however, expected that the role of the leptons and of the five light flavours of quarks is negligible, their couplings to the pseudo-scalar being suppressed by the smallness of their masses, as are the contributions from the electroweak bosons, which are suppressed by the heavy propagators running in the loops. In the following, we nevertheless stress the importance of the bottom quark, whose contributions are in fact not negligible.
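The rotation to the physical basis can be sketched as follows. This is a worked derivation under stated assumptions rather than the paper's displayed result: we assume that the electroweak part of $\mathcal{L}_V$ is proportional to $g^2 \kappa_W W^i_{\mu\nu}\tilde{W}^{i\,\mu\nu} + g'^2 \kappa_B B_{\mu\nu}\tilde{B}^{\mu\nu}$, we use the standard relations $W^3_\mu = c_W Z_\mu + s_W A_\mu$, $B_\mu = c_W A_\mu - s_W Z_\mu$, $g = e/s_W$ and $g' = e/c_W$, and we keep only the terms bilinear in the gauge fields. The labels $\kappa_{\gamma\gamma}$, $\kappa_{\gamma Z}$ and $\kappa_{ZZ}$ are shorthand introduced here, with $F_{\mu\nu}$ and $Z_{\mu\nu}$ the photon and $Z$-boson field strengths.

```latex
\begin{align}
g^2 \kappa_W\, W^3_{\mu\nu}\tilde{W}^{3\,\mu\nu} + g'^2 \kappa_B\, B_{\mu\nu}\tilde{B}^{\mu\nu}
 &= e^2 \Big[ \kappa_{\gamma\gamma}\, F_{\mu\nu}\tilde{F}^{\mu\nu}
   + \frac{2}{s_W c_W}\, \kappa_{\gamma Z}\, F_{\mu\nu}\tilde{Z}^{\mu\nu}
   + \frac{1}{s_W^2 c_W^2}\, \kappa_{ZZ}\, Z_{\mu\nu}\tilde{Z}^{\mu\nu} \Big] , \\
g^2 \kappa_W \big( W^1_{\mu\nu}\tilde{W}^{1\,\mu\nu} + W^2_{\mu\nu}\tilde{W}^{2\,\mu\nu} \big)
 &= \frac{2 e^2}{s_W^2}\, \kappa_W\, W^+_{\mu\nu}\tilde{W}^{-\,\mu\nu} , \\
\kappa_{\gamma\gamma} &= \kappa_W + \kappa_B , \qquad
\kappa_{\gamma Z} = c_W^2 \kappa_W - s_W^2 \kappa_B , \qquad
\kappa_{ZZ} = c_W^4 \kappa_W + s_W^4 \kappa_B .
\end{align}
```

The relative sign in $\kappa_{\gamma Z}$ comes from the opposite signs of the $Z$ components of $W^3_\mu$ and $B_\mu$.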
As an example, we focus on the $gga$ vertex and calculate the partonic gluon-fusion production cross section of a pseudo-scalar, given in Eq. (2.6). This expression includes the anomaly contribution, as well as the sum over the contributions from each fermion species. The function $A(\tau)$, defined for a given fermion, results from the three-point scalar function of the quark loop. In the case of top quarks, $\tau_t \geq 1$ and $A(\tau_t)$ is approximately constant ($\approx 1$) throughout the pseudo-scalar mass range. This leads to an approximately constant increase in the gluon fusion production cross section relative to the pure anomalous component, as illustrated in Fig. 2. In the latter, we convolve the expression of Eq. (2.6) with the leading order set of NNPDF 2.3 parton densities NNPDF23_lo_as_0130_qed [39], for $\kappa_g = 1$ and the model M1 with $M_a \in [10, 100]$ GeV. In this analysis we ignore lower masses, where a light pseudo-scalar is subject to strong experimental bounds [20]. The behaviour is quite different in the case of $b$ quarks, where logarithmic effects produce an oscillation in the contribution to the cross section (see Fig. 2). At low masses of $a$, the bottom quark contribution substantially increases the cross section. For higher masses, there is a small destructive interference between the top and bottom contributions, leading to a slight decrease in cross section. This shape of the bottom quark contribution arises due to the form of the three-point scalar function $A(\tau)$ for small $\tau$, where the interplay between the real and imaginary parts of the loop integral leads to the observed undulation in the cross section behaviour. We do not include the contributions of any quarks lighter than the $b$ in the fermion loop, as the considered mass range of the pseudo-scalar leads to lighter quarks having a negligible impact on the cross section.
This can be supported by the observation that the $c$ quark, the next heaviest quark with $M_c = 1.275$ GeV, would impact the cross section at most by the amount that the $b$ quark contributes at $M_a \approx 33$ GeV, due to the form of $A(\tau_c)$ for the small $\tau_c$ value. Given that the bottom quark contribution to the cross section is already negligible at this point, all other quark contributions can therefore be ignored.
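The behaviour of the loop function described above can be explored numerically. The sketch below assumes the standard one-loop form factor for a pseudo-scalar coupled to a fermion loop, $A(\tau) = \tau f(\tau)$ with $\tau = 4 m_f^2 / M_a^2$, a convention chosen because it reproduces $A \to 1$ for a heavy quark as stated in the text; the exact normalisation used in Eq. (2.6) may differ. The quark masses other than $M_c$ and all function names are illustrative assumptions.

```python
import math


def tau(m_quark, m_a):
    """Assumed convention: tau = 4 m_f^2 / M_a^2 (both masses in GeV)."""
    return 4.0 * m_quark ** 2 / m_a ** 2


def loop_function(t):
    """Assumed pseudo-scalar loop function A(tau) = tau * f(tau).

    Above threshold (tau >= 1) the function is real; below threshold it
    acquires an imaginary part, which drives the interference pattern.
    """
    if t <= 0.0:
        raise ValueError("tau must be positive")
    if t >= 1.0:
        return complex(t * math.asin(1.0 / math.sqrt(t)) ** 2, 0.0)
    beta = math.sqrt(1.0 - t)
    log_term = math.log((1.0 + beta) / (1.0 - beta))
    return t * (-0.25) * complex(log_term, -math.pi) ** 2


# Illustrative masses in GeV: M_C is quoted in the text, the others are assumed.
M_TOP, M_BOTTOM, M_CHARM = 173.0, 4.18, 1.275

if __name__ == "__main__":
    for m_a in (10.0, 33.0, 60.0, 100.0):
        a_top = loop_function(tau(M_TOP, m_a))
        a_bottom = loop_function(tau(M_BOTTOM, m_a))
        print(f"M_a = {m_a:5.1f} GeV  A_top = {a_top:.4f}  A_bottom = {a_bottom:.4f}")
    # tau_c at M_a = 10 GeV equals tau_b at M_a = 10 GeV * m_b / m_c.
    print("equivalent M_a for the b quark:", 10.0 * M_BOTTOM / M_CHARM)
```

Because $A$ depends on the masses only through $\tau$, the value of $A(\tau_c)$ at the lower edge of the mass range, $M_a = 10$ GeV, coincides with $A(\tau_b)$ at $M_a = 10\,\mathrm{GeV} \times m_b/M_c$, which lies close to 33 GeV for the assumed bottom mass.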

A LOW MASS PSEUDO-SCALAR AT LEPTON COLLIDERS
In this section we estimate the prospects of future $e^+e^-$ collider searches for this additional light scalar as an alternative to searches at hadron colliders. We are interested, in particular, in designing an analysis that addresses the parameter space region in which $M_a \in [10, 60]$ GeV, where constraints on possible light scalars are particularly weak [20]. This mass window has indeed been left relatively open due to the few direct searches performed thus far, the dominant constraint originating from searches for novel SM Higgs decay modes ($h \to aa$).
We begin this section with a study of pseudo-scalar production at lepton colliders, which differs from that at hadron colliders as the pseudo-scalar is produced predominantly in association with other states. We consider a variety of future $e^+e^-$ colliders operating at different c.m. energies and, in the case of linear colliders, different polarisation options. We then focus on a circular electron-positron collider aiming to operate at the $Z$-pole and investigate a possible search channel targeting pseudo-scalar decays into a pair of tau leptons.

Pseudo-scalar production at lepton colliders
We consider the production of the pseudo-scalar in association with a virtual photon or with a (virtual or real) $Z$-boson, that we generically denote as $V$, and that 'decays' into any pair of fermions. This mainly proceeds via the Feynman diagrams shown in the first row of Fig. 3, where one distinguishes a tree-level contribution in which an $e^+e^-$ pair annihilates into a $Va$ system through the Wess-Zumino-Witten anomalous $\kappa_{\gamma Z}$ coupling (central diagram), a loop-induced contribution (left diagram) where SM top and bottom quarks are running in the loop (see Sec. 2.2), and a $t$-channel contribution (right diagram). In the case where the $V$-boson leads to an $e^+e^-$ final state, extra non-resonant diagrams additionally contribute (second row of Fig. 3). We identify two potentially appealing signatures that differ by the considered final state: the production of the pseudo-scalar in association either with a pair of opposite-sign first or second generation leptons $\ell = e, \mu$, or with a pair of light jets $j$ (i.e. not originating from the fragmentation of a $b$-quark),

$e^+e^- \to \ell^+\ell^- a$ , $e^+e^- \to jja$ . (3.1)

Across all models, the branching ratios of the pseudo-scalar $a$ into $\tau^+\tau^-$ and $b\bar{b}$ pairs are the highest, as the coupling of the pseudo-scalar to fermions is proportional to the corresponding fermion mass. We therefore choose to design an analysis dedicated to probing those decay modes, as they are not only the most abundant, but also feature final-state objects not too difficult to reconstruct, and offer several handles to extract a signal from the background, as will be shown below.
The future colliders currently under study include both linear (ILC [1] and CLIC [2]) and circular (FCC-ee [3] and CEPC [4]) possibilities. Electron-positron colliders may provide a promising avenue through which to search for a light pseudo-scalar. Many future lepton colliders indeed offer low c.m. energies associated with high luminosities, which may allow for the detection of weakly interacting light particles. While linear colliders offer the benefit of beam polarisation, where some processes such as $W$-boson fusion Higgs production (which depend on the chirality of the colliding leptons) may be amplified, circular colliders allow for the accumulation of higher luminosities, useful when searching for particles that couple weakly or are not copiously produced. They moreover offer the possibility of hadron collider upgrades.
We first proceed with the calculation of the production cross section associated with all processes of interest, for each of the aforementioned future lepton collider options and the varied c.m. energy choices that have been proposed in the literature. Using MG5_aMC, we hence present in Fig. 4 total cross sections for the two processes of Eq. (3.1) for the ILC and CLIC linear colliders. We rely, as a benchmark, on the model M1 and show leading-order (LO) predictions for the production of the pseudo-scalar $a$ in association with leptons (solid lines) and jets (dashed lines). We also considered a variety of beam polarisations relevant for the ILC collider; however, there was no appreciable change in behaviour for different polarisations. In Fig. 5 we focus instead on the CEPC and FCC-ee circular colliders, our predictions for pseudo-scalar production in association with leptons being multiplied by a factor of 15 for legibility. Our results include basic selections on the final-state leptons and jets, in the form of requirements on their transverse momentum $p_T$ and pseudo-rapidity $\eta$. Moreover, jets and leptons are required to be well separated in the transverse plane, by an angular distance $\Delta R$ of at least 0.4. From these figures we are able to gain an understanding of the potential pseudo-scalar abundance at future lepton colliders.
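The separation requirement quoted above can be sketched as follows, assuming the usual collider definition $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$ with the azimuthal difference wrapped into $(-\pi, \pi]$. The function names and the representation of each object as an `(eta, phi)` pair are hypothetical and do not correspond to the interface of any tool used in this work.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def well_separated(objects, min_delta_r=0.4):
    """Check that every pair of (eta, phi) objects satisfies Delta R >= min_delta_r."""
    return all(
        delta_r(a[0], a[1], b[0], b[1]) >= min_delta_r
        for i, a in enumerate(objects)
        for b in objects[i + 1:]
    )


if __name__ == "__main__":
    leptons_and_jets = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
    print(well_separated(leptons_and_jets))
```

Wrapping the azimuthal difference matters for objects on either side of $\phi = \pm\pi$, which are close in angle even though their $\phi$ values differ by almost $2\pi$.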
Though linear colliders enjoy a relatively constant production cross section of about 0.01-0.1 fb throughout the entire probed pseudo-scalar mass range ($M_a < 100$ GeV), circular colliders, operating at lower c.m. energies, are subject to a fall-off of the production cross section at higher masses as the available phase space decreases. However, for the parameter space region in which we are interested (featuring $M_a < 60$ GeV), the production rates lie in the same ballpark regardless of the collider under consideration. As linear colliders are subject to much lower expected integrated luminosities, we focus, in the following analysis outline, on a circular collider. The larger expected luminosities of circular machines may prove crucial in a search for weakly coupled particles, while operating at relatively low c.m. energies is useful in reducing otherwise significant backgrounds, such as those originating from $t\bar{t}$ and di-boson processes.

A case study at a circular collider operating at the Z-pole
We propose an analysis at a future high luminosity lepton collider operating at the $Z$-pole with an integrated luminosity of 150 ab$^{-1}$, which corresponds to the FCC-ee expectation. While the signal production cross section is subject to a steep decline for higher masses of the pseudo-scalar $a$, the mass range of interest ($M_a \in [10, 60]$ GeV) is still relatively well covered, as shown in the previous subsection (see Fig. 5).
As a choice for the detector parametrisation, we consider the IDEA detector concept of the FCC-ee project. This detector, as any detector project at any future circular electron-positron machine, is designed using recent technological advances to take advantage of the exceptionally large data samples due to be delivered by the forecast integrated luminosities. IDEA is planned to be constructed with a short drift wire chamber and calorimeter, and includes a low mass superconducting solenoid coil. The drift chamber will allow for high precision momentum measurements and good tracking capabilities, as well as excellent particle identification performance through cluster counting when combined with the dual readout calorimeter [3]. In particular, this aims at achieving a much improved impact parameter resolution over that of LEP, as well as a better momentum resolution and electromagnetic calorimeter resolution, and a finer electromagnetic calorimeter transverse granularity [40]. Precise measurements of charged objects properties at lower energies are therefore clearly achievable.
In building our analysis, we will moreover only consider the channel in which the pseudo-scalar $a$ is produced in association with a pair of opposite-charge leptons. This allows us to avoid the difficulty in dealing with the multi-jet background still reasonably present at lepton colliders. Moreover, we will focus on the case where the pseudo-scalar decays into a pair of hadronic tau leptons for two reasons. First, the $b$-tagging performance is known to be quite poor in a regime yielding soft objects (originating from the decay of a light $a$ boson into $b$-quarks). Next, hadronic tau decays account for approximately 2/3 of all tau decays, and future $e^+e^-$ collider detectors are expected to have excellent handles on the associated decay vertices. This hence allows for a very efficient tau reconstruction. In addition, events with electrons or muons in the final state are expected to be reconstructed with a very good resolution [41]. Basing our estimations on previous detectors, we hence expect a typical systematic uncertainty on lepton identification of around 1%, together with a systematic uncertainty on hadronic tau identification of around 2-5% for taus with a transverse momentum $p_T > 20$ GeV, and up to 15% otherwise [42].
In Fig. 6 we display the number of signal events expected from the production of the pseudo-scalar $a$ in association with a pair of oppositely charged leptons, with a subsequent pseudo-scalar decay into a pair of hadronic tau leptons. We consider the entire mass range of interest and include statistical error bars. This illustrates the motivation to expect a significant number of new physics events in electron-positron collisions at the $Z$-pole. It moreover attests that the sensitivity of the machine will depend on the exact details of the model, as there is a significant range in the expected number of new physics events across the considered models. We now design our analysis from Monte Carlo simulations of both the signal and background processes, using MG5_aMC in conjunction with Pythia 8 [43] to describe parton showering and hadronisation. The IDEA detector response has been simulated by relying on the Delphes 3 [44] software package, which makes use of the anti-$k_T$ algorithm [45] as implemented in FastJet 3 [46] for event reconstruction. Both these last codes are driven through their interface with the MadAnalysis 5 platform [47,48], which we also use to carry out our phenomenological analysis. We begin with a cut-and-count analysis, aiming at unravelling the signal from the overwhelming backgrounds. However, as a consequence of the low statistical significance, we then employ a novel machine learning algorithm based on boosted decision trees in an attempt to improve the significance, using the XGBoost toolkit [49].
In identifying backgrounds to the signal process, we consider both processes which lead to a true $\tau\tau$ final state, and those containing fakes. Background events of the first category feature prompt taus that are most likely to arise from the production of an $\ell^+\ell^-$ pair in association with a pair of opposite-sign tau leptons through one or two (virtual) $Z$ bosons and photons. Given the relatively low c.m. energy of 91.2 GeV, potential background events originating from two virtual weak bosons or top quark decays are not expected to contribute much. Background events of the second category arise from jets faking hadronic taus. They appear in processes such as vector boson production in association with a pair of jets, where the boson then decays to a pair of leptons, as well as in processes where the intermediate boson is virtual or the mediation occurs through virtual photons,

$e^+e^- \to jj\ell^+\ell^-$ . (3.5)

In the following, we refer to this background as $Z$+jets. The corresponding cross section is $6.65 \times 10^{-4}$ pb, which should then be multiplied by the appropriate fake rate factor. The IDEA detector parametrisation shipped with Delphes includes a 0.1% misidentification rate for jets faking hadronic taus and a tau identification efficiency of 60%. As both jets must be misidentified as hadronic taus, the misidentification rate enters twice, and we therefore expect a fake contribution of the order of $10^{-10}$ pb, which can thus safely be neglected.
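The size of the fake contribution can be checked with a few lines of arithmetic. The sketch below takes the cross section, misidentification rate and integrated luminosity quoted in the text, and assumes that each of the two jets is independently misidentified as a hadronic tau, so that the rate enters squared; the variable and function names are illustrative.

```python
# Inputs quoted in the text.
Z_JETS_XSEC_PB = 6.65e-4           # e+ e- -> j j l+ l- cross section, in pb
JET_TO_TAU_FAKE_RATE = 1.0e-3      # 0.1% misidentification rate per jet
INTEGRATED_LUMINOSITY_AB = 150.0   # integrated luminosity, in ab^-1

PB_INV_PER_AB_INV = 1.0e6          # unit conversion: 1 ab^-1 = 10^6 pb^-1


def fake_cross_section_pb(xsec_pb, fake_rate, n_fakes=2):
    """Cross section after requiring n_fakes independent jet-to-tau fakes (assumption)."""
    return xsec_pb * fake_rate ** n_fakes


def expected_events(xsec_pb, luminosity_ab):
    """Expected event count for a cross section in pb and a luminosity in ab^-1."""
    return xsec_pb * luminosity_ab * PB_INV_PER_AB_INV


if __name__ == "__main__":
    fake_xsec = fake_cross_section_pb(Z_JETS_XSEC_PB, JET_TO_TAU_FAKE_RATE)
    print(f"fake cross section: {fake_xsec:.3e} pb")
    print(f"expected fake events: {expected_events(fake_xsec, INTEGRATED_LUMINOSITY_AB):.3f}")
```

Under these assumptions the fake cross section is $6.65 \times 10^{-10}$ pb, corresponding to about 0.1 events in the full data sample before any further selection.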

A cut-and-count analysis
Our selection closely follows the pattern of the final state under consideration. We require events to contain at least two leptons ($N_\ell \geq 2$), each with a minimum $p_T$ of 10 GeV to ensure good reconstruction, and at least two hadronic taus ($N_\tau \geq 2$), each with $p_T > 5$ GeV. We moreover enforce a minimum invariant mass $M_{\ell\ell} > 12$ GeV on the lepton pair produced in association with the di-tau system, which is necessary to eliminate non-prompt leptons and avoid low mass hadronic resonances. Similarly, the invariant mass of the tau pair $M_{\tau\tau}$ is constrained to be at least 10 GeV. Given that the hadronic tau decay mode of the pseudo-scalar leads to neutrinos which carry away momentum, the di-tau invariant mass spectrum is expected to be soft and peak below the pseudo-scalar mass $M_a$. It is therefore useful to maintain low momentum thresholds for the tau pair where possible. To summarise, our preselection imposes that

$N_\ell \geq 2$ with $p_T(\ell) > 10$ GeV; $N_\tau \geq 2$ with $p_T(\tau) > 5$ GeV; $M_{\ell\ell} > 12$ GeV; $M_{\tau\tau} > 10$ GeV. (3.6)

After this preselection, we expect about 50,000 background events for signal event counts ranging from below 1 ($M_a \lesssim 10$ GeV) or a few ($M_a \gtrsim 50$ GeV) to 10-40 (10 GeV $< M_a <$ 50 GeV). In order to reject the background while keeping a large signal efficiency, we investigate first the properties of the two final-state leptons. As in the case of the signal they are produced together with a low mass resonance, we expect the presence of potentially discriminating features in various kinematic distributions, the exact details behind those features being related to the resonance mass. We present, in Fig. 7, the angular separation between the two leptons $\Delta R(\ell^+, \ell^-)$ (left) as well as the di-lepton invariant mass distribution $M_{\ell\ell}$ (right). We consider both the $Z$+jets background (red dashed) as well as five signal hypotheses from different models and pseudo-scalar masses below 60 GeV (i.e. our pseudo-scalar mass range of interest). More precisely, we have chosen a selection of five models (M2, M4, M7, M10 and M12) exhibiting a variety of hypercolour group and coset structures (see Tab. I). This allows us to explore broadly the possibilities of the considered class of composite scenarios. The results depicted in the figures demonstrate that the $\Delta R(\ell^+, \ell^-)$ and $M_{\ell\ell}$ spectra tend to peak at higher values for the background than for the illustrative signal hypotheses. This suggests two interesting cuts to isolate the pseudo-scalar signal, namely upper bounds on $\Delta R(\ell^+, \ell^-)$ and $M_{\ell\ell}$ (Eq. (3.7)). In addition, we also make use of the properties of the di-tau system to extract our composite signal from the background. In the signal case, the pair of hadronic tau leptons is issued from the decay of a resonance, so that its properties are expected to be largely different from the background case.
In particular, the invariant mass of the di-tau system is expected to peak at a value just below the mass of the pseudo-scalar, as a result of the presence of the neutrinos originating from the tau decays and carrying away some momentum [50]. The resulting distribution is shown in the left panel of Fig. 8. As expected, the invariant mass of the di-tau system is shifted relative to the mass of the pseudo-scalar, and peaks just below the true value of the latter. The background distribution is, however, not so well differentiated from the signal one, as the di-tau system features predominantly a low invariant mass, as driven by the selection cuts of Eq. (3.7). We nevertheless define five signal regions, each of them being dedicated to one specific pseudo-scalar mass hypothesis, and respectively impose

$M_{\tau\tau} < 10, 20, 30, 40, 50$ GeV. (3.8)

This allows us to eliminate some background in the heavier pseudo-scalar cases. The minimum requirement on $M_{\tau\tau}$ of Eq. (3.6) that protects us from the contamination of QCD resonances makes us, however, unable to get further handles on the signal for pseudo-scalar masses of 10 GeV or smaller. Similarly, we investigate the potential of the angular separation between the two taus (right panel of Fig. 8). Although shape differences are visible, they do not allow for a clear separation of the signal and the background. Any related cut will therefore be omitted from our analysis.
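The preselection of Eq. (3.6) and the mass-dependent upper cut of Eq. (3.8) can be sketched as a pair of event filters. This is only an illustration: the `Event` record and its field names are hypothetical, the upper cut is read as $M_{\tau\tau}$ below the mass hypothesis of each signal region, and the lepton cuts of Eq. (3.7) are left out because their values are not reproduced in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical flat event record; field names are illustrative only."""
    lepton_pts: List[float]  # pT of reconstructed electrons/muons [GeV]
    tau_pts: List[float]     # pT of reconstructed hadronic taus [GeV]
    m_ll: float              # di-lepton invariant mass [GeV]
    m_tautau: float          # visible di-tau invariant mass [GeV]


def passes_preselection(ev: Event) -> bool:
    """Preselection of Eq. (3.6)."""
    n_lep = sum(pt > 10.0 for pt in ev.lepton_pts)
    n_tau = sum(pt > 5.0 for pt in ev.tau_pts)
    return n_lep >= 2 and n_tau >= 2 and ev.m_ll > 12.0 and ev.m_tautau > 10.0


def in_signal_region(ev: Event, m_a_hypothesis: float) -> bool:
    """Preselection plus the upper cut of Eq. (3.8) for one mass hypothesis."""
    return passes_preselection(ev) and ev.m_tautau < m_a_hypothesis


# Made-up example event, used only to exercise the two filters.
example = Event(lepton_pts=[25.0, 18.0], tau_pts=[12.0, 8.0],
                m_ll=30.0, m_tautau=17.0)
print(passes_preselection(example), in_signal_region(example, 20.0))
```

With these thresholds the 10 GeV signal region is empty by construction, since $M_{\tau\tau} > 10$ GeV and $M_{\tau\tau} < 10$ GeV cannot both hold, which is the limitation noted above for the lightest mass hypothesis.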
In Tab. II we present the expected sensitivity of our cut-and-count analysis in terms of standard deviations defined by an $S/\sqrt{S+B}$ figure of merit, $S$ and $B$ respectively representing the number of selected signal and background events. We find that, given the rarity of signal events amongst an abundance of background, there is little hope of observing even a 1σ deviation from the background-only hypothesis anywhere in the mass range considered. It is, however, possible that this could be ameliorated by designing more appropriate and dedicated variables like the missing mass. As reflected in the $M_a$-dependence of the cross section shown in Fig. 5, the significance is maximised at $M_a = 20$ GeV. For larger pseudo-scalar masses, the steep fall-off of the cross section indeed reduces $S$ to too small a level.
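To give a feeling for the scale of the problem, the figure of merit can be evaluated on the rough preselection-level yields quoted earlier (about 50,000 background events and at most about 40 signal events). These inputs are used purely for illustration and are not the entries of Tab. II; the function name is an assumption.

```python
import math


def simple_significance(s, b):
    """S / sqrt(S + B) figure of merit for s signal and b background events."""
    if s + b <= 0:
        raise ValueError("S + B must be positive")
    return s / math.sqrt(s + b)


# Illustrative preselection-level yields taken from the text.
z_presel = simple_significance(40.0, 50000.0)
print(f"Z ~ {z_presel:.2f} sigma")
```

Even for the most favourable signal yield, the preselection alone leaves the significance well below 1σ, which is why further background rejection is needed.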

A machine-learning-based analysis
In order to improve the figure of merit of our analysis, we next consider a machine learning algorithm. We rely on the XGBoost toolkit [49], which implements gradient boosted tree methods [51] while offering fast training speed coupled with good accuracy [52].
In general, a machine learning algorithm employing a tree ensemble uses a series of additive optimisations computed from a given set of variables to predict an output, i.e. in our case the classification of an event as a signal or a background event. At each stage of the training process, gradient boosting modifies the existing constraints in order to correct the classification errors made by the current best set of optimisations, continuing until no further improvement can be made in considering the residuals and errors of the prior stages. The XGBoost toolkit includes a novel algorithm geared towards the handling of sparse data, which is useful in our case as both signal and background events may not fully populate the event space.
The performance of the algorithm for a given set of optimisations can be evaluated by a quantity denoted as the area under the curve (auc). This corresponds to the integral of the receiver operating characteristic (roc) depicting the dependence of the signal purity of the events selected by the algorithm, $S/(S+B)$, on the signal selection efficiency $S/S_0$, where $S_0$ stands for the total number of signal events provided to the algorithm. The auc metric hence represents the degree of separability between background and signal. In addition, we use the approximate median discovery significance (ams) to estimate the sensitivity of the analysis to our signal. It is defined by [53]

$\mathrm{ams} = \sqrt{2\left[(S+B)\ln\left(1+\frac{S}{B}\right) - S\right]}$,

where $S$ and $B$ can also be seen as the true and false positives respectively. While the ams provides the discovery potential of the analysis, its usage as an evaluation metric and learning objective is unstable and may lead to overfitting. The performance of the algorithm was therefore optimised using the auc quantity, following which the corresponding ams was calculated.
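Both metrics are straightforward to compute from classifier outputs. The sketch below assumes the standard form of the ams without a regularisation term, and it evaluates the auc in the conventional way, as the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event; this is the usual true-positive versus false-positive convention and may differ in detail from the purity-based curve described above. All function names are illustrative.

```python
import math


def ams(s, b):
    """Approximate median significance, sqrt(2[(S+B) ln(1+S/B) - S])."""
    if b <= 0:
        raise ValueError("B must be positive")
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def roc_auc(signal_scores, background_scores):
    """Pairwise-ranking estimate of the area under the ROC curve.

    Counts the fraction of (signal, background) pairs in which the signal
    event has the higher classifier score; ties count as one half.
    """
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


# Made-up classifier scores, used only to exercise the functions.
auc_example = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
ams_example = ams(40.0, 50000.0)
```

For $S \ll B$ the ams reduces to approximately $S/\sqrt{B}$, so in the background-dominated regime considered here it stays close to the $S/\sqrt{S+B}$ figure of merit of the cut-and-count analysis.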
After applying the preselection of Eq. (3.6), we derive a set of uncorrelated kinematic variables to be used as input to our machine learning algorithm. They consist of a combination of primary variables (the tau and lepton transverse momenta, pseudo-rapidities and azimuthal angles) and derived variables (the di-tau invariant mass $M_{\tau\tau}$ and angular separation $\Delta R(\tau, \tau)$, as well as the invariant mass of the $\tau\tau$ system) that have been chosen such that their importance to the machine learning algorithm was maximised while removing any variables that were too strongly correlated with the others. The variables and their correlations are depicted in Fig. 9. The objective of the XGBoost learning task was set to a logistic regression for binary classification. At each step (also known as splitting), the tree booster constructs new classifiers by combining and weighting the classifiers obtained at the previous step, the initial classifiers being the input variables. The hyperparameters that were found to affect the performance of the method were the learning rate, the maximum tree depth and the minimum child weight. The learning rate controls data over-fitting by varying the learning step size, the maximum depth of a tree indicates how many times a tree can split (hence controlling the algorithm complexity), and the minimum child weight controls the minimum weight that can be assigned when designing a new classifier.
In our analysis, 80% of the available Monte Carlo data was used for training purposes, and the remaining 20% for testing. For each model and $M_a$ value, we tuned the hyperparameters using a k-fold cross-validation, so that the choice maximising the auc was adopted. In particular, the maximum depth parameter was kept low and early stopping was employed in order to control over-fitting. It was found that a maximum depth of 3, a minimum child weight of 1 and a learning rate of 0.3 gave the most desirable result across the entire range of considered models. The auc metric and the corresponding significances obtained for a representative set of models are indicated in Tab. III.
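The training setup described above can be summarised in a short sketch. The dictionary keys are XGBoost's documented parameter names, filled with the values quoted in the text; the 80/20 split and the k-fold generator are written in plain Python so that the logic is visible. The number of folds (`k=5`), the random seed and the helper names are assumptions, since the text does not specify them.

```python
import random

# Hyperparameters quoted in the text, keyed by XGBoost parameter names.
XGB_PARAMS = {
    "objective": "binary:logistic",  # logistic regression for binary classification
    "eval_metric": "auc",            # metric used for tuning
    "max_depth": 3,
    "min_child_weight": 1,
    "eta": 0.3,                      # learning rate
}


def train_test_split(n_events, test_fraction=0.2, seed=0):
    """Shuffle event indices and split them into training and testing sets."""
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    n_test = int(round(n_events * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


train_idx, test_idx = train_test_split(1000)
fold_sizes = [(len(tr), len(va)) for tr, va in k_fold(train_idx)]
```

In practice a dictionary of this kind would be passed to `xgboost.train` together with an `early_stopping_rounds` setting and a validation set, with the auc on each validation fold used to compare hyperparameter choices.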
The results indicated in Tab. III display a general improvement over the traditional cut-and-count method, but also an important variation across models and pseudo-scalar masses. In particular, the significance peaks at $M_a = 20$ GeV for all models, as this corresponds to the maximum of the signal cross section (see Fig. 6). However, there are large differences in the trends across the models. For example, the performance for the model M10 quickly falls to one of the lowest for $M_a = 50$ GeV. These behaviours reflect not only the varying production cross sections across the models but also the variations in the kinematics resulting from differing Lagrangian parameters. On the other hand, we find a low significance for $M_a = 10$ GeV, where despite a relatively large cross section, the preselection cuts (and in particular the $M_{\tau\tau}$ requirement) reject a large portion of the signal. The best performance of our analysis is found for scenarios featuring $M_a = 20, 30$ GeV. For all models, the performance then drops off quickly for $M_a = 40, 50$ GeV, and it falls more sharply than it does in the cut-and-count case. Such a drop in significance at these pseudo-scalar masses is expected, as the cross section decreases with $M_a$. Moreover, some signal kinematic distributions exhibit important variations with the pseudo-scalar mass. An example can be taken from the $\Delta R(\tau^-, \tau^+)$ spectrum (see the right panel of Fig. 8), where scenarios with higher $a$ masses ($M_a = 40, 50$ GeV) lead to very similar signal and background distribution shapes. Finally, we have trained the gradient boosting algorithm using one parameter choice across all considered masses, and the choice of kinematic variables on which to train the models was guided by a focus on the lower mass setups. This path has clearly optimised the 20/30 GeV scenarios, as they were expected to yield the highest significance by virtue of the larger associated cross sections.
The potential price to pay could be a less efficient training for higher masses of a.
In comparing the significance trends of the gradient boosting results with those of the cut-and-count method, we observe the same ranking of performance among the different models, with the exception of the model M4. This model corresponds to the highest cross sections, and may therefore have been expected to be better performing. However, our framework leads to overfitting for the $M_a = 20, 30$ GeV cases, which had to be carefully controlled by using an early stopping of the algorithm. This resulted in a lower significance.
In Tab. IV, we translate our results in terms of the luminosity that is needed in order to achieve a significance of 2σ (to exclude the existence of the new resonance) or 3σ (to claim evidence for the resonance) at a future electron-positron collider aiming at operating at the $Z$-pole. The table also shows the gain obtained by using the gradient boosting algorithm over a more traditional cut-and-count method. Very importantly, our findings show that for certain models, an achievable integrated luminosity would yield a 2σ or even 3σ significance. In all cases, larger pseudo-scalar masses remain likely out of reach at a c.m. energy of 91.2 GeV, as does the $M_a = 10$ GeV case.
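The translation from a significance to a required luminosity can be sketched under a simple assumption: if the signal and background yields both scale linearly with the integrated luminosity and systematic uncertainties are neglected, then both $S/\sqrt{S+B}$ and the ams scale as the square root of the luminosity. The function name and the input numbers below are hypothetical and are not taken from Tab. IV.

```python
def required_luminosity(z_ref, lumi_ref, z_target):
    """Luminosity needed to reach z_target, given z_ref obtained at lumi_ref.

    Assumes purely statistical scaling, Z proportional to sqrt(luminosity).
    """
    if z_ref <= 0:
        raise ValueError("reference significance must be positive")
    return lumi_ref * (z_target / z_ref) ** 2


# Hypothetical reference point: 1 sigma at 10 ab^-1.
lumi_2sigma = required_luminosity(z_ref=1.0, lumi_ref=10.0, z_target=2.0)
lumi_3sigma = required_luminosity(z_ref=1.0, lumi_ref=10.0, z_target=3.0)
print(lumi_2sigma, lumi_3sigma)
```

Under this scaling, any gain in significance from the classifier translates quadratically into a reduction of the luminosity required for a given target.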

CONCLUSION
In this work we have designed an analysis targeting a light pseudo-scalar, ubiquitous to composite Higgs models, at a future electron-positron collider aiming at collecting a large luminosity at the Z-pole. In our predictions we have considered the pseudo-scalar couplings to gauge bosons to full leading order, i.e. by including relevant effects stemming from loops of b quarks. The latter have a significant impact for low mass pseudo-scalars, unlike what is traditionally assumed, and should be considered both at present and future hadron and lepton colliders.
We have demonstrated the possibility of actually getting hints for a low mass pseudo-scalar at a future lepton collider operating at a centre-of-mass energy of 91.2 GeV, focusing on the production mode in which the pseudo-scalar is produced in association with a pair of electrons or muons and decays into a pair of hadronic taus. The corresponding Standard Model background has been found difficult to reduce via a standard cut-and-count analysis, which resulted in a poor sensitivity and a rare signal entirely hidden within the large background. In an attempt to improve these findings we have made use of a machine learning algorithm based on boosted decision trees. It yielded an improvement in sensitivity in almost all cases. In particular, we have observed a marked improvement for scenarios in which the pseudo-scalar mass $M_a = 20, 30$ GeV, where the related significance approaches 3σ at an integrated luminosity of 150 ab$^{-1}$. Lighter configurations ($M_a \lesssim 10$ GeV) are not promising, given that the signal is expected to be dominated by the background and mostly removed by any reasonable event preselection. The significance also drops off for higher pseudo-scalar masses, by virtue of a decreasing signal cross section and key kinematic properties becoming very similar to the background ones. It is, however, possible that this could be ameliorated by designing more appropriate and dedicated variables. Our findings demonstrate that a direct search for a light composite pseudo-scalar at high integrated luminosity lepton colliders should be seriously considered. While our generic analysis covers the parameter space region in which the mass of the pseudo-scalar is less than 60 GeV, it is certainly less sensitive to $M_a$ values of 40 GeV or more. Future work should determine whether it could be optimised for these heavier configurations, perhaps by considering future lepton colliders operating at higher centre-of-mass energies. Among the avenues to be explored, one could benefit from a gain in sensitivity by relying on the spin-0 nature of the pseudo-scalar and assessing the potential of various angular distributions between pairs of final-state objects. For the same reason, it may also be useful to make use of tau polarisation in order to separate signal from the background [54]. Finally, other options may rely on the presence of a second heavier pseudo-scalar $\eta'$, that is common to many composite models.

TABLE IV: Required luminosities, in ab$^{-1}$, to obtain a 2σ and 3σ significance to the pseudo-scalar signal at a future electron-positron collider operating at the $Z$-pole. We present results for our cut-and-count (third and fourth columns) and gradient boosting (fifth and sixth columns) methods, for an illustrative selection of models.