Searching for Flavor-Violating ALPs in Higgs Decays

Pseudo-scalar particles, often referred to as axion-like-particles (ALPs), arise in a variety of theoretical contexts. The phenomenology of such states is typically studied assuming flavor-conserving interactions, yet they can in principle have flavor-violating (FV) couplings to fermions. We consider this general possibility, focusing on models where the ALP has non-negligible coupling to the Standard Model Higgs boson $h$. For a lepton FV ALP $a$ of mass $m_a \gtrsim 2$ GeV, $a\to \tau \ell$, where $\ell\neq \tau$ is a charged lepton, could have $\mathcal{O}(1)$ branching fraction, leading to potentially detectable $h \to a a \to \tau \ell \tau \ell$ at the LHC and its future program. We examine this possibility, in light of existing bounds on FV processes, in a general effective theory. We obtain constraints on the effective couplings from both prompt and long-lifetime searches at the LHC; some projections for envisioned measurements are also provided. The implications of the recently announced first results of the muon $g-2$ measurement at Fermilab for the ALP interactions considered in our work are also briefly discussed.


I. INTRODUCTION
The discovery of the Higgs boson has established the Standard Model (SM) as a potentially complete description of the visible sector of the Universe. However, robust experimental data and numerous observations have led to the firm conclusion that ingredients beyond the SM are required to explain neutrino flavor oscillations -and hence tiny neutrino masses -as well as the gravitationally observed, but otherwise unknown substancei.e. dark matter (DM) -which is the dominant form of cosmic matter. Currently, there are numerous proposals to explain these and some other open questions of particle physics and cosmology. In particular, the possibility that the new physics resides in a separate sector and may have a rich spectrum of states and non-trivial dynamics remains a viable option.
It is then interesting to ask what the potentially promising signals for those new phenomena are.
A possibility that arises in various physical contexts is the appearance of pseudo-scalar particles, associated with broken global symmetries. In the SM, spontaneous chiral symmetry breaking in the QCD sector gives rise to pions, which are the corresponding pseudo-Nambu-Goldstone bosons (PNGBs). In general, many models of new physics contain such pseudo-scalars, often called axion-like particles (ALPs). A well-known example is the QCD axion of the Peccei-Quinn mechanism [1,2], proposed to address why strong interactions seem to entail negligible, if any, CP violation. Given their relative ubiquity in new physics models, ALPs are well-motivated targets for phenomenology. We will denote a generic ALP by a, in what follows.
In this work, we entertain the possibility that the ALP couples to ordinary leptons with significant flavor-violating couplings. Such interactions can naturally appear if the newphysics sector containing ALPs is linked to lepton flavor violation, including generation of neutrino masses and mixing. The possibility of flavor-violating ALP-lepton couplings has been studied before -see for example Refs. [3][4][5][6][7] -but we will focus in particular on collider signals which occur when the ALP also couples to the SM Higgs boson h. This opens the possibility of decays of the form h → aa → (ℓℓ ′ )(ℓℓ ′ ) where ℓ ̸ = ℓ ′ . This is a distinctive final state with very low SM background.
Flavor-violating decays of the Higgs boson of this form have been studied previously in [8], but in that work the intermediate particles were taken to be scalar, rather than pseudoscalar. With pseudoscalars, the parameter space of constraints will be somewhat different. In particular, pseudoscalar decay widths are proportional to the mass squared of the final-state particles, so decays of the form a → τ ℓ will be generally dominant for m a ≳ 2 GeV. We will study the prospects for detection of this ALP signal at the LHC, as well as envisioned experiments.
In particular, we will focus on the parameter range 2 GeV ≲ m a ≲ m h /2 where the signal of interest will be strongest. Other constraints, including constraints from astrophysical observations like SN1987a and beam-dump experiments, are not generally applicable in this range [9]. Constraints from direct searches for lepton flavor violating (LFV) processes such as τ → 3 µ are also weaker in this parameter range [4], allowing for stronger lepton-ALP couplings, perhaps even sufficient to explain the anomalies in the electron and muon magnetic moments as studied in [5]. As such, we will examine flavor conserving as well as lepton flavor violating couplings necessary to explain the muon g − 2 anomaly in our model, taking into account existing limits on LFV interactions.
For our phenomenological study, we will adopt an effective field theory (EFT) approach, valid at or below the weak scale. Such an EFT is assumed to descend from an ultraviolet (UV) complete theory at higher scales. By using the EFT framework, we obtain results in terms of the size of the coefficients of various operators, rendering our analysis largely modelindependent. These results can then be recast for specific models in terms of their relevant mass scales and couplings. We will consider signatures from both prompt and displaced decays of the ALP at the LHC.
Particles with long lifetimes necessarily have a small decay width, which corresponds to very small couplings of the particle to its decay products. This can arise naturally for ALPs, since the interaction terms in the Lagrangian are typically suppressed by some higher energy scale. As a result, displaced decays may be the best way to search for ALPs from beyond the SM scenarios. If the particles are long-lived enough, they may escape the LHC long before decaying, so proposals for future detectors such as such as MATHUSLA [10][11][12][13], CODEX-b [14], and ANUBIS [15] may be the best way to look for them. The combination of long-lived ALP states with flavor-violating lepton couplings is motivated by a specific UV completion [16] in which a dark sector gives rise to both neutrino masses and dark matter.
The paper is organized as follows. In Section II, we describe the effective theory to be used in our collider studies. Section III considers bounds on the parameter space of the effective theory due to LFV constraints from lepton conversion, and Section IV considers exclusions and projected exclusions on the Higgs branching fraction into ALPs for prompt and displaced decays. In Section V, we examine the region of our parameter space necessary to explain the muon g − 2 anomaly, finding that in such a scenario LFV couplings of the ALP must be greatly suppressed compared to lepton-flavor conserving (LFC) couplings.

II. EFFECTIVE THEORY
There has been substantial work in the literature on constructing a low-energy effective theory for ALPs coupled to the SM [4,6,7,9,[17][18][19][20][21][22][23][24]. Here, we focus on a subset of all possible couplings which are most relevant for the signal of interest. We will adopt the following simplified effective Lagrangian in the electroweak-broken phase: where We assume that the other neglected couplings, e.g. direct coupling of the ALP to light quarks, are negligible. We include only parity-conserving couplings, neglecting the possibility of the lepton vector current coupling to the ALP. In our main phenomenological study, we will assume that all of the leptonic couplings C ℓℓ ′ , both flavor-diagonal and off-diagonal, are roughly equal in magnitude. The possibility of a direct coupling of a to neutrinos is also omitted here, see Ref. [22] for a discussion of including such a term; in the present case such a coupling would not have any direct impact on the phenomenology, as the decay a → νν is proportional to m 2 ν or induced by a high-dimension operator and therefore extremely suppressed [20].
We will consider the scenario where the tree-level coupling C γγ = 0, but we note that such a coupling can still be generated at loop order. The most general decay width for a → γγ is given by [20] where the coupling C eff γγ is (Note that our definition of C f f differs by a factor of 2 from c f f in the reference.) Here, It follows that Assuming that only the lepton couplings are non-negligible gives Q 2 f = 1 and N f c = 1. Hence, there will still be some branching fraction of a into photons even with C γγ = 0. In the mass range of interest, this branching fraction is small compared to those of leptonic decay modes, as we will now demonstrate.
The decay width of the ALP into leptons ℓ,ℓ ′ is given by assuming that m ℓ ≪ m ℓ ′ on the last line. Note that the decay width is roughly proportional to the square of the heavier daughter-lepton mass, so that even if there is no hierarchy among the couplings |C ℓℓ ′ |, decays involving τ leptons will generally be dominant. This is one of the key differences between ALP and scalar decays into pairs of fermions. Whereas scalars can couple to fermions with arbitrary couplings, the origin of ALPs as Goldstone bosons mandates that they interact through derivatives (by shift-symmetry) as in Eq. (2), which leads to couplings proportional to masses using the equations of motion.
If the photon coupling is only generated at loop order, then for m a ≳ m τ , we have where we have used C eff γγ = O 1 8π 2 C τ ℓ ≈ 0.01C τ ℓ . It follows that the photon branching fraction only becomes important when m a ≳ 1200 GeV, far outside the range of masses considered. Although this argument is only technically valid for m a ≫ 2 GeV, a numerical check shows that B(a → γγ) ∼ 10 −7 even for m a = 2 GeV, because the subdominant leptonic modes such as a → µµ still have a much larger width than the width into photons. Hence, to very good approximation the ALP in our model decays only through leptonic modes, and mostly to modes with at least one τ , so that is what we will focus on for the remainder of the study.
As previously noted, in the following we restrict the model parameter space to This ensures that the key process of interest h → aa → (ℓℓ ′ )(ℓℓ ′ ) can proceed fully on-shell to all three lepton generations. We will also primarily consider the case that the leptonic couplings are universal (e.g., C ℓℓ ′ ≈ const.), and we hereby refer to this single constant as C LL . Additional phenomenology can be done by considering various hierarchies of the different couplings, and we consider such a scenario in more detail in Section V. Due to the natural hierarchy which arises from the lepton mass dependence of the ALP-lepton decays, one would need to introduce an opposing hierarchy in the couplings C ℓℓ ′ in order to significantly alter our phenomenological results.

III. CONSTRAINTS ON FLAVOR-VIOLATING COUPLINGS FROM LEPTON CONVERSION
Before we proceed to collider phenomenology of Higgs decays, we first consider existing experimental constraints on the flavor-violating lepton couplings in our model. Such constraints have been studied previously for ALPs in Ref. [4]. In the region of interest m a ≳ m τ that we have identified, the single strongest constraint on the couplings C ℓℓ ′ currently comes from searches for lepton conversion of the form ℓ → ℓ ′ γ. We will thus focus on this particular signal as a constraint. Future improvements in searches for decays of the form µ → 3e, µ → eγγ and τ → 3(e, µ) may lead to relevant constraints even for m a ≳ m τ ; see Ref. [4] for details.  [25] and (τ → ℓγ) [26]. The most stringent constraint is given by the process µ → eγ, so this is the LFV constraint we will adopt for the rest of our analysis.
From Ref. [4], the decay rate for ℓ → ℓ ′ γ is given by where the form factor F ℓℓ ′ 2 (0) is defined in Appendix A. We use C eff γγ for the photon coupling in Eq. (A1). To be more precise, one should perform the explicit two-loop calculation to consider the effect of the loop-induced photon coupling, since the ALP and a photon in the loop can be off-shell, but this is beyond the scope of our analysis.
Under the previously stated assumption that Hence, constraints on the decay rate can be cast to constraints on |C LL |/Λ via Constraints on Γ(µ → eγ) are obtained from the MEG [25] experiment, whereas Γ(τ → ℓγ) constraints are obtained from BaBar [26]. In particular, at 90% CL, The resulting exclusion plots for the mass range of interest (2 GeV ≲ m a ≲ m h /2) are shown in Fig. 1. The most stringent constraint comes from the bound on Γ(µ → eγ).

IV. HIGGS DECAY SIGNAL
Due to the coupling of the ALP to leptons, one might expect that limits could be obtained from electron-positron colliders such as LEP and CESR, since such a particle could be produced via e + e − → a. However, ALP production from e + e − is severely suppressed by m 2 e /Λ 2 , due to the derivative coupling of the ALP. Since the ALP considered also does not couple to quarks, we do not expect there to be direct production from hadron colliders such as the LHC. Hence, we expect that Higgs decays are the best way to look for such a particle.
For most of the mass region considered, over 99% of ALPs decay via a → τ ℓ, so the best constraints will come from the decay mode h → aa → (τ ℓ)(τ ℓ ′ ) .
The ALP coupling C ℓℓ ′ can also generate decays of the form Z → (ℓℓ ′ )a → (ℓℓ ′ )(ℓℓ ′ ), but as was shown in Ref. [8], the limits from Z decays are much less significant than those arising from Higgs decays (this was shown for a scalar boson in the reference, but we determined the differences for a pseudoscalar are minor, with corrections proportional to m ℓ m Z .) For an ALP that only couples to the leptons through a derivative, the Z decay is also even further suppressed by a factor of the lepton mass. Therefore, we will safely ignore that process for the remainder of this paper.
The Higgs decay rate into ALPs is given by: Defining the quantities and this decay rate can be written as It will be interesting to place constraints on the parameters C ah and C ′ ah when given constraints on the branching fraction of a certain process B(h → 2a → X). This branching fraction can be written as which is a monotonic function in |C ah | 2 . Hence, if there is a constraint B ≤ B max , this translates to a constraint From here, an allowed region can be found for C ah in terms of C ′ ah or vice versa. In particular, Hence, it is most natural to place a direct constraint on C ah , and then translate the result to specific values of C ah or C ′ ah as desired.

A. Constraints from existing searches
One feature of our model which makes it difficult to constrain is the dependence of the ALP coupling on the masses of the final-state leptons. Since m τ ≫ m e , m µ , the ALP will decay with at least one τ in the final state O( 8m 2 τ 8m 2 τ +6m 2 µ ) ≈ 99.7% of the time. Hence, any study which attempts to reconstruct pairs of electrons and muons is not very constraining for our model, since there is likely to be missing energy associated with such pairs.
Here, we present constraints based on a search in Ref. [27] for proton-proton collisions which result in 4 lepton final states. In particular, we are interested in the "OSSF0" signal, which corresponds to the events in which the number of opposite-sign, same-flavor (OSSF) lepton pairs is zero. The dominant modes that would contribute to this signal in our model are still those which contain at least one τ , which then decays into an electron or muon.
To this end, we simulate Higgs production via gluon-gluon fusion, with the Higgs decaying Here, τ ℓ indicates a τ lepton which decays as τ → ℓνν. We denote these processes collectively as h → aa → OSSF0. The model is created using FeynRules [28], and the parton-level events are created in MadGraph5_aMC@NLO [29]. We then use Pythia8 [30] for showering, and simulate the CMS detector with Delphes [31]. We produce 50k events of the form gg → h, h → aa → OSSF0 for the masses m a = 2 GeV to m a = 3.6 GeV with a step of 0.2 GeV, and m a = 5 GeV to m a = 60 GeV with a step of 5 GeV. We additionally consider m a = 62.5 GeV for a total of 22 different masses. The number of events recovered from the initial 50k based on the analysis in Ref. [27] and the corresponding (efficiency) · (acceptance) is shown in Fig. 2. In the analysis of Ref. [27], the number of events with OSSF0 pairs predicted and observed at a luminosity of L = 137 fb −1 is 7. Following the analysis from Ref. [32], the 95% confidence interval on the mean number of signal events when the number Ref. [27], and the dashed line is the projection for L = 3000 fb −1 . These limits assume the decay of the a is prompt, and will be weakened for long-lived ALPs, as shown in Fig. 4 below.
of background events and number of observed events are both 7 is (0, 6.81). A 95% CL upper-limit on the branching fraction is then derived via ϵσ ggh LB(h → aa → OSSF0) ≤ N max = 6.81 (26) where ϵ = (efficiency) · (acceptance). This constraint is then recast to a constraint on C ah /Λ 2 from Eq. (24) by using the theoretical branching fraction for B(h → aa → OSSF0) and interpolating ϵ from Fig. 2 to all values of mass. The results of this constraint for L = 137 fb −1 and a projected value of L = 3000 fb −1 are shown in Fig. 3. For the projection, we assume that the number of observed events is the same as the number of expected background events, which would be (3000/137) × 7 ≈ 153. Again following the analysis from Ref. [32], this corresponds to a 95% confidence interval on the mean number of signal events of (0, 25.9).

B. Limits from displaced decays
Since the branching fraction of the ALP into leptonic final states is nearly 100%, it is difficult to place a constraint on the parameters |C ℓℓ ′ | from prompt decays alone. For a certain regime of lepton couplings, the decay of the ALP will be displaced, and will thereby be subject to experiments which search for long-lived particles. Such a scenario is more natural in the ALP case than the scalar case, since there is an additional suppression of O(m 2 τ /Λ 2 ) of the ALP decay rate, which translates to a ∼ Λ 2 /m 2 τ enhancement of the ALP lifetime. In our model, the lifetime τ a of the ALP is given by where we have once again made the assumption that the couplings are roughly universal and given by a value C ℓℓ ′ = C LL , and defined Γ 0,ℓℓ ′ ≡ Γ(a → ℓℓ ′ )/|C LL | 2 .
In the case of a long-lived ALP, the constraints in the previous section have to be modified to accommodate for the fact that the ALP may leave the detector. A formula for the fraction of pair-produced long-lived ALPs which decay in a cylindrical detector is derived in Ref. [20] to be where L det is the radius of the detector, θ is the angle of the ALP pair w.r.t. the beam axis in the rest frame of the Higgs, and L a ≡ (β a γ a )cτ a is the decay length of the ALP in the lab frame. Here c is the speed of light and we take γ a = m h /2m a . Taking L det = 1.1 meters for the inner detector of CMS, we can recast the constraints from the previous section for a long lifetime by multiplying ϵ by the fraction f aa . These results are demonstrated in Fig. 4.
In addition to recasting prompt constraints, we can also examine constraints from searches for long-lived particles at CERN. One difficulty with such a task is that the main decay mode of the ALP has a very high hadronic background (for example, from B-mesons) for displaced decays in the ATLAS or CMS inner detectors. As a result, we expect displaced decay analyses to become more important and fruitful when the ALP can travel into the muon spectrometer. This corresponds to τ a ≳ 1 m or |C LL |/Λ ≲ 10 −5 TeV −1 for m a = 10 GeV.
One work that considers such a scenario is Ref. [33], which places constraints on the Higgs decaying into long-lived particles which have displaced jets in the final state by combining data from the ATLAS inner detector with data from the muon spectrometer. In particular, they consider the scenario in which one displaced jet is found in the inner detector and the other is found in the muon spectrometer. Since most of the ALPs in our model decay with at least one τ , and most τ s decay hadronically, this study can be used to constrain our model. To adapt the work for our purposes, we digitize their results for a 95% CL upper-limit on the branching fraction for Higgs into two scalars, then use a piecewise polynomial fit to generalize it to a variety of masses and lifetimes. The plots we digitized cover an extensive region of lifetimes, but are only presented for four masses: m a = 10, 25, 40, and 55 GeV.
Due to this limited data along the mass axis, we account for overfitting by interpolating plots for m a = 15, 32.5, and 47.5 GeV using the geometric mean of the branching ratio limits corresponding to the nearest two masses, and then use these values along with the original masses to ensure that the fit function has a relatively smooth dependence along the mass axis. The fit function agrees well with the data in regions of overlap, but we stress that results are likely less accurate for m a ≲ 10 GeV and m a ≳ 55 GeV. For more accurate data in these mass regimes, the analysis of Ref. [33] would need to be repeated. The details of the fitting procedure can be found in the Github repository Ref. [34]. Once again, these branching fraction constraints can be recast to constraints on C ah /Λ 2 . Using Eq. (27), we can represent the constraints in terms of |C LL | to create an exclusion for C ah /Λ 2 and C LL /Λ. and [33]. The lifetime constraints from displaced jets [33] are the solid regions, while the prompt constraints from the OSSF0 signal [27] are the transparent regions. Note that although in the model we assume a universal lepton coupling C LL , the exclusion here is essentially on the subset of the couplings C τ ℓ .
|C ah |/Λ 2 ≲ 0.1 TeV −1 or C LL /Λ ≲ 10 −6 TeV −1 . Assuming C ah and C LL are O(1), such a model would still be completely unconstrained for Λ ≳ 3 TeV. At a projected luminosity of L = 3000 fb −1 , the C ah constraints can be improved by a factor of O(5-10), but C LL will still remain relatively unconstrained: for a detector the size of ATLAS (O(10 m)), any particle with lifetime cτ ≳ 10 m will be unlikely to decay inside the detector a significant portion of the time. For m a = 10 GeV, this corresponds to C LL /Λ ∼ 4 × 10 −6 TeV. Hence, regardless of how strong constraints on C ah can get, any ALP with cτ ≳ 10 m will evade searches in current colliders.
Many BSM scenarios propose the existence of long-lived particles whose average decay lengths lie well beyond the reaches of current particle detectors. As a result, there have recently been proposals for detectors to search for such long-lived particles such as MATH-USLA [10][11][12][13], CODEX-b [14], and ANUBIS [15]. Here we focus on projected constraints for MATHUSLA from Ref. [12]. In the reference, the MATHUSLA detector is taken to be located on the surface of the Earth (∼ 100 meters above ATLAS and 100 meters down the beam pipe, with dimensions 100 × 100 × 20 m 3 ). An estimate is provided in Ref. [11] for the projected constraint on the Higgs cross section to long-lived particles: where b = 2m a /m h , (L 1 , L 2 ) = (200 m, 230 m), ϵ geom = 0.05, and n LLP is the number of longlived particles in the Higgs decay channel. We found that this approximation works relatively well for the projected exclusions of B(h → aa) from Ref. [12] at long lifetimes, provided we take (L 1 , L 2 ) = (180 m, 200 m) (due to the smaller detector size). At lower lifetimes, the approximation fails due to more complicated dependence on the detector acceptance and efficiency. Hence, for low lifetimes, we fit the B(h → aa) projected exclusions from Ref. [12] with a polynomial fit, with the condition that it matches Eq. (29) for cτ ≳ 10 m. Details of the fit can once again be found on the Github repository Ref. [34]. Our projected constraints on the ALP couplings are shown in Fig. 6; we find tha MATHUSLA can effectively constrain the model for much lower values of |C LL |.

C. Combined Results
Here we present the combined results from LFV considerations, prompt decays and displaced decays. To do so, we present exclusion plots on the average lepton coupling |C LL |/Λ for different Higgs couplings. Previous constraints were placed on C ah (see Eq. (19)), so we consider two extremes: C ah = 0 and C ′ ah = 0. Both terms C ah and C ′ ah show up for, e.g., the pion in the Standard Model: C ′ ah arises from Yukawa couplings with the lighter quarks, whereas C ah arises through a Higgs-gluon effective coupling due to heavy quark loops [35,36]. Since the effect of the C ′ ah coupling is suppressed by a factor of m 2 a /m 2 h , the most relevant Higgs coupling is C ah for light ALP masses. However, in scenarios where one has a heavy composite ALP in a theory with only light quarks, such as the model in Ref. [16], ah is the only significant ALP-Higgs coupling. All constraints are presented at the 95% CL, except for the MEG LFV constraint, which is at the 90% CL.
The resulting plots are demonstrated in Figs. 7 and 8. A few observations are in order: • Although these constraints are placed on C LL assuming universal on-and off-diagonal lepton couplings, the majority of the exclusions (barring the LFV constraints) mainly constrain C τ ℓ . This is because the rely on ALP decays, and the ALP decays into at least one τ a vast majority of the time. Hence, one could consider a scenario with larger C eµ coupling for phenomenological purposes without violating these constraints.
• In both scenarios, the model is unconstrained for C LL /Λ ≲ 10 −6 TeV −1 , regardless of C (′) ah . As a result, there is a lot of unexplored potential for long-lived ALPs, which could potentially be discovered or ruled out by proposed detectors like MATHUSLA.
• For low enough C ah (C ah /Λ 2 ≲ 0.1 TeV −2 , C ′ ah /Λ 2 ≲ 1 TeV −2 ), the only significant constraints on the ALP-lepton coupling come from LFV considerations. Such constraints can be improved either through increased detector luminosities, or other searches which exploit the lepton flavor violating decays of the ALP. searches and displaced vertex searches, with C ′ ah = 0. As C ah is decreased, the allowed region of ALP-lepton couplings increases dramatically.

V. ANOMALOUS MAGNETIC MOMENT OF THE MUON
The recent Fermilab measurement [37] of the muon anomalous magnetic moment a µ = (g µ − 2)/2 has confirmed the long-standing Brookhaven result [38]. The combined value of a µ from the Fermilab and Brookhaven experiments is at a 4.2σ disagreement with the searches and displaced vertex searches, with C ah = 0. As C ′ ah is decreased, the allowed region of ALP-lepton couplings once again increases dramatically.
SM predictions as summarized recently by the Muon g − 2 Theory Initiative [39], with ∆a µ = (25.1 ± 5.9) × 10 −10 . In this section, we explore the region of our parameter space which can explain the muon g − 2 results while still adhering to the LFV constraints from Section III. The region of parameter space necessary enforces an additional hierarchy of diagonal couplings C ℓℓ to off diagonal couplings C ℓℓ ′ , so we recast the constraints accordingly.
The contribution of the anomalous magnetic moment from the LFC and LFV couplings in our model (treating C γγ = 0) is [4,5,24], where x ℓ = m 2 a /m 2 ℓ and the loop functions h i (x) and g 3 (x) are given in Appendix A, and the functions H and h are two-loop functions from consideration of the loop-induced ALPphoton coupling, defined in Ref. [24]. We remind the reader that we have included only parity-conserving couplings in our model; couplings between the axion and lepton vector currents can give additional contributions to ∆a µ , see Refs. [4,5] The possibility of explaining the anomaly solely with LFV couplings (Eq. (32)) is explored in Ref. [5], but without parity-violating couplings much lighter ALP masses than we consider here are required. Explanation of the anomaly with LFV couplings is also constrained by muonium-antimuonium oscillation measurements and will be further tested by future measurements from Belle-II [40,41]. The anomaly was also addressed, using larger ALP masses, more recently in Ref. [24], but in their scenario the only non-zero ALP-lepton coupling is C µµ . Here, we consider the scenario in which there exists some hierarchy between the flavor on-and off-diagonal couplings, and negligible tree-level photon coupling, so that the ALP-photon contribution to ∆a µ is induced via lepton loops.
Both (∆a µ ) LFC and (∆a µ ) LFV can have the right sign to explain ∆a µ under certain conditions. (∆a µ ) LFC > 0 if the first term is negative and greater in magnitude than the second, which is generally true if C µµ < 0 and C ee , C τ τ > 0. Alternatively, (∆a µ ) LFV > 0 if |C eµ | ≳ (m τ /m µ ) 3/2 |C µτ | ∼ 70|C µτ |. Although both of these are technically possibilities, it turns out that in order to adhere to LFV constraints while explaining the ∆a µ anomaly, we find that the only viable solutions have |C ℓℓ ′ | ≪ |C ℓℓ |, so the LFC contribution will be much larger than the LFV contribution. We assume C µµ < 0 and C ee = C τ τ = −C µµ , and we impose a hierarchy |C ℓℓ | = κ|C ℓℓ ′ |. The dominant contribution to the muon g − 2 anomaly with these assumptions is from the Barr-Zee diagram [42,43]. Through very general considerations, we find that we must have κ ≳ 5 × 10 4 GeV; otherwise, |C µe | is large enough to violate the constraints from MEG (see Section III). A plot of the minimum hierarchy κ necessary to explain the muon g − 2 discrepancy while adhering to these LFV constraints is shown in Fig. 9. Note that all values above the minimum value κ are allowed, including κ → ∞, so that the off-diagonal coupling is taken to 0, but this minimum value is dependent on the true value of ∆a µ . This is indicated in the Fig. 9 with upward red arrows.
It is interesting to consider the effect that such a hierarchy has on our phenomenology as well. The main difference in the phenomenology is that previous works on pseudoscalar decays of the Higgs with diagonal lepton couplings such as [44,45] become more relevant.
We find that for some range of the masses considered in [45] (4 GeV ≤ m a ≤ 15 GeV), their study outperforms the OSSF0 analysis from [27] for the hierarchical case. Although the OSSF0 search outperforms the h → 2a → 2µ2τ search for most masses considered, the latter was conducted with less than a third the luminosity (39 fb −1 vs. 137 fb −1 ). With increased luminosity one can certainly expect better results, although we do not attempt to estimate such a projection here.
Explicitly solving for the C ℓℓ required to explain ∆a µ in Eq. 31 (again, assuming C ee = −C µµ = C τ τ > 0) yields |C ℓℓ |/Λ ≈ 15 TeV −1 over the range of masses considered. As a result of this large flavor-conserving lepton coupling, the value of |C eff γγ | is necessarily much larger in this scenario (see Eq. (6)), so we must also reconsider experimental constraints in FIG. 10: Constraints on C ah /Λ assuming a coupling hierarchy such that |C ℓℓ ′ | ≪ |C ℓℓ | (in this figure we have taken κ = 5 × 10 4 , but it would look identical for larger κ). A large hierarchy like this is required to explain the ∆a µ discrepancy without violating existing LFV constraints from MEG (see Fig. 9). The main difference between these bounds and the bounds in Fig. 3 is the hierarchy between on-and off-diagonal couplings, greatly suppressing the a → τ µ and a → τ e channels.
which the ALP-photon coupling is probed directly. As summarized in Ref. [46], the strongest current constraint for m a ≳ m τ comes from ATLAS [47] and CMS [48] searches for light-bylight scattering in Pb-Pb collisions. Using the narrow-width approximation, the predicted cross-section in our model is where the function I(m 2 a ) captures the energy dependence of the photon-photon luminosity [49,50]. In the current scenario where the coupling C eff γγ arises solely from lepton loops and the ALP satisfies m a > m τ , the decay modes of a are still dominantly into pairs of leptons and the constraint from Pb-Pb collisions is not significant.

VI. CONCLUSIONS
The central focus of this paper has been on the decay of the Higgs into two pairs of off-diagonal leptons, through ALP mediators. Multiple theories have suggested the presence of a heavy axion-like particle leading to new physics, but aspects of the combination of the Higgs decay and lepton flavor violation were yet to be fully explored. Useful constraints on large regions of the ALP mass and coupling constant parameter space can be obtained from various collider searches for both prompt and long-lived particles.
Initially working under the assumption that the coupling of the ALP to leptons C ℓℓ ′ was universal across all combinations of ℓ and ℓ ′ , formulae were derived for the first-order decay rates of the Higgs into a pair of ALPs and the ALP into a pair of leptons. The process h → aa → OSSF0 was then simulated with MadGraph which, when combined with cuts from ATLAS search [27], made it possible to immediately place bounds on C ah and m a (Fig. 3). There is a possibility that other future searches can lead to better constraints by taking advantage of the LFV decay modes. In particular, Ref. [45] considers constraints on the branching fraction of h → 2a → 2µ2τ and h → 2a → 4τ by examining the invariant mass distribution of a candidates. The addition of O(1) LFV couplings could lead to many more events in the h → 2a → 2µ2τ mode via a → τ µ, which would likely affect the invariant mass distribution. It would be interesting to see if constraints produced in this manner would outperform those obtained from the OSSF0 signal, especially for larger masses, where the constraints from [45] appear to drop off.
These bounds were combined with others from existing investigations into lepton conversion processes such as µ → eγ and τ → ℓγ, by adapting the formulas for the rates of those processes from Ref. [4] to our C ℓℓ ′ and C eff γγ and plotting the resulting constraints (Fig. 8). Of these, the µ → eγ bound obtained from the data of the MEG experiment was found to be the most powerful.
Next, we investigated the possibility that the ALP vertices from the Higgs decay process might be displaced. After charting where in the parameter space such events would be allowed by the dimensions of the detector (Fig. 4), we derived using data from Ref. [33] further bounds on the relationship between C ah and C ℓℓ ′ (Fig. 5). These bounds successfully constrain lower values of the parameters than the prompt decays, by a factor of about 3.
The model is still unconstrained however, for |C ah |/Λ 2 ≲ 0.1 TeV −2 , C ℓℓ ′ /Λ ≲ 10 −6 TeV −1 , or for ALPs with lifetimes exceeding around 10 m. In light of the limitation on the lifetimes in particular, the advancements to these constraints that a larger detector like MATHUSLA could produce were examined. The projected bounds from MATHUSLA were added to our displaced vertex plots, where it was found that the new detector could make the constraints multiple orders of magnitude more stringent in some cases (Fig. 6).
Finally, we investigated adapting our model to explain the recent results for the muon g − 2 anomaly. Because of the ALP's large mass and because it is a pseudoscalar, the largest contribution to the process must come from the Barr-Zee diagram for the sign of the difference to be correct. If one allows for a hierarchy of at least four orders of magnitude between flavor diagonal and off-diagonal values of C ℓℓ ′ , the desired contribution can be achieved. This is encouraging, as it implies that with some adjustment, the OSSF0 signal could be another way to search for a deeper explanation for this anomaly.
Our analysis so far has been restricted to m a ≳ 2 GeV, so that decay modes involving at least one τ lepton are dominant due to the mass scaling of the leptonic ALP decay rate.
For ALPs with masses below the τ mass, the only possible on-shell decay mode would be a → µe, which could lead to the signal h → (µe)(µe). It would be interesting to study a dedicated search in this channel in future work; the signal with displaced decays in particular could be very distinctive. Study of the more general parity-violating case where couplings of the ALP to the lepton vector currents are nonzero would also be an interesting extension of this work.