Coupling QCD-scale axion-like particles to gluons

We present a novel data-driven method for determining the hadronic interaction strengths of axion-like particles (ALPs) with QCD-scale masses. Using our method, it is possible to calculate the hadronic production and decay rates of ALPs, along with many of the largest ALP decay rate to exclusive final states. To illustrate the impact on QCD-scale ALP phenomenology, we consider the scenario where the ALP-gluon coupling is dominant over the ALP coupling to photons, electroweak bosons, and all fermions for $m_{\pi} \lesssim m_a \lesssim 3$ GeV. We emphasize, however, that our method can easily be generalized to any set of ALP couplings to SM particles. Finally, using the approach developed here, we provide calculations for the branching fractions of $\eta_c \to VV$ decays, i.e. $\eta_c$ decays into two vector mesons, which are consistent with the known experimental values.

We present a novel data-driven method for determining the hadronic interaction strengths of axion-like particles (ALPs) with QCD-scale masses. Using our method, it is possible to calculate the hadronic production and decay rates of ALPs, along with many of the largest ALP decay rates to exclusive final states. To illustrate the impact on QCD-scale ALP phenomenology, we consider the scenario where the ALP-gluon coupling is dominant over the ALP coupling to photons, electroweak bosons, and all fermions for mπ ma 3 GeV. We emphasize, however, that our method can easily be generalized to any set of ALP couplings to SM particles. Finally, using the approach developed here, we provide calculations for the branching fractions of ηc → V V decays, i.e. ηc decays into two vector mesons, which are consistent with the known experimental values.
Axion-like particles (ALPs) are hypothetical pseudoscalars whose couplings to the gauge bosons of the Standard Model (SM)-the gluons, photons, and electroweak bosons-are highly suppressed at low energies by a large cut-off scale Λ. ALPs are found in many proposed extensions to the SM (see Refs. [1][2][3][4]), since they naturally address such puzzles as the Strong CP [5][6][7][8] and Hierarchy problems [9]. Moreover, ALPs may explain the long-standing anomaly with the magnetic moment of the muon [10], and could provide a portal connecting SM particles to dark matter [11][12][13][14].
ALPs are pseudo-Nambu-Goldstone bosons, and therefore, their masses, m a , are expected to be m a Λ. Recently, MeV-to-GeV scale, henceforth QCD-scale, ALPs have received considerable interest [15][16][17][18][19][20][21][22][23][24][25]; however, the phenomenological impact of ALP-gluon interactions is not well understood for QCD-scale ALPs. The effective Lagrangian describing such interactions is where c g is the dimensionless agg vertex coupling constant andG µν ≡ 1 2 µναβ G αβ . In this Letter, we present a novel data-driven method for determining the hadronic interaction strengths of QCD-scale ALPs. Using our method, it is possible to calculate the hadronic production and decay rates of ALPs, along with many of the largest ALP decay branching fractions to exclusive final states. To illustrate the impact on QCD-scale ALP phenomenology of c g = 0, we consider fermions (c f ). We emphasize, however, that our method can easily be generalized to any ALP couplings to SM particles. The impact of ALP couplings to photons, electroweak bosons, leptons, and heavy quarks is known [26], while additional direct couplings to light quarks are easily handled within our framework (see the Supplemental Material to this Letter).
We begin by noting that ALP-lepton couplings arise at the 3-loop order in this scenario, and therefore, are neglected throughout. ALP couplings to quarks are generated by the ALP-gluon interactions. Similarly, ALPphoton interactions are also generated by ALP-gluon interactions, though these are suppressed by O(α 2 EM ). For low masses, ALP-gluon interactions can be studied using chiral perturbation theory (χPT), while for m a Λ QCD perturbative QCD (pQCD) can be employed. However, no reliable calculations are available for most QCD-scale masses. Furthermore, pQCD only predicts the total hadronic decay rate. It does not inform experimenters which decays to look for, or how to determine the sensitivity of any exclusive decays.
Since a → ππ and a → π 0 γ are forbidden by CP and C, respectively, the dominant hadronic decays for lowmass ALPs will be a → 3π 0 and a → π + π − π 0 , even though they violate isospin, along with a → π + π − γ, which is suppressed by a factor of α EM [27]. The decay rates are similar for both 3π modes and to leading order (LO) in χPT are [26] Γ a→3π ≈ πm a m 4 π c 2 g δ 2 for m a 1 GeV, (3) where δ I ≡ (m d − m u )/(m d + m u ) ≈ 1 /3 is the isospin violation induced by m u = m d and K 3π contains the final-state kinematic factors (see Supplemental Material).
In the pQCD regime, the total rate to hadrons is Γ a→gg , which at one-loop order is [28] Γ a→gg ≈ 32πα 2 s c 2 g m 3 a Λ 2 1+ 83α s 4π for m a Λ QCD . (4) arXiv:1811.03474v3 [hep-ph] 8 Nov 2019 For m a ≈ 2 GeV, the one-loop correction is comparable in size to the leading-order result, making this the smallest mass where Eq. (4) has O(1) validity. Naively, it is tempting to interpolate the total hadronic rate from where a → 3π is the dominant hadronic decay to where the pQCD result is valid; however, even though such an interpolation only covers a factor of 4 in m a , numerically Clearly a deeper understanding of the hadronic interactions of QCD-scale ALPs is required -which is our primary focus. By performing a chiral transformation of the lightquark fields [29][30][31], we replace the agg vertex by ALPquark axial-current couplings, which we subsequently match to the chiral Lagrangian. This leads to ALP-π 0 kinetic mixing and ALP-η ( ) kinetic and mass mixing making it possible to assign the ALP a U (3) representation at low masses. We assign all ALPs up to ≈ 3 GeV the where C q are m a -dependent dimensionless constants, f a ≡ −Λ/32π 2 c g is the ALP decay constant, and accounts for α s running which weakens ALP-gluon interactions at higher masses. 2 N.b., we factored out f π /f a to make this dependence explicit, and follow the normalization convention for the pseudoscalar U (3) generators π 0 , η, and η . For m a 1 GeV, we derive the ALP-P mixings, for P = π 0 , η, η , using the LO chiral Lagrangian by extending previous works, e.g. Ref. [26], to three flavors and to higher order in δ I . The full calculations are in the Supplemental Material. Here, we provide simplified expressions to LO in δ I and taking m s m d ≈ 2m u . The ALP-P kinetic and mass mixing cause the P fields to pick up small admixtures of the physical ALP state and vice versa: a ≈ a phy − f π f a P aP P phy . Therefore, the ALP U (3) matrix is a = aπ 0 π 0 + aη η+ aη η for m a 1 GeV, (10) where the ALP-P mixing factors are and N π 0 ,η,η = 1 2 , 1 √ 6 , 1 2 √ 3 are the P normalization factors. At high masses, the U (3) symmetry is expected to be restored; thus the ALP U (3) representation should be The C q values obtained from Eq. (11) are close to unity near 1 GeV; therefore, we interpolate between the lowmass and high-mass regions by setting each C q element to unity once it intersects unity above m η (see Fig. 1). When m a is in the non-perturbative regime of QCD, this U (3)-based representation is the most natural one, and can be used to calculate the production and decay rates of ALPs. Before moving onto such calculations, we stress that for 0.5 m a 2 GeV there are O(1) uncertainties on a. Many LO χPT predictions require O(1) corrections even for η decays (see, e.g., Ref. [32]). Furthermore, while parton-hadron duality is roughly valid above 1 GeV for vector currents [33], not enough is known about η * states to assert that this holds to better than O(1) for ALPs. While the precision of a could be improved, adding direct quark couplings to the ALP model also induces O(1) changes in a. Therefore, a more natural approach is to adopt C u , C d , and C s as effective ALP parameters, with the goal of experimentally exploring all O(1) deviations from the pure ALP-gluon model.
The interactions of pseudoscalar mesons are well described at low energies by the hidden local symmetries framework of vector meson dominance (VMD) [34,35]. Due to ALP-pseudoscalar mixing, which generates the ALP U (3) representation, we can also employ VMD to study ALP interactions. However, since VMD only includes ground-state mesons, the effective theory breaks down once m a m η * ≈ 1.5 GeV. Ref. [33] showed how e + e − → V ( * ) data can be used to predict the hadronic decay rates of any vector particle. While no high-purity source of P ( * ) currents exists, with minimal assumptions we can also use e + e − data to extend VMD-based pseudoscalar predictions up to 3 GeV.
We begin by considering an interaction vertex with two vectors and one pseudoscalar (V V P ). The amplitude for the process V 1 (p 1 ) → V 2 (p 2 )P (q) must be of the form since this is the only valid Lorentz structure. The unknown function F should satisfy where m 2 1 = p 2 1 and m V * 1 denotes the pole mass of the first excited vector meson with the same U (3) representation as V 1 . The pQCD power-counting rule is A ∝ m 4−n 1 , where n is the number of partons involved in the vertex (6 for V V P ) [36]. Since for m 1 m V * 1 F is approximately independent of the ground-state meson masses, we make the ansatz which relies on F being controlled by the heaviest dynamical scale, m 1 here, when all other masses are for ground-state mesons. As shown in Ref. [33], treating e + e − → qq production as the sum of currents with ρlike, ω-like, and φ-like U (3) quantum numbers, rather than the sum of many V * resonances, provides a good description of the data for m ≡ √ s m V * . Therefore, the F function can be extracted from data using where Γ VMD V →f (m) is the width obtained using VMD with F = 1. Figure 2 shows that all available e + e − → V 1 → V 2 P data are consistent with  [39][40][41]. Since we ignore resonance contributions, each result is only valid at masses where narrow resonance contributions are small. We define these as: (ω-like) above where the sizable ωφ interference effect in the 3π final state becomes negligible, (ρ-like) ma mρ * + Γρ * , and (φ-like) ma m φ * + Γ φ * .
where β F = 1.4 GeV is determined from the data. Furthermore, in the Supplemental Material we show that all e + e − → V → P P data [37,38] are also consistent with Eq. (17), modulo the pQCD power-law scaling is m −3 due to the dimensionality of the VMD-based V P P vertex. Since F is simply a smooth monotonic transition from VMD to pQCD, we expect this function to be approximately valid for any 3-meson vertex where only the decaying particle is not a ground-state meson (corrected for vertex dimensionality if needed). We will show below how to use Eq. (17) to extend VMD-based calculations up to 3 GeV, and validate our approach using known η c and η * decay branching fractions. The amplitude for P → V 1 V 2 must have the same Lorentz structure as Eq. (13), and by crossing symmetry must share the same F. Therefore, using the standard VMD framework-but inserting F(m P )-we can calculate Γ a→V V (m a ) up to ≈ 3 GeV. These straightforward calculations follow directly from the standard VMD ones and are provided in the Supplemental Material. Moreover, using the same framework we calculate Γ ηc→V V . Table I shows that our η c → V V predictions are consistent with the experimental values to O(10%). Alternatively, Γ ηc→V V can be calculated using pQCD; however, this approach underestimates the measurements [42] by O(10) even when including higher-twist effects (known as the η c → V V puzzle). That our predictions for Γ ηc→V V achieve O(10%) accuracy provides strong validation of the approach developed here.
Given any ALP U (3) representation and the massdependent vertex scaling function Eq. (17), we can calculate exclusive hadronic ALP decay widths and its total hadronic width. Here we summarize our calculations for the representation shown in Fig. 1, while the details are provided in the Supplemental Material.
• Γ a→V V : As discussed above, we calculate a → ρρ,  [43,44]. Furthermore, we derive more precise experimental values by averaging the PDG ηc → V V results assuming SU (3) symmetry in these decays (the SU (3) column), and find that our predictions are consistent with these SU (3)-averaged experimental results to O(10%).   a → ωω, a → φφ, and a → K * K * using our extended-VMD framework. Schematically, the a → V V and VMDbased η → V V rates are related via Additionally, we calculate Γ a→ππγ as a → ρρ followed by ρ-γ mixing and ρ → ππ.
• Γ a→V P : Since a → ρπ violates isospin and a → K * K violates SU (3) symmetry, these are subleading and difficult to calculate; thus we do not consider them. 3 Most other a → V P decays involving ground-state mesons violate C, so also are not considered.
• Γ a→γγ : The a → γγ decay rate is given by where at low masses C χ γ ≈ 1 is generated by the chiral transformation, while at high masses pQCD quark-loop contributions (at two-loop order) are important [26]. Calculated for the first time here from a → V V → γγ with V -γ mixing, is found to be the dominant contribution over most of the mass range considered. N.b., each contribution is turned on/off for m a values where it is either invalid or where double counting of contributions would occur.
• Γ a→3π : We calculate these rates using the LO chiral Lagrangian, and add a data-derived k-factor to account for final-state-pion rescattering effects. We only consider these decays up to m η , since at higher masses this kfactor is no longer reliable. We consider isospin-violating a-π 0 mixing, and our calculation is the first to consider a-η ( ) mixing followed by η ( ) → 3π. We leave a detailed presentation to the Supplmental Material.
• Γ a→P P P : The amplitudes for a → η ( ) ππ and a → KKπ are dominated by scalar and tensor res- , and a contact term. For a → KKπ we consider a → S Kπ (Kπ)K, where the Kπ S-wave amplitude is taken from Ref. [45], and a → a 0 (KK)π. Schematically, the a → P P P and η → P P P amplitudes are related similarly to Eq. (18), e.g., All scalar resonance amplitudes are taken from the η → ηππ model of Ref. [46], where they were determined by fitting all available data. We use a similar approach to derive the f 2 (1270) tensor-meson contribution in the Supplemental Material. Unlike above, we cannot obtain the F functions for these vertices directly from data. Given that the dimensionality of each of these vertices is the same as that of V V P , we also use Eq. (17) here. This universality assumption is validated by the fact that we accurately predict both B(η c → ηππ) and B(η(1760) → γγ) × B(η(1760) → η ππ) to ≈ 20%, and B(η c → KKπ) to ≈ 10%. Given that a → ηππ or a → KKπ has the largest branching fraction for m a 1 GeV, the lack of more stringent data-driven constraints here is the weakest component of our calculations, though these data-driven tests suggest that the uncertainties are small. (These predictions could be improved with a better experimental understanding of the excited η * states.) • Γ a→gg : The NLO pQCD calculation of Eq. (4) derived in Ref. [26] is adopted here.
• Γ a (total hadronic width): We take Γ a = Γ a→gg for m a 1.84 GeV, while for lower masses, the sum of all exclusive modes is used for Γ a . At m a 1.84 GeV we find Γ a→gg ≈ i=exc.
The decay branching fractions are summarized in Fig. 3. The unaccounted for branching fraction is also shown, and is substantial for m a 2 GeV. This includes decays such as a → AA, i.e. two axial-vector mesons, which should be comparable to a → V V above about 2.5 GeV, and many decay paths that involve excited resonances, rescatterings, etc. For example B(η c → 6π) ≈ 20% so we expect ALP decays to many-body final states to be at about the same rate. We stress that unaccounted for decay modes should only be important for ALP masses where Γ a ≈ Γ a→gg ; therefore, our predictions for the total hadronic width-and the ALP lifetime-should not be affected by unaccounted for decays.
When evaluating the constraints on this model, we focus on the m π < m a < 3 GeV region, where our work has the biggest impact. Constraints where f a 3f π are omitted, e.g., bounds from radiative J/ψ decays, since we assumed f π f a when deriving a. Details on all calculations are provided in the Supplemental Material, while in Fig. 4 and below we summarize the constraints.
• We derive new constraints from φ → γa(ππγ, ηπ 0 π 0 ) and η → π + π − a(π + π − π 0 ). We are not aware of any bump hunts here, and instead assume that the entire known branching fractions to these final states [43] are due to ALPs. Clearly dedicated searches would be much more sensitive.
• We derive new constraints from b → sa penguin decays. At one loop, the agg vertex generates an axial-vector att coupling [26] resulting in enhanced rates for B → K ( * ) a decays [53][54][55][56]. The loop contains a UV-dependent factor [57] schematically given by , which we take to be unity (corresponding to an O(TeV) UV scale). This induces O(1) arbitrariness on the following constraints: The published m ηππ spectrum of Ref. [58] is used to constrain B(B ± → K ± a) × B(a → ηπ + π − ) for m a < 1.5 GeV, excluding the η peak region. The published m K * K spectrum of Ref. [58] is used to constrain B(B ± → K ± a) × B(a → K ± K S π ∓ ) for 0.85 < m Kπ < 0.95 GeV and m a < 1.8 GeV.
The known value of B(B 0 → K 0 φφ) [59] is used to constrain B(B 0 → K 0 a) × B(a → φφ) assuming the entire decay rate is due to ALPs. The known value of B(B ± → K ± ω(3π)) is used to constrain B(B ± → K ± a) × B(a → π + π − π 0 ) for gray constraints depend on UV completion results shown assume UV ≈ log 0.73 < m a < 0.83 GeV, which is the 3π mass window shown in Ref. [60], assuming the entire decay rate is due to ALPs.
Since the ALPs considered here are not massive enough to decay into charm hadrons, the observed inclusive b → c branching fraction [43] is used to place an upper limit on the inclusive b → sa rate of • Similarly, we recast existing limits on ALP-W/Z couplings from Ref. [19] using the s → d penguin decays K ± → π ± γγ [61] and K L → π 0 γγ [62] and the same UVcompletion assumptions.
Over much of the considered mass range the constraints on Λ are below a TeV. We stress that many of these constraints would be much stronger if dedicated searches were performed, e.g., searches for B → K ( * ) a with a → γγ, 3π, ηππ, KKπ, ρρ, etc. would be incredibly powerful probes of QCD-scale ALPs-and could be performed with data already collected by LHCb. In summary, we presented a novel data-driven method for determining the hadronic interaction strengths of ALPs with QCD-scale masses. Our method makes it possible to calculate the hadronic production and decay rates of ALPs, along with many of the largest ALP decay branching fractions to exclusive final states. To illustrate the impact on QCD-scale ALP phenomenology, we considered the scenario where the ALP-gluon coupling is dominant over the ALP coupling to photons, electroweak bosons, and all fermions, but emphasized that our method is easily generalized to any set of ALP couplings to SM particles. We showed that the constraints on this type of ALP are weak, though we also highlighted some promising searches that could provide improved sensitivity to QCD-scale ALPs, e.g. at LHCb. Finally, our work determined the relationship between the ALP lifetime and its gluonic coupling, which is vital for studying the sensitivity of long-lived particle experiments [63]. Here we give more details about the axion-like particle model that is described in the Letter. We start with the following effective Lagrangian: where Λ = −32π 2 c g f a . For m a 4πf π , the Lagrangian of Eq. (S1) can be matched to the Chiral Lagrangian [29][30][31], see Eq. (S17) below. In the Letter, we focus on the specific case where c γ = c q = 0; however, in this section, we keep these terms so that it is clear how to use our framework to obtain results for other ALP models. Generically, both kinetic and mass mixing occur between the ALP and the neutral pseudoscalar mesons P = π 0 , η, and η . Both the kinetic and the mass mixing terms scale as which is the expansion parameter. The mixing can be expressed as where Φ = (a, π 0 , η, η ) and To first order in , the ALP and P masses remain unchanged by the mixing, i.e. m P ≈ M P P . However, after a shift for the canonical kinetic term and the mass terms are diagonalized, the ALP field is redefined as where the function h is defined as This shift only affects ALP-ALP interactions, so we ignore it below. The pseudoscalar fields become P = P phy − δ I P S P P P phy + aP a phy (S7) with which induces ALP-P interactions. Therefore, the ALP can be represented by the U (3) matrix a = aπ 0 π 0 + aη η + aη η , where to leading order in isospin breaking (note that we consider this limit everywhere except for a → 3π) and the U (3) pseudoscalar meson generators are using sin θ ηη ≈ −1/3 and cos θ ηη ≈ 2 √ 2/3. We note that these mixing-angle values, which are inconsistent with more recent high-precision studies (though accurate enough for our purposes), were chosen as they lead to greatly simplified expressions in the following section.

B. ALP couplings to hadrons and its low-mass U (3) representation
In this section, we determine the various mixing factors. Following Refs. [15,26,29], we start with Eq. (S1) and consider only u, d, s quarks and define We now preform the following chiral rotation to the quark fields, which ensures that the agg vertex vanishes: where f a is the ALP decay constant. In order to avoid mass mixing between the ALP and non-singlet U (3) pseudoscalar states, namely π 0 and η 8 , we choose The rotation of Eq. (S13) leads to where N c = 3 is the number of colors. Next, following Ref. [29], we match Eq. (S15) to the Chiral Lagrangian which gives where B 0 = m 2 π 0 /(m u + m d ), f π ≈ 93 MeV, m 2 0 is a hard breaking term due to the anomalous U (1) symmetry which fixes the η − η mixing angle θ ηη , η 0 is the U (1) Goldstone boson before this rotation, and we replacem q (a) by its eigenvaluem = exp [i(a/f a )κ q ] m q exp [i(a/f a )κ q ]. We take the VMD term from Ref. [34] using where the pseudoscalar and vector meson U (3) matrices are The relevant VMD Lagrangian is then [34] with g V V P = 3g 2 /(8π 2 f π ) and g ≈ √ 12π. Due to a-P mixing, the first term in Eq. (S20) induces an aV V vertex, the second term an aV P vertex, while the right-most term is the source of photon-vector-meson mixing.
The model considered in the Letter has c g = 0, and c q = c γ = 0. Considering the Lagrangian of Eq. (S17)-at low masses, where this Lagrangian is valid-we obtain for the mass-mixing terms, where in the last step we take m s m d ≈ 2m u , and for the kinetic-mixing terms. Therefore, the low-mass ALP U (3) representation is given by Eq. (S9) with which gives the following values for the C q terms: We note that at the limit of m a m η these results give C s → 0; however, the above equations are only valid for m a 1 GeV.

II. ALP DECAYS
The decay rates and branching fractions are summarized in Fig. S1. In this section, we provide the detailed calculations used to obtain these results.   1.84 GeV, we take the total width to be the sum of the exclusive decay widths, whereas for ma 1.84 GeV we take the total width to be Γa→gg.

A. a → γγ
Even though the ALP does not couple directly to the electromagnetic field when c γ = 0, as shown in Eq. (S16) the chiral transformation generates a coupling at low masses. In addition, ALP-pseudocalar mixing-followed by P → γγ-will also contribute. Finally, at high masses and at the two-loop order, pQCD contributions from quarks become important. The total decay rate for a → γγ is given by The contribution from the chiral transformation is We turn this contribution off above the η mass, since the chiral rotation is no longer valid (see discussion in the main text on the U (3) representation). We calculate the VMD-based contribution as a → V V ( ) → γγ, where the vector mesons mix with the photons, which predicts the pseudoscalar P → γγ rates to O(10%) accuracy. This contribution is given by where the phenomenological suppression of the VMD amplitude at higher masses-obtained in the Letter using e + e − data-is contained in the function F(m a ). As we will show below, the pQCD-based contribution from light quarks surpasses the VMD-based one at m a ≈ 2.1 GeV. This is expected since, due to the suppression of the V V P vertex at higher masses, contributions involving quark loops become dominant in the perturbative regime; therefore, we transition from the VMD-based light-quark contribution to the pQCD-based one at the point where the pQCD contribution is larger. The full pQCD-based result has contributions from both light and heavy quarks [26] C pQCD,uds These expressions are simplifications of those in Ref. [26], and even though they are accurate to O(10%) in the mass range that we use them, our numerical results are obtained using the full expressions. Figure S2 shows the various contributions to Γ a→γγ compared to those from Ref. [26]. As expected, our result agrees with that of Ref. [26] for m a 0.2 GeV and for m a 2.1 GeV, but is significantly different between these two mass regions. This occurs because we include mixing with the η and η mesons, and the VMD-based a → V V → γγ contribution. Finally, one utility of the framework we are using is that one can immediately see that replacing the ALP by the pion, which includes neglecting the direct coupling to photons induced by the chiral transformation, gives the expected result: The corresponding cross checks where the ALP is replaced by the η ( ) also produce the well-known expected results.
The decays a → 3π 0 and a → π + π − π 0 proceed via the isospin-violating a-π mixing, and by a-η ( ) mixing followed by η ( ) → 3π. Given that these decays are explicitly isospin violating, we only calculate their rates up to m a = m η due to the large uncertainty in the isospin-violating component of the ALP U (3) representation at higher masses. Already by m η , these decays have small branching fractions.
Starting from the LO chiral Lagrangian, we find that a → 3π has contributions from the 4π and 2η (0,8) 2π vertices. The former involves a-π mixing, while the latter involves both a-η (0,8) and η (0,8) -π mixing. The resulting amplitudes are The decay rates are then where S = 1 for π + π − π 0 and S = 3! for 3π 0 are the usual symmetry factors. The k factor is added to account for the fact that the LO χPT predictions for Γ η ( ) →3π are a factor of ≈ 3 lower than the corresponding experimental values.
The NNLO χPT result is much larger than the LO calculation, largely due to final-state interactions between the pions [32]. We use k = 2.7 here, which is the mean of the k-factor values needed to obtain the known values of the η and η decay widths, i.e. we obtain the k-factor by comparing to experimental data on η ( ) → 3π decays. Given that the same k-factor works at m η and m η to ≈ 20% accuracy, we expect that this factor is reliable for m a m η ; however, we have no reason to expect that this same k-factor works for higher masses, providing another motivation (beyond the large uncertainty on the isospin-violating component of the ALP U (3) representation at higher masses discussed above) for only considering this decay below m η . As above, we can again cross check our results by replacing the ALP with the low-mass pseudoscalars. For example, using the same formalism we can derive the LO amplitude of η 8 → 3π and which are the well-known LO χPT results for η → 3π when k = 1. As another check, considering only a-π mixing gives the following: where which agrees with Ref. [26] for k = 1; i.e. our result agrees with that of Ref. [26], except for our inclusion of ALP-η ( ) mixing and the k factor.

C. a → V V
We calculate the decay rate for a → ρρ → π a π b π c π d using VMD, including the phenomenological suppression factor obtained from e + e − data. The amplitude is obtained from the aρρ vertex and is given by We use a mass-dependent width for the ρ meson in the Breit-Wigner functions (BW) following Ref. [37]. The a → 4π decay rates are then given by where S = 2 for π + π − π 0 π 0 and 4 for 2(π + π − ) are the usual symmetry factors. We can cross check this result by replacing the ALP with an η meson: a → η , m a → m η , f a → f π then Γ a→4π → 58 eV ≈ Γ η →4π = 52 ± 13 ev (S41) In the above cross check, we have summed the contributions from a → π + π − π 0 π 0 and a → 2(π + π − ). The experimental value is taken from Ref. [43]. The decays a → φφ → 4K and a → K * K * → 2K2π are calculated in an identical way, but using the appropriate resonance parameters and symmetry factors. Since the ω decays predominantly to 6π, the Lorentz structure of the amplitude is more complicated. Given that the ω is narrow, we instead calculate the decay rate of a → ωω using the narrow-width approximation and find We do not consider the decay a → φω or any isospin-violating V V decays.
We can now validate our data-driven approach by comparing the ALP branching fractions to the measured values of the corresponding η c branching fractions: m a → m ηc , Table I shows agreement for all η c → V V decays to O(10%) (S43) The value of f a cancels in the branching fraction calculation, so it does not need to be specified in this comparison. N.b., the PDG does not quote a value for B(η c → ωω) because no experiment has yet observed greater than 3σ evidence for this decay. The value in Table I is the ≈ 2σ result from Ref. [44]. For the other three decays, the values in Table I are the PDG average values, i.e. the PDG averages of the experimental measurements. In the main η c section of the PDG, the PDG instead quotes their fit values, which are the result of a constrained fit to a large number of η c decay observables. We also note that since the η c can mix with η ( ) -and can decay electromagnetically-the ALP decay rates do not necessarily need to exactly match those of the η c meson, though we expect those effects to be small. Finally, we note that mixing with the η c should also be considered for ALPs at this mass. We leave this for future work. The decay a → ππγ is calculated using a → ρρ followed by ρ-γ mixing within the VMD framework. The result follows closely from those above and is given by This result is cross checked by comparing to the corresponding η decay: Therefore, we also find the expected result for this decay, which is important for m a 1 GeV.

D. a → V P
Decays of the form a → V P proceed via the V P P vertex. As was done with a → V V in the Letter, the a → V P decay amplitude can be related to that of the V → P 1 P 2 process via crossing symmetry, and this process can be studied using e + e − → V → P 1 P 2 data. The amplitude for V (p V ) → P 1 (p 1 )P 2 (p 2 ) is which is of a different dimension than the V V P vertex for which F(m) was derived (this amplitude is one order lower in mass dimension). Figure S3 shows that e + e − → V → P 1 P 2 data is consistent with using F V P P = F, except with the pQCD scaling reduced by one order of mass dimension, i.e. the [β F /m] 4 term at high mass in Eq. (17) is replaced by [β F /m] 3 (the same constant β F is used for both functions). The decay rates Γ a→V P can be calculated using VMD, along with the phenomenological suppression factor F V P P (m) derived above. Since a → ρπ violates isospin, we do not consider this mode. The decay a → K * K violates U (3) symmetry for the case considered in the Letter, where C u ≈ C d ≈ C s ; therefore, we will not explicitly calculate this decay rate here, though it is straightforward to do so given the equations in this subsection. The absence of evidence for a resonance in η c → K * (→ Kπ)K supports our choice to neglect this channel. Given that all other modes violate C, we do not consider any modes of this type.

E. a → P P P
The amplitude for a → ηππ includes contributions from a direct (mixing) term, corresponding to vertices such as η 0 η 8 ππ in the chiral Lagrangian, and from scalar and tensor resonances: FIG. S3: Same as Fig. 2 but for e + e − → V → P P data [37,38]. The F function is the same as for the V P final state, except for the pQCD scaling power due to the different dimensionality of the V P P amplitude c.f. that of V V P .
The mixing term is given by where F P P P P is the unknown F function for the 4-pseudoscalar vertex. We include the mixing term, though its contribution is small for all masses, even for masses as low as m η where the resonance contributions are all suppressed by their Breit-Wigner terms.
All of the necessary resonance parameters and couplings for the σ, f 0 , and a 0 are taken from the η → ηππ model of Ref. [46], where the various resonance coupling constants were fit to all available data. Accounting for ALPpseudoscalar mixing gives the following terms for the π 0 π 0 final state (the value of the amplitude for π + π − is the same, though it involves different U (3) generator expressions): A(a → f 0 (ππ)η) = 7.3 GeV A(a → a 0 (ηπ)π) = 13 GeV where the scalar-meson U (3) representations were also fit to data and are approximately We turn off the σ contribution at the KK threshold, since using a simple Breit-Wigner for this term above 2m K violates unitarity. An improved model could employ a coupled-channel K-matrix approach, though we do not consider this here. We derive the f 2 amplitude and fix its couplings from its decay width to ππ and obtain where p ππ ≡ p π1 + p π2 , q ππ ≡ p π1 − p π2 , and for simplicity we take f 2 = diag{1, 1, 0}/2, which is known to be a good approximation [43]. The expressions for a → η ππ are similar, though with η replaced by η and the a 0 → η π coupling coupling constant is 20% larger than that of a 0 → ηπ. Unlike for the V V P and V P P vertices, we cannot derive the F P P P P , F SP P and F T P P functions from e + e − data. At low masses, these F functions are normalized to be unity just like the V V P and V P P ones. Furthermore, the pQCD power counting at high masses is the same for all of these amplitudes as it is for V V P , assuming that the tetraquark content of all resonances is small. Therefore, we take F SP P (m) = F P P P P (m) = F T P P (m) = F(m) .
This assumption is shown below to have better than O(1) accuracy, though we note here that an improved understanding of the excited η * states would enable deriving better data-driven constraints on F SP P , F T P P , and F P P P P . Using the amplitudes defined above, the rates are obtained as where S = 2 for η ( ) π 0 π 0 and 1 for η ( ) π + π − are the usual symmetry factors. First, we cross check our result for the a → ηππ decay by replacing the ALP with an η , which gives (summing the ηπ + π − and ηπ 0 π 0 modes) a → η , m a → m η , f a → f π then Γ a→ηππ → 116 keV ≈ Γ η →ηππ = 128 ± 2 keV (S56) Of course, since the model of Ref. [46] was fit to data-including this η decay-the numerical value should be similar if implemented correctly. A more interesting cross check involves comparing the ALP branching fractions to the corresponding known η c values for m a = m ηc . We first perform this check for η c → ηππ: Performing the same check for a → η π + π − gives a prediction of 0.3%, whereas the experimental value from Belle is 1.3 ± 0.2% [64]. However, Belle attributes ≈ 64% of η c → η ππ to a 2 GeV scalar resonance that is not included in our model. (After submitting this Letter to arxiv, Belle updated their paper to remove the claim about the 2 GeV scalar resonance. However, it is clear that much of this decay involves high-mass dipions whose source is not included in our model.) Our prediction is consistent with the remaining branching fraction of ≈ 0.5 ± 0.1%. Furthermore, our predictions for η c → η f 0 (980) and η c → η f 2 (1270) are both consistent with the published dipion mass spectrum in Ref. [64]. So, while we underestimate the η ππ branching fraction for m a values close to 3 GeV, for lower masses our prediction for this final state should be within a factor of two (we do not see any need to improve the prediction for this final state). Therefore, we conclude that our predictions for a → η ( ) ππ are consistent with η c data, which validates our F SP P , F P P P P , and F T P P functions with better than O(1) accuracy. Finally, there is one additional cross check that can be performed using the η(1760) state: m a → m η(1760) , then B(a → γγ) × B(a → η π + π − ) → 1.1 · 10 −7 ≈ B(η(1760) → γγ) × B(η(1760) → η π + π − ) = (1.2 ± 0.3) · 10 −7 (S58) Here, we have assumed that the U (3) representation of the η(1760) is the same as that of the ALP. Lack of knowledge of the nature of this η(1760) state induces an O(1) uncertainty here. Taken together, we conclude that the available cross checks suggest that our F SP P , F P P P P , and F T P P functions are accurate with at most O(1) uncertainty over the full mass range considered in this study. The rate for the family of decays a → KKπ is calculated using a similar approach to the one used above for a → ηππ. We take the total amplitude to be where S Kπ denotes the Kπ S-wave amplitude. It is well-known that S Kπ has a large K * 0 (1430) contribution, and that it is not well described by a simple sum of Breit-Wigner terms. We use the empirical S Kπ amplitude measured in Ref. [45] by BaBar. In principle, there is a mixing term similar to the a → ηππ case; however, it is negligible for all m a and so we ignore it.
Here we provide the full expressions for the amplitudes for the a → K + K − π 0 , though as above the value of the amplitude is the same for all 6 KKπ final states (but involves different U (3) generator expressions): A(a → a 0 (K + K − )π 0 ) = 13 GeV 2 aa 0 π 0 (p a · p π )(p K + · p K − )BW a0 (m KK )F SP P (m a ) , A(a → S Kπ (Kπ)K) = 8.2 GeV 2 a{K + , K − } F SP P (m a ) (S61) × [(p a · p K − )(p π · p K + )S Kπ (m K + π ) + (p a · p K + )(p π · p K − )S Kπ (m K − π )] , where again the coupling parameters are taken from Ref. [46] (the κ couplings are used for S Kπ ), and as stated above, the shape and phase of S Kπ is taken from Ref. [45]. The total rate for a → KKπ is 6 times that obtained for a → K + K − π 0 due to equivalent contributions also from K ± K S π ∓ , K ± K L π ∓ , and K S K L π 0 . We cross check our a → KKπ result by comparing to the corresponding known η c values for m a = m ηc : m a → m ηc , then B(a → KKπ) → 7.8% ≈ B PDG (η c → KKπ) = 7.3 ± 0.5% (S62) Our calculation is consistent with the measured value. N.b., while the shape and phase of S Kπ were taken from a fit to data [45], the magnitude is set by the κ resonance parameters in Ref. [46] and by our F function.

III. ALP CONSTRAINTS
Here we provide details on the constraints placed on the ALP scenario discussed in the Letter, i.e. the case where only c g = 0 . We focus on the mass region of m π < m a < 3 GeV, but note that for m a < 100 MeV the strongest constraint is from BR(K + → π + +invisible) < 7.3×10 −11 [21], where f a 3 TeV [22,23]. Constraints where f a 3f π are omitted, e.g., bounds from radiative J/ψ decays, since we assumed f π f a when deriving the ALP-pseudoscalar mixing factors.

A. LEP & Beam Dumps
Limits have been placed on the aγγ vertex from LEP [20,47] and beam-dump experiments [48,49]. We use the a → γγ calculation above to relate the aγγ interaction strength to f a . The beam-dump limits only constrain ALP masses where B(a → γγ) = 1, even for the scenario considered here; therefore, the relationship between the ALP lifetime and the strength of the aγγ vertex is the same here as it is when the ALP only interacts with the electromagnetic field. For the LEP constraints, we also include our calculation of B(a → γγ) when recasting the published limits for this model.

B. φ → aγ
Constraints can be placed on f a considering the decays φ → aγ, with a → ππγ and a → ηπ 0 π 0 , using experimental upper limits for the φ → ππγγ and φ → ηπ 0 π 0 γ decays [43]: where in each case we conservatively assume that the ALP decay is the only contribution to each final state. We then use the ratio of φ decay rates the known value B(φ → ηγ) = 1.3% [43], and the ALP branching fractions shown in Fig. 3 to determine the constraints on f a .
We calculate the rate for η → π + π − a using our a → ηππ model, but with the parent particle properties replaced by those of the η and the final-state η replaced by the ALP: A(η → ππa) = A(a → ηππ){m a → m η , m η → m a , aη → 1, f a → f π } × aη (S66) i.e., we take the amplitude for a → ηππ but for m a = m η , f a = f π , and a = η , along with also m η = m a , then multiply the result by aη which accounts for the ALP-η mixing. We then normalize this using the known value of B(η → ηππ) [43]. Using our calculation of B(a → π + π − π 0 ) we are able to determine the constraints on f a .