Topological susceptibility in QCD with two flavors and 3-5 colors -- a pilot study

I present a calculation of the topological susceptibility $\chi_T$ in $SU(N_c)$ gauge theory with $N_c=3-5$ colors and $N_f=2$ degenerate flavors of fermions. The results lie on a common curve when expressed in terms of the combination $N_c m_{PS}^2 t_0$ where $m_{PS}$ is the pseudoscalar meson mass and $t_0$ is the flow parameter. $\chi_T$ approaches its quenched value as the pseudoscalar mass becomes large. The lattice simulations use clover fermions. They are done at a single lattice spacing, roughly matched across $N_c$, and over a restricted range of fermion masses.


I. INTRODUCTION, MOTIVATION, AND RESULTS
Real world QCD, with N c = 3 colors, shares many features with its N c → ∞ limit. Large N c expectations [1,2] mostly arise from graph counting, in that only planar diagrams survive in the large N c limit. The consequences of these expectations are often applied to nonperturbative observables, like masses or matrix elements, and these applications are qualitatively satisfied by experimental data.
Nonperturbative predictions really need nonperturbative checks, and there is a small lattice literature of simulations of QCD with N c > 3. (See [3][4][5][6][7] for a selection of reviews and original work.) Simulation results generally agree with expectations. This note is another check of large N c counting. It is a calculation of the topological susceptibility with two flavors (N f = 2) of degenerate mass fundamental representation fermions and N c = 3, 4, and 5 colors.
There are at least three approaches in the literature for studying large N c QCD with lattice methods. The largest N c values are attained by assuming volume independence and simulating on small spatial volumes (see [7] for a recent review). Fairly large N c values (up to N c = 17) have been reached doing quenched simulations [5] on large volumes. Simulations using full QCD, with dynamical fermions, in large volumes are much more costly. However, part of the large N c phenomenology is that fermion effects die away at large N c . To see their effects die away, it is necessary to include dynamical fermions from the start.
Naive large N c counting does not address possible effects depending on the fermion mass m q . For many processes, N c and m q effects approximately factorize: Q(N c , m q ) ∼ N p c f (m q ). Examples include meson masses (m H vs m q ), decay constants, and even baryon masses (M B (N c , m q , J) = N c m 0 (m q ) + (J(J + 1)/N c )B(m q ) + . . . for angular momentum J). The qualitative agreement of full QCD (N c = 3 with dynamical fermions) and quenched QCD (replacing a dynamical fermion by a quenched valence one) is a consequence of this factorization.
But there are (at least) two cases where this factorization should not occur. These cases occur for chiral observables and follow from the scaling of the pseudoscalar decay constant f P S and condensate Σ: f P S ∝ N 1/2 c and Σ ∝ N c . (The behavior of f P S is directly tested on the lattice; the second relation is only known indirectly: the squared pseudoscalar mass divided by the fermion mass m 2 P S /m q is seen to be independent of N c and this ratio is also proportional to Σ/f 2 P S .) The first case involves quantities scaling as m 2 P S /f 2 P S ∝ m 2 P S /N c or m q /N c . Examples include higher order corrections to chiral observables, O = O 0 (1 + C(m 2 P S /f 2 P S ) log(m 2 P S /Λ 2 ) + . . . ). These are typically hard to see in simulations because they are sub-leading corrections. One example, though, has been reported in Ref. [8], the dependence of the gradient flow scale t 0 on m 2 P S /N c as described by Golterman and Shamir [9].
The second case is the subject of this note: the topological susceptibility $\chi_T$. It has very different behavior in the quenched limit and at small fermion mass. In the former case $\chi_T$ is a constant (call it $\chi_Q$), which is nearly independent of $N_c$. In the latter case $\chi_T \propto m_q \Sigma$ or $m_{PS}^2 f_{PS}^2$, so one ought to see scaling as $\chi_T \propto N_c m_{PS}^2$ at small $N_c m_{PS}^2$. In fact, there is an old prediction of a functional form valid for all mass values, due to Di Vecchia and Veneziano [10] and Leutwyler and Smilga [11],

$$\frac{1}{\chi_T} = \frac{1}{\chi_Q} + \frac{N_f}{m_q \Sigma} . \qquad (2)$$

(The small mass limit of this formula was also derived by Crewther [12].) With $2 m_q \Sigma = f_{PS}^2 m_{PS}^2$ (appropriate to the $f_{PS} = 93$ MeV convention) and $N_f = 2$, Eq. 2 becomes

$$\frac{1}{\chi_T} = \frac{1}{\chi_Q} + \frac{4}{f_{PS}^2 m_{PS}^2} , \qquad (3)$$

and with $f_{PS}(N_c) = \sqrt{N_c}\, f_0$, the expected scaling behavior of the pseudoscalar decay constant across $N_c$, we can write

$$\frac{1}{\chi_T} = \frac{1}{\chi_Q} + \frac{4}{N_c f_0^2 m_{PS}^2} . \qquad (4)$$

That is, the inverse topological susceptibility rises linearly from its quenched value with respect to the scaling variable $1/(N_c m_{PS}^2)$ or $1/(N_c m_q)$.

The purpose of this paper is to take a first look at $\chi_T(m_q, N_c)$ -- as the title says, "a pilot study." This means

• One lattice spacing (loosely speaking), roughly matched across $N_c$ using a gluonic observable (alternatively, roughly matched in bare 't Hooft coupling $\lambda = g^2 N_c$)
• One simulation volume, a range of intermediate mass fermions, and moderate statistics, so all observations are still tentative

The goal of the paper is to answer a set of physics questions and a set of simulation questions. The physics questions are
1. Does $\chi_T(m_q, N_c)$ actually scale as $\chi_T(m_q N_c)$ (equivalently $\chi_T(m_{PS}^2 N_c)$), smoothly connected to $\chi_Q$ at large $m_{PS}^2 N_c$?
2. Does $\chi_T(m_q, N_c)$ follow the Di Vecchia, Veneziano, Leutwyler, Smilga functional form?
The answers are (1) yes, apparently and (2) qualitatively, but not quantitatively, at the lattice spacings studied.
The main simulation question concerns autocorrelations: it is well known that in ordinary $N_c = 3$ QCD, $\chi_T$ has a very long simulation autocorrelation time $\tau$. How severe an issue is this across $N_c$? The answer is that $\tau$ grows with $N_c$. $N_c = 3$ or 4 seem to be manageable with the naive approach I took to study the problem, but $N_c = 5$ already shows clear issues.
The result of the simulations described here is displayed in Figs. 1 and 2 (the overall scale is set by the "flow parameter" $t_0$). Monte Carlo results collapse to a common curve, which is a straight line in Fig. 1. In that figure we see that the line extrapolates to the quenched topological susceptibility measured by Ref. [25]. Evidently, the effects of dynamical fermions for this observable do not depend separately on $N_c$ and the fermion mass, but on the combination $N_c m_q$ or $N_c m_{PS}^2$. In 2001 Dürr [13] presented a similar plot, with a comparison to Eq. 4, for $N_c = 3$ data. As a contrast, Fig. 3 shows the flow scale $t_0$ versus $(a m_{PS})^2/N_c$, the other non-factorizing mass and $N_c$ dependence. I show it only in passing, since it has been discussed before.
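The shape of the interpolation of Eq. 4 is easy to tabulate. A minimal sketch in Python (the numerical values of $\chi_Q$ and $f_0$ below are placeholders for illustration, not fitted values from this paper):

```python
def chi_T(mps2_Nc, chi_Q, f0_sq):
    """Inverse-susceptibility form of Eq. 4 for N_f = 2:
    1/chi_T = 1/chi_Q + 4/(N_c f0^2 m_PS^2),
    written in terms of the scaling variable x = N_c m_PS^2."""
    return 1.0 / (1.0 / chi_Q + 4.0 / (f0_sq * mps2_Nc))

# Placeholder values in units of t0 (illustration only, not fit results):
chi_Q = 6.25e-4    # nominal quenched t0^2 chi_Q quoted later in the text
f0_sq = 0.01       # hypothetical t0 f0^2

for x in [0.01, 0.1, 1.0, 10.0, 100.0]:
    print(x, chi_T(x, chi_Q, f0_sq))
# chi_T rises linearly at small x and saturates at chi_Q at large x.
```

This makes the qualitative claim of the text concrete: the susceptibility interpolates between $\chi_T \propto N_c m_{PS}^2$ at small mass and the quenched plateau $\chi_Q$ at large mass.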
The outline of the paper is as follows: Section II describes the calculations: it covers data sets, simulation methodology, and has a discussion of gradient flow based observables. Here is where I describe how I dealt with the long autocorrelations in the data. Section III presents results: first, comparisons of my data to high statistics calculations of the quenched topological susceptibility, then a comparison to previous calculations of the N f = 2 SU(3) susceptibility. These are checks to make sure that the present calculation seems to be in order. I then discuss my results for the N f = 2 susceptibility across N c . Section IV is a brief summary. A reminder of the derivation of Eq. 2 is given in an appendix.

II. TECHNICAL ASPECTS OF THE CALCULATION
A. Simulation methodology, lattice actions, data sets

The dynamical fermion simulations contained two degenerate flavors of Wilson-clover fermions. The gauge action is the usual plaquette action, with the bare gauge coupling $g_0$ parameterized by $\beta = 2N_c/g_0^2$. The fermion action uses gauge connections defined as normalized hypercubic (nHYP) smeared links [14][15][16] (with the arbitrary $N_c$ implementation of Ref. [6]). The bare quark mass $m_{q0}$ is introduced via the hopping parameter $\kappa = 1/(2 m_{q0} a + 8)$. The clover coefficient is fixed to its tree level value, $c_{SW} = 1$. The updating scheme is the Hybrid Monte Carlo (HMC) algorithm [17][18][19] with a multi-level Omelyan integrator [20] and multiple integration time steps [21] with one level of mass preconditioning for the fermions [22]. All lattice volumes are $16^3 \times 32$ sites. The gauge fields experience periodic boundary conditions; the fermions are periodic in space and antiperiodic in time.
All data sets are 5000 to 6000 trajectories in length. Lattices used for analysis are spaced a minimum of 10 HMC time units apart, so individual bare parameter sets contain 490-600 stored lattices. All data sets (individual (β, κ) values) are based on a single stream.
The data sets were collected at approximately equal values of lattice spacing. (The bare gauge coupling is fixed at each N c and only κ is varied.) This precludes a discussion of lattice artifacts. However, comparisons across N c , or with large N c phenomenology, can be done at any value of the lattice spacing.
The data sets are extensions of ones presented in Refs. [6,8], and full spectroscopy is presented in the first of these references. Table I summarizes relevant information for the runs. Across the data sets, the squared pseudoscalar meson mass $m_{PS}^2$ is roughly linear in the Axial Ward Identity fermion mass $m_q$. The ratio $(m_{PS}/m_V)^2$, where $m_V$ is the vector meson mass, spans the range 0.16-0.64.

B. Gradient flow for length scale
The lattice spacing and the topological charge are measured using the technique of gradient flow [23,24], a smoothing method for gauge fields via diffusion in a fictitious (fifth dimensional) time $t$. In continuum language, a smooth gauge field $B_{t,\mu}$ is constructed through an iterative process beginning with the original one,

$$\partial_t B_{t,\mu} = D_{t,\nu} G_{t,\nu\mu} , \qquad B_{0,\mu} = A_\mu ,$$

where $G_{t,\mu\nu}$ is the field strength tensor built using the smoothed degrees of freedom. A squared length $t_0$ is defined through the observable

$$E(t) = \frac{1}{4} G^a_{t,\mu\nu} G^a_{t,\mu\nu} .$$

It is set by fixing the quantity

$$t^2 \langle E(t) \rangle \Big|_{t=t_0} = C(N_c) .$$

The choice of $C(N_c)$ across $N_c$ is somewhat arbitrary, as it is for $N_c = 3$. There is a natural set of choices motivated by the perturbative expansion,

$$t^2 \langle E(t) \rangle \propto (N_c^2 - 1)\, \alpha(q) \propto N_c \lambda(q) \left(1 - \frac{1}{N_c^2}\right) ,$$

where $\alpha(q)$ is the strong coupling constant at momentum scale $q \propto 1/\sqrt{t}$ and $\lambda(q)$ is the corresponding 't Hooft coupling. In the large $N_c$ limit, where matching gluonic observables is achieved by matching the bare 't Hooft couplings, $t^2 \langle E \rangle \propto N_c$. Beyond that, there are many possible choices. In Ref. [8], I tested the leading $C(N_c) \propto N_c$ behavior by matching $t_0$ to another gluonic observable, the length scale $r_1$ defined from the static potential. This choice amounts to fixing the ratio $t_0/r_1$ across $N_c$. Most other people adopt a different convention, built around the usual SU(3) value $C(3) = 0.3$. This amounts to saying that the ratio $t_0/r_1$ has a $1/N_c$ variation away from its $N_c = 3$ value (or, that the large $N_c$ limit of the ratio is different from its $N_c = 3$ value), nothing more. I will follow this choice, rather than the one of Ref. [8], because I want to match the quenched results of Ref. [25], which use this convention. The extraction of $t_0$ from lattice data is standard and is described in Ref. [8]. The gradient flow differential equation is integrated numerically as described by Lüscher [24]. Calculations used the usual "clover" definition of $E(t)$. An autocorrelation analysis will be described after the next subsection.
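In practice, extracting $t_0$ amounts to measuring $t^2 \langle E(t) \rangle$ on a grid of flow times and interpolating to find the crossing of $C(N_c)$. A minimal sketch of that interpolation step (the function and the sample numbers are invented for illustration, not the production analysis):

```python
import numpy as np

def find_t0(t_vals, t2E_vals, target=0.3):
    """Linearly interpolate t^2<E(t)> in t to locate the crossing
    t^2<E(t)>|_{t=t0} = target. Assumes the series rises monotonically
    through the target value, as it does in practice."""
    t_vals = np.asarray(t_vals, dtype=float)
    t2E = np.asarray(t2E_vals, dtype=float)
    i = np.searchsorted(t2E, target)   # first measurement at or above target
    # linear interpolation between the bracketing flow times
    frac = (target - t2E[i - 1]) / (t2E[i] - t2E[i - 1])
    return t_vals[i - 1] + frac * (t_vals[i] - t_vals[i - 1])

# Invented illustration data: t^2<E> growing roughly linearly in t
t_grid = np.arange(0.5, 5.0, 0.25)
print(find_t0(t_grid, 0.12 * t_grid))   # crossing of 0.3 near t = 2.5
```

A spline in place of the linear interpolation would be the natural refinement, but on a fine flow-time grid the difference is far below the statistical errors.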

C. Gradient flow for topological charge -definitions
The topological charge density is defined as

$$q(x) = \frac{1}{32\pi^2} \epsilon_{\mu\nu\sigma\rho} {\rm Tr}\, G_{\mu\nu}(x) G_{\sigma\rho}(x)$$

and is computed using gauge fields at flow time $t$,

$$Q(t) = \sum_x q(x, t) .$$

In a system with periodic boundary conditions, the topological susceptibility is simply

$$\chi_T = \frac{\langle Q^2 \rangle}{V} . \qquad (13)$$

This point actually needs a bit more discussion. Eq. 13 implicitly assumes that $\langle Q(t) \rangle = 0$ when averaged over the measurements taken in the simulation. An observation of $\langle Q(t) \rangle \ne 0$ is an artifact, indicating that the data has long time autocorrelations. To see if this affects my results, I will compare $\langle Q^2 \rangle$ to the full correlator

$$C(t) = \langle Q(t)^2 \rangle - \langle Q(t) \rangle^2 .$$

A second issue is that, at any nonzero lattice spacing, $\chi_T(t)$ depends on $t$. Taking the continuum limit involves measuring $t_0^2 \chi(t)$ at several lattice spacings and taking the $a \to 0$ limit (this could be done by plotting the data versus $1/t$ or, re-inserting the lattice spacing $a$, plotting versus $a^2/t_0$). In principle, this could be done for any $t$. The data in this study are all at one lattice spacing, so one has to ask: are the physics hints given by a study at one value of $a$ sensitive to the choice of operator (the choice of $t$ for $Q(t)$)?
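The check described above is simple to script: compare the naive $\langle Q^2 \rangle$ against the connected correlator $C = \langle Q^2 \rangle - \langle Q \rangle^2$, which is insensitive to a nonzero sample mean. A sketch with synthetic charge data (the offset of the mean is put in by hand to mimic the autocorrelation artifact; nothing here is real simulation data):

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic topological charge time series: Gaussian about a small
# nonzero mean, mimicking a long-autocorrelation artifact <Q> != 0.
Q = rng.normal(loc=0.5, scale=3.0, size=5000)

V = 16**3 * 32                  # lattice volume in sites
Q2 = np.mean(Q**2)              # naive estimator, assumes <Q> = 0
C = Q2 - np.mean(Q)**2          # connected correlator, subtracts the mean

chi_naive = Q2 / V
chi_conn = C / V
print(chi_naive, chi_conn)      # the two differ by exactly <Q>^2 / V
```

When $\langle Q \rangle$ is statistically consistent with zero the two estimators agree within errors; a systematic gap between them is the warning sign discussed in the text.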

D. Data analysis
Both $t_0$ and the topological charge show simulation time autocorrelations. I attempted to estimate the autocorrelation time through the autocorrelation function, defined for a generic observable $A$ as

$$\Gamma_A(\tau) = \langle (A_i - \bar{A})(A_{i+\tau} - \bar{A}) \rangle ,$$

where $\bar{A}$ is the sample mean of the series. The integrated autocorrelation time, up to a window size $W$, is

$$\tau_{int}(W) = \frac{1}{2} + \sum_{\tau=1}^{W} \frac{\Gamma_A(\tau)}{\Gamma_A(0)} .$$

Unless the total length in time of the data set is much larger than the autocorrelation time, it is difficult to estimate an error for $\tau_{int}$. I analyzed my data sets by breaking them into multiple parts, each part being order 1000 trajectories or 100 saved lattices, computing $\tau$ on each part, and taking an error from the part-to-part fluctuations. I repeated the calculation of autocorrelation times for $Q(t)$. In contrast to the results for $t^2 E(t)$, in general $\tau_{int}(W)$ is an irregular function of $W$. This is already an indicator of long correlations in the data. Results for the same parameter values as in Fig. 4 are shown in Fig. 6.
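The windowed estimator just described can be sketched in a few lines (this is an illustration of the standard definitions, not the production analysis code):

```python
import numpy as np

def autocorr(A):
    """Normalized autocorrelation function rho(tau) of a 1-d series."""
    A = np.asarray(A, dtype=float)
    dA = A - A.mean()
    N = len(A)
    gamma = np.array([np.mean(dA[: N - t] * dA[t:]) for t in range(N // 2)])
    return gamma / gamma[0]

def tau_int(A, W):
    """Integrated autocorrelation time summed out to window W:
    tau_int(W) = 1/2 + sum_{tau=1}^{W} rho(tau)."""
    rho = autocorr(A)
    return 0.5 + rho[1 : W + 1].sum()

# White noise has tau_int ~ 1/2; correlated data gives larger values.
rng = np.random.default_rng(1)
print(tau_int(rng.normal(size=4000), W=20))
```

Scanning $\tau_{int}(W)$ in $W$ and looking for a plateau is exactly the diagnostic whose failure (an irregular $W$ dependence) signals the long correlations mentioned above.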
Fit results come from a jackknife analysis, removing sets of lattices whose length is longer than the estimated integrated autocorrelation time. This would be $n_J$ successive lattices for $\tau_{int} = 10 n_J$ molecular dynamics time units. To be explicit: for a given jackknife I compute the averages $\langle Q(t) \rangle$, $\langle Q^2(t) \rangle$, and $C(t) = \langle Q^2(t) \rangle - \langle Q(t) \rangle^2$; the uncertainty of each comes from the jackknife. I varied the size of the jackknife blocks beyond the estimate of the integrated autocorrelation time. I estimate the fractional error from loss of statistics as $\sqrt{2/n}$, where $n = N/n_J$. That gives a rough error bar. The uncertainty in $C(t)$ increases with $n_J$ and then either saturates, or at least the growth becomes smaller than what statistics allows one to see. Results for $t = 3$ are shown in Fig. 7, which instructs us to take $n_J = 10$ for the three smallest $\kappa$ values and 20 for the others.

Now for fits to the data. I observe, generally, that at small $t$, $Q(t)$ has a Gaussian distribution. At large $t$, individual configurations "cool," that is, $Q$ peaks at equally spaced, roughly integer values. This appears to happen at smaller $t$ for $N_c = 5$ than it does for $N_c = 3$. I test that the data is Gaussian using the Kolmogorov-Smirnov test [26]. It compares the integrated distributions (the cumulants) of the measured data $C(x)$ and the theoretical prediction $P(x)$. The cumulant of the measured data is $C(x) = n(x)/N$ where $n(x)$ is the number of data points with a value smaller than $x$ and $N$ is the total number of data points. The theoretical prediction for this quantity is found by integrating the distribution: $P(x) = \int_{-\infty}^{x} f(y)\, dy$. The quantity of interest is the largest deviation of $P$ and $C$: $D = \max_x |P(x) - C(x)|$. From this the confidence level is given by

$$Q_{KS}\left( \left[ \sqrt{N} + 0.12 + 0.11/\sqrt{N} \right] D \right) ,$$

where

$$Q_{KS}(\lambda) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 \lambda^2} .$$

(Note that larger $Q_{KS}$ is better.) So far, I have not specified a flow time for $C(t)$, so I did fits for a range of $t$ values. Results for $C(t)$ and $\langle Q(t) \rangle$ are shown in Figs. 8 and 9. Plots of $\langle Q^2 \rangle$ are almost identical to those of $C(t)$.
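The Kolmogorov-Smirnov machinery is short enough to write out explicitly. A sketch, assuming the standard asymptotic form of $Q_{KS}$ and the usual $\sqrt{N} + 0.12 + 0.11/\sqrt{N}$ prefactor (as in Ref. [26]); the uniform-distribution example at the bottom is purely illustrative:

```python
import math
import random

def q_ks(lam, terms=100):
    """Asymptotic KS significance:
    Q_KS(lambda) = 2 sum_{j>=1} (-1)^{j-1} exp(-2 j^2 lambda^2)."""
    return 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                     for j in range(1, terms + 1))

def ks_confidence(data, cdf):
    """Largest deviation D between the empirical cumulant and the model
    CDF, converted to a confidence level (larger is better)."""
    xs = sorted(data)
    N = len(xs)
    D = 0.0
    for i, x in enumerate(xs):
        # the empirical cumulant jumps at each data point; check both sides
        D = max(D, abs(cdf(x) - i / N), abs(cdf(x) - (i + 1) / N))
    lam = (math.sqrt(N) + 0.12 + 0.11 / math.sqrt(N)) * D
    return D, q_ks(lam)

# Example: compare uniform [0,1] draws against the uniform CDF.
random.seed(3)
sample = [random.random() for _ in range(500)]
D, conf = ks_confidence(sample, lambda x: min(max(x, 0.0), 1.0))
print(D, conf)
```

For testing Gaussianity of $Q(t)$, the model CDF would be the error function with mean and variance taken from the data, as described in the text.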
$\langle Q(t) \rangle$, in contrast, is nearly independent of $t$. $\langle Q(t) \rangle$ should average to zero. The figures, and the data for $t = 3$ presented in Table II, show several cases where $\langle Q \rangle$ sits two standard deviations away from zero. However, even for the most extreme deviations (e.g., SU(5) at $\kappa = 0.1265$), the difference between $C(t)$ and $\langle Q^2(t) \rangle$ is less than the RMS value of the uncertainties of the two determinations.
In all cases, the data is (nearly) Gaussian about its mean. This is checked through a cumulant analysis, where the expectation is (the integral of) a Gaussian with mean $\langle Q \rangle$ and variance taken from Table II. Deviations from Gaussianity occur at long flow time because $Q$ has cooled to approximate integers, as revealed by steps in the cumulant. It did not seem worthwhile to guess a more complicated (Gaussian with steps) distribution for comparison.
A few pictures of cumulants, shown in Fig. 10, illustrate the fits. Fig. 8 shows that once $t$ becomes greater than about 2.5, the value of $C(t)$ (and $\langle Q(t)^2 \rangle$, which is almost identical) roughly forms a plateau. In a better study, I would fix $t$ to any convenient value and extrapolate $\langle Q(t)^2 \rangle$ to $a = 0$. For the remainder of this study I will just fix $t = 3$. Results are summarized in Table II. All phenomenology in the next section will be done with $\chi_T = \langle Q(t=3)^2 \rangle / V$.

III. RESULTS
With data sets at one bare gauge coupling per $N_c$ it is hard to quantify lattice artifacts. What I can do is compare my results to other simulations and ask if they look reasonable. There are two places where this is done.

A. Comparison with high precision quenched results
The first one is the quenched limit. The authors of Refs. [27] and [25] published high statistics data for $t_0^2 \chi_T$ for $N_c = 2-6$. I collected a data set much smaller than theirs, but comparable to my dynamical sets in size and in lattice spacing, to check against theirs. It is recorded in Table III. My sets are 500 measurements per $N_c$, each separated by 100 sweeps through the lattice, each sweep consisting of a mix of four Brown-Woch microcanonical over-relaxation steps [28] and a Cabibbo-Marinari heat bath update [29], performed on all $N_c(N_c - 1)/2$ SU(2) subgroups of the SU($N_c$) link variables. Fig. 11 shows the comparison. Within my large errors, my results are compatible with the high statistics results of Refs. [25,27].

The next comparison is with high precision $N_c = 3$, $N_f = 2$ results. I have only been able to find a few recent calculations (most recent studies are for $N_f > 2$ with physical strange (and beyond) fermion masses). But there are three useful sets.
The first is that of Ref. [30]. I used essentially their techniques: the topological susceptibility is measured from flow. Ref. [30] presented data from three small lattice spacings, a = 0.075 fm, 0.065 fm and 0.048 fm (speaking nominally; flow parameters, and hence the lattice spacing a are computed at each value of bare fermion mass) on very large lattices. The authors of Ref. [30] provided me with tables of t 2 0 χ T versus t 0 m 2 P S . Most of their data is at smaller pseudoscalar mass than mine.
The other two calculations measure the topological charge defined using fermionic zero modes. Ref. [31] is a calculation using overlap fermions in a sector of fixed topology. The lattice spacing is about 0.12 fm. They publish a table of $\chi_T r_0^4$ versus $m_{PS} r_0$, where $r_0$ is the Sommer parameter [32], defined from the static quark potential. I take their value $r_0 = 0.49$ fm and the value of $t_0$ quoted in the review by Sommer, Ref. [33], $\sqrt{t_0} = 0.154$ fm (from Refs. [34,35]), to rescale the data. Ref. [36] is a similar calculation with domain wall fermions where the topological charge is determined using valence overlap fermions. Taking pseudoscalar masses from their Ref. [37], I rescale their numbers (quoted in GeV units but determined from $r_0$). Their data is also shown in Fig. 12.
The line in the figure is t 2 0 χ T = (t 0 f 2 P S /4)m 2 P S with √ t 0 = 0.154 fm and f P S = 93 MeV.
I show my own data for $\chi_T(t)$ for two choices of $t$, 3 and 1.2. What points am I trying to make with this busy figure? To begin, at the lattice spacings of these data sets, lattice artifacts are large and are rather different for the two simulations based on flow and the ones based on zero modes. The susceptibility measured by flow is expected to have a lattice artifact $A$, an additive constant which scales as $a^2$ and keeps $\chi_T$ nonzero in the chiral limit. This is what the authors of Ref. [30] saw, and this behavior has been checked in the chiral limit by Münster and Wulkenhaar [38]. In contrast, zero modes should drive $\chi_T$ to zero as the fermion mass vanishes.

I fit the inverse susceptibility to the straight-line form

$$\frac{1}{t_0^2 \chi_T} = \frac{1}{t_0^2 \chi_Q} + \frac{C}{t_0 (N_c/3) m_{PS}^2} \qquad (21)$$

for individual $N_c$ values and for various combinations of $N_c$. I use my quenched data as input to fix the intercept (at $1/(t_0 (N_c/3) m_{PS}^2) = 0$). Fit results and the chi-squared per degree of freedom are shown in Table IV. The $N_c = 3$ and 4 data sets are clearly consistent, and the $N_c = 5$ topological susceptibility falls on the same curve, although the uncertainty in the slope $C$ is clearly much greater.

Fig. 13 replaces the straight-line presentation with a conventional one of $t_0^2 \chi_T$ versus $t_0 m_{PS}^2 N_c/3$. There are four lines: Line (1) is just linear dependence with the slope from the fit to Eq. 21. Line (2) is linear dependence ($C = 4/(t_0 f_{PS}^2)$) with physical (SU(3)) values for $t_0$ and $f_{PS}$. Line (3) is the entire fit function of Eq. 21. Line (4) is the fit function but with physical $C$. Panel (b) blows up the small mass region of panel (a).

(In Fig. 12, my results are black squares for $\chi_T(t = 3)$ and black octagons for $\chi_T(t = 1.2)$, while the blue points are data from [30]: fancy crosses, squares, and crosses are data at lattice spacing $a = 0.075$ fm, 0.065 fm, and 0.048 fm, respectively. Red fancy diamonds are from Ref. [36]. Purple bursts are data from Ref. [31]. The line is $t_0^2 \chi_T = (t_0 f_{PS}^2/4) m_{PS}^2$ with $\sqrt{t_0} = 0.154$ fm and $f_{PS} = 93$ MeV.)

Finally, Fig. 14 shows a third view of curve collapse, $t_0 \chi_T/(m_{PS}^2 N_c/3)$ versus $t_0 m_{PS}^2 N_c/3$. This one is a bit dangerous, since $\chi_T$ from flow does not extrapolate to zero at zero fermion mass: the parameterization blows up there. Overlaid on the data is the expectation of Eq. 4 with physical (SU(3)) values for $t_0$ and $f_{PS}$, and $t_0^2 \chi_Q$ taken to be a nominal $6.25 \times 10^{-4}$.

What conclusions can be drawn from these figures? First, it is clear that $\chi_T$ is, broadly speaking, a function of the combination $m_{PS}^2 N_c$. Second, it is also clear that Eq. 4 with physical (SU(3)) values for $t_0$ and $f_{PS}$, and $t_0^2 \chi_Q$ taken from high precision lattice data, does not reproduce the data. At this point there are two obvious things to say.
First, this difference could just be due to discretization artifacts at the lattice spacing where the simulations were carried out. A real check requires several lattice spacings and an extrapolation.
Second, the formula Eq. 4 itself could have issues. It is lowest order chiral perturbation theory combined with a plausible assumption, that the eta-prime correlator is a bubble sum. Scaling with $m_{PS}^2 N_c$ is actually scaling with respect to $m_{PS}^2/\mu_0^2$, where $\mu_0$ is the eta-prime mass, combined with the scaling $\mu_0^2 \propto 1/N_c$ expected from the Witten-Veneziano relation [39,40]. QCD at intermediate to large sea quark mass does not have to be described by chiral perturbation theory.
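Stepping back to the fits: the straight-line fit of Eq. 21 with the intercept fixed from quenched data is just a one-parameter weighted linear fit in the variable $1/(t_0 (N_c/3) m_{PS}^2)$. A sketch with invented data points (these are not the Table IV numbers):

```python
import numpy as np

def fit_inverse_chi(x, y, yerr, intercept):
    """Weighted one-parameter fit of y = intercept + C*x, with the
    intercept 1/(t0^2 chi_Q) held fixed from quenched data (Eq. 21).
    Returns the slope C and chi-squared per degree of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(yerr, dtype=float) ** 2
    C = np.sum(w * x * (y - intercept)) / np.sum(w * x * x)
    chisq = np.sum(w * (y - intercept - C * x) ** 2)
    return C, chisq / (len(x) - 1)

# Invented data: x = 1/(t0 (Nc/3) m_PS^2), y = 1/(t0^2 chi_T)
x = np.array([2.0, 4.0, 6.0, 8.0])
y = 1.0 / 6.25e-4 + 300.0 * x + np.array([5.0, -8.0, 3.0, -2.0])
C, chisq_dof = fit_inverse_chi(x, y, yerr=np.full(4, 10.0),
                               intercept=1.0 / 6.25e-4)
print(C, chisq_dof)
```

Working in the inverse variables has the advantage stressed in the text: the DVVLS form is exactly a straight line there, so curvature in the data is immediately visible.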

IV. CONCLUSIONS
This pilot study shows that fermions influence the topological susceptibility through the product N c m 2 P S . Perhaps it is not a surprising result, but it does illustrate that there are quantities whose N c and fermion mass dependence is non-factorizing.
The other non-factorizing dependence (∝ m 2 P S /N c ) may be more ubiquitous. It appears in all chiral logarithm corrections. High quality data for the topological susceptibility would most likely observe it in the one loop [41] (and beyond) corrections to χ T in the chiral limit.
Probably the easiest place to see this generic behavior is in the dependence of $t_0$ on the pseudoscalar mass, as shown in Fig. 3.

Scaling as $N_c m_q$ is also expected for observables in the epsilon regime (the limit of simulation volume $V = L^4$ and pseudoscalar mass where $m_{PS} L \ll 1$ while $m_H L > 1$ for all other mass scales $m_H$). It appears in predictions for chiral observables, such as the finite-volume condensate, which involve the scaling combination $m_q \Sigma V$ (for example, $\Sigma(V) = \Sigma f(m_q \Sigma V)$). I do not know of any Monte Carlo checks of this scaling.

This is a pilot study: what would it take to produce higher quality data? This presumably means larger volumes, several lattice spacings, and maybe larger $N_c$. Larger volumes are needed to push to smaller fermion mass and check for $m_{PS}^2 N_c$ scaling in a theoretically clean regime. Several lattice spacings are needed, of course, to give a continuum result. Such data sets already exist for $N_c = 3$, and the only reason to repeat them is to use them as checks of the methodology for the more interesting larger $N_c$ cases.
I suspect that such $N_c = 4$ data sets could be generated with the same techniques as either I or (better) Ref. [30] used, simply by consuming more resources. (Neglecting autocorrelation effects, the simulations are dominated by the calculation of fermion propagators, involving matrix-times-vector operations; the cost scaling is roughly $N_c^2$.) My experiences with $N_c = 5$ raise a flag, however. The long autocorrelation time for $N_c = 5$ compared to lower $N_c$ values is a clear issue. It is hard to imagine that $N_c > 5$ will have a shorter autocorrelation time. Of course, I should not say more: I have not tried extensive running for $N_c > 5$ with dynamical fermions at the same lattice spacing as the data presented here. But if I were to keep going with this project, I think I would adopt the open boundary conditions used by Ref. [30] to try to shorten the autocorrelation time.
Another "pilot area" would be to move away from $N_f = 2$. For $N_c = 3$, this is reasonably well explored by simulations with up, down, and strange quarks, and a recent study by Nogradi and Szikszai [42] covers $N_f = 2-6$. These are all tests at low quark mass: what happens as the mass grows? Varying $N_c$ and $N_f$ together would allow tests of the Veneziano limit [43,44], $N_c \to \infty$ at fixed $N_f/N_c$. Is there a universal curve for $\chi_T(m_{PS}^2)$ across a wide range of $N_c$ and $N_f$?


APPENDIX: DERIVATION OF EQ. 2

In the bubble-sum picture, the quark loops of the eta-prime correlator are tied together just by a coupling with the dimensions of a squared mass. The hairpin graph is analyzed as if each of its quark loops is a propagator for an ordinary pseudoscalar Goldstone meson. That is, the momentum space amplitude for the connected graph (the first term in Fig. 15) is

$$C(q) = \frac{f_P^2}{q^2 + m_{PS}^2} , \qquad ({\rm A1})$$

while the hairpin amplitude involving a single flavor is

$$H(q) = \frac{f_P^2}{N_f} \frac{\mu_0^2}{(q^2 + m_{PS}^2)^2} . \qquad ({\rm A2})$$

In these expressions, $f_P = \langle 0 | \bar\psi \gamma_5 \psi | PS \rangle = \sqrt{2}\, m_{PS}^2 f_{PS}/(2 m_q)$ from the PCAC relation. (Here $f_{PS} = 93$ MeV.) The quantity $\mu_0^2$, which couples the fermion loops, is the squared mass of the "quenched approximation eta-prime" in the chiral limit. (The factor $1/N_f$ converts the single-flavor graph into the expectation for the eta-prime mass in $N_f$-flavor QCD, since each closed loop has a multiplicity of $N_f$, and the wave function (vertex) is scaled by a factor of $1/\sqrt{N_f}$.) In full QCD the correlator which gives the mass of the isosinglet meson is the difference $C(t) - N_f H_{full}(t)$, and $H(t)$ is supposed to represent the first term in a geometric series, the rest of the terms in Fig. 15. This series sums up to shifting the squared mass of the pseudoscalar meson from $m_{PS}^2$ to $m_{PS}^2 + \mu_0^2$.

Computing the quenched susceptibility directly from Eq. (A2) gives the Witten-Veneziano relation,

$$\mu_0^2 = \frac{2 N_f \chi_Q}{f_{PS}^2} . \qquad ({\rm A3})$$

In full QCD, the hairpin is still saturated by zero modes, but the summed bubble chain (evaluated at $q^2 = 0$) says

$$\chi_T = \frac{f_{PS}^2}{2 N_f} \frac{m_{PS}^2 \mu_0^2}{m_{PS}^2 + \mu_0^2} . \qquad ({\rm A4})$$

Substituting for the condensate via $m_{PS}^2 f_{PS}^2 = 2 m_q \Sigma$, recalling $m_\eta^2 = \mu_0^2 + m_{PS}^2$, and using the Witten-Veneziano relation to replace $\mu_0^2$ by $\chi_Q$, we find

$$\frac{1}{\chi_T} = \frac{1}{\chi_Q} + \frac{N_f}{m_q \Sigma} ,$$

which is Eq. 2. This interpolates between the small-$m_q$ suppression and the quenched result.