Complex Langevin and boundary terms

As is well known the Complex Langevin (CL) method sometimes fails to converge or converges to the wrong limit. We identified one reason for this long ago: insufficient decay of the probability density either near infinity or near poles of the drift, leading to boundary terms that spoil the formal argument for correctness. To gain a deeper understanding of this phenomenon, we analyze the emergence of such boundary terms thoroughly in a simple model, where analytic results can be compared with numerics. We also show how some simple modification stabilizes the CL process in such a way that it can produce results agreeing with direct integration. Besides explicitly demonstrating the connection between boundary terms and correct convergence our analysis also suggests a correctness criterion which could be applied in realistic lattice simulations.


I. INTRODUCTION
It has been known for a long time that the Complex Langevin (CL) method for simulating systems with complex action may fail by either not converging or by converging to the wrong limit. These failures were traced either to insufficient decay of the probability distribution in the complexified configuration space [1,2] (at infinity or at poles of the drift force), or to failure of ergodicity [3,4]. Recently Salcedo [5] has formulated interesting criteria for failure that at first sight seem to be unrelated to the ones identified by us. The most interesting ones derive support properties of the equilibrium measure which are shown to be in conflict with the correct expectation values.
In this note we will focus on one such example and show explicitly that the problems are due to slow decay, leading to the appearance of boundary terms in an integration by parts, spoiling the formal proof of correctness. We stress that we are here concerned with the behaviour at large non-compact dimensions. The effects of nonholomorphicity have been shown, e.g. in RM models [6,7] to lead to wrong convergence and were specifically addressed in [3] both in simple models and in QCD.
Here we consider a complex density periodic with period 2π and extending to an entire analytic function without zeroes.
The complex Langevin equation (CLE) in the form used here is where dw is the Wiener process normalized as and the drift is given by The long time asymptotic average of a generic observable O is denoted by O ∞ ; we say that the CL process yields correct results if this agrees with the 'correct' expectation value of the same observable defined as i. e.
In [1,2] correctness was derived from the consideration of CL expectation values at finite Langevin time; it was shown that correctness is assured if a certain quantity F O (t, τ ) is independent of an interpolation parameter τ ∈ [0, t], i. e.
Here F O (t, τ ) interpolates between the 'correct' time evolution F O (t, t) (defined in Section III such that lim t→∞ The vanishing of this quantity can be traced back to the vanishing of a boundary term arising under integration by parts; explicitly this term is where P (x, y; t) gives the time evolved probability density under the Langevin evolution and O(z; t) gives the 'correctly evolved' observable (see appendix A).
This form of the boundary term makes clear that correctness requires sufficient decay of the product K y P O.

II. THE MODEL
The model studied here is defined by the complex density and has been studied already in 2007 by Stamatescu [8] and in 2008 by Berges and Sexty [9]. The 'correct' expectation values of exponentials ("modes") are It was found in [8,9] that the CL process does not reproduce the correct EV's which, however, can be regained by a certain reweighting procedure (with different observables requiring sometimes different reweightings).
The remarkable fact found by Salcedo [5] is that the static probability distribution P (x, y) = P (x, y; ∞) for this model can be written down explicitly by solving the time independent Fokker-Planck equation (FPE); it is It is the only non-Gaussian example known to us for which a solution of the static FPE is known in analytic form. Three features of this solution are remarkable: (1) P is independent of x, (2) P is independent of β, (3) P decays as exp(−2|y|) for large |y|; this decay is not sufficient to make the integrals of the modes exp(ik(x + iy)), |k| ≥ 2 (14) absolutely convergent, in other words, already here we are faced with slow decay.

A. Complex Langevin results
But first let us demonstrate that (13) is indeed the distribution produced by running a CL simulation for a long time. The drift force is for the Langevin process Eq. (2).
In Fig. 1 we show the histogram of the converged marginal distribution P y (y; t) = dxP (x, y; t) in log scale for β = 1, overlaid with the distribution (13). The histogram is obtained from one long trajectory (Langevin time t ≈ 125000). The agreement over about 6 orders of magnitude is convincing. The distibution can also be seen to be independent of x, cf. also Fig. 8.
We also show in Fig. 2 the histograms of P y (y; t) for various shorter times and β = 0.1, illustrating the convergence as t → ∞. As noted by Salcedo [5], it is obvious that the distribution P (13) cannot reproduce the cor-rect expectation values Eq. 12 of the observables O k = exp(ikx), because it is independent of x, entailing dx P (x, y)e ±ikx = 0 , and its slow decay makes the expectation values of O k ill-defined for |k| ≥ 2. In Table I we collect a few CL results, together with the exact expectation values determined by ρ for β = 1. The simulation used 100 independent trajectories with randomly chosen starting points, running for a Langevin time of t = 1250. The CL values for |k| > 2 are completely submerged by noise, as expected. For k = ±2, even though the results are probably not sufficiently well converged, a value near 1 is suggested.
The Schwinger-Dyson equation (SDE) arising from the identity would be satisfied for k = 0, ±1 if the modes ±1 are 0 and the modes ±2 are 1, even though these values are not the 'correct' ones. So the CL results, where they are defined, are incorrect, but mostly -for |k| ≥ 3, they are completely undefined due to uncontrollable fluctuations.
The last row in the table gives the correct results from the correctly evolved observables, as will be explained in the next section. Notice that the correct results of course also satisfy the SDE, but these equations, having the structure of a two-step recursion, do not have a unique solution [2,10,11].

B. A puzzle
The remaining question is: how can CL fail for the first mode, i. e. observables O ±1 ≡ exp(±i(x + iy))? O ±1 P as well as KP decay exponentially in y.
Actually the densities of O ±1 P and K, if considered not as functions of y, but as functions of their actual value decay only power-like (see Eq. 38). Nagata et al [12] formulated a criterion stating that exponential decay of the distribution of K is necessary to guarantee correctness; so by this criterion correctness is not to be expected here. We will formulate, however, a more precise criterion in Section III.
The CL simulation produces for O ±1 well converged, yet incorrect results, close to 0 (consistent with (13) but inconsistent with (12)).
The resolution lies in the nonvanishing boundary terms arising in the time dependent expectation values and persisting for arbitrarily large times; this is the mechanism described in [1,2]. In the following section we will analyze those boundary terms in detail.

III. BOUNDARY TERMS FOR FINITE LANGEVIN TIME
The formal argument for correctness [1,2] is revisited in Appendix A. It requires the choice of an initial distribution P (x, y; 0); in the following we will choose for simplicity The identity (7) follows by integrating by parts, assuming that there are no boundary terms, and using the Cauchy-Riemann equations.
In order to check for the appearance of boundary terms as in (10), we need the time evolutions of both the observables (the 'correct evolution', see Appendix B) and of the probability density P by solving the FPE with the initial condition (19).

A. Indirect evidence for boundary terms
In [2] we found numerically for a somewhat different model that the correctly evolved observables O(x + iy; t) grow in the y direction as an iterated exponential. The same can be seen here, but we will not go into this. This growth makes the appearance of boundary terms already plausible.
In the following we show explicitly that Eq. (7) is numerically satisfied for short times (up to t ≈ 20), choosing β = 0.1.
The 'correct evolution' or L c evolution is defined by the differential equation We compare (see (34) for the definition of Here P (x, y; t) is the solution of the real Fokker-Planck equation (FPE) with and initial condition (19), which describes the time evolution of the probability density under the CL process. O P (t) is the t−dependent expectation value obtained in the CL process.
For our model the FPE is (24) with the drift force (15). Eq.(24) is solved numerically as well; some details are found in Appendix C.
It is seen that the left hand side (22) reaches its asymptotic value already for quite short Langevin times (around t ≈ 7). This is in accordance with the value of the smallest nonzero eigenvalue λ 1 ≈ −1 of L c (cf. Eq.(B6)). For this value of β also the right hand side does the same; for t 20 there is no difference visible between the left and the right hand sides (dashed and solid curve). This indicates that any boundary terms are negligible there.
So there is a "plateau" corresponding to the correct value in the solid curve, and the boundary term starts picking up around t = 20. In Fig. 4 we show (in black) the evolution of the first mode up to time t = 200. It is seen that after the plateau it converges to zero, the value corresponding to the stationary solution (13) of the FPE. We will return to this figure in Section 5.

B. Direct study of the boundary terms
We next study explicitly the evolution of the boundary term Eq. (10) for the mode k = 1 and τ = 0 with Langevin time t. As explained in Appendix A the definition of this term implies a certain order of limits: Integrate by parts restricted to |y| ≤ Y , send t → ∞ and then Y → ∞ (notice that this does not require a separate simulation but a certain processing of the data). We obtain for our model: We first note that we can take the limit t → ∞ of this expression, using the fact that P (x, y; t) indeed converges to Eq. (13), which was verified before. We obtain (converging to −iβ/2 for Y → ∞).
In Fig. 5 we compare the numerical results for Langevin times up to t = 200 with the asymptotic value at t = ∞ for β = 0.1. Y was chosen to be 5 which is close to the asymptotic value Y = ∞ (tanh(5.) = 0.99991). We see here directly that the boundary term stays very small up to t 20, then picks up and approaches the analytically determined value −iβ/2. For the value β = 0.1 it also follows closely the difference between the first mode shown in Fig. 4 and the correct value, but this cannot remain true for larger β. In Fig. 6 we also show the boundary term for different values of the cutoff Y , showing the fast approach to the asymptotic value. Note that in the right panel we show the boundary term as measured using the CLE alone, without making use of the Fokker-Planck evolution, which would be prohibitively costly in a lattice model. So we established implicitly and explicitly that boundary terms appearing appreciably only after some Langevin time are reponsible for invalidating the formal correctness argument. the presence of the observable O 1 is essential; the distribution of the drift force alone goes to zero. Quite generally it is the product of observable, drift and probability P that decides about the presence or absence of boundary terms; this indicates some possible limitations of the criterion for correctness advocated in [12].

C. Boundary terms and skirts
Thinking now of Y not as a cutoff, but as a variable, and denoting it by y again, we see that the the first term of the boundary term B 1 (y; ∞, 0), considered as a function of y Eq.
On the other hand the distributions of v itself has a density p(v), related to P by This can easily be worked out, but the point is that for large y which shows that there is no finite expectation value of v since v p(v) is not integrable.
In other words: a 'skirt' in the distribution of K y O falling off like the power −2 or more slowly corresponds to a nonvanishing (possibly diverging) boundary term. Note, however, such a simple reasoning is only possible because here P is independent of x.

D. The interpolating function
So far we have only compared F k (t, 0) and F k (t, t). But it is instructive also to look at the interpolating function F k (t, τ ) which should be independent of τ for the correctness argument to hold. This is shown in Fig. 7 for k = 1 for β = 0.1 and β = 0.5.
Again it is seen that for β = 0.1 t 20 the curves are flat,indicating the absence of any appreciable boundary terms. For t > 20 a τ dependence develops, being maximal at τ = 0. This is understandable from what we have seen: the FPE evolution of P proceeds up to time t − τ , which allows for the boundary terms to arise. On the other hand, for τ 7 O k (z; τ ) has practically reached its asymptotic limit (cf. Appendix B), in which only the constant mode survives; this constant can be pulled outside the integral defining F , so that for t, τ > 7 (36) i.e. the correct value (where we used the fact that the density P is always normalized).
At small t, τ flat curves for F 1 (t, τ ) indicate that CL gives the correct values, however these are dependent on the initial condition if the process did not yet thermalize. This is seen in Fig. 7, bottom plot, for β = 0.1.
Notice that the slope of F k (t, τ ) appears maximal at τ = 0 for large t. Therefore the estimation of noxious boundary terms as defined in (27) is relevant for judging the asymptotic correctness of the CL procedure -cf. Figs. 5, 7.
Plots similar to Fig. 7 appeared in [2] for a different model.

E. Evolution of some marginal distributions
For β = 0.1 we saw clearly the evolution first apparently converging to the correct value and then departing from it (the 'plateau' in Fig. 3). Similar behavior in Langevin time was observed in a real time SU (2) lattice simulation [10]. This is also reflected in some marginal distributions. In Fig. 8 we show the evolution of P x (x; t) = dyP (x, y; t) for β = 0.1. It starts out flat, corresponding to our choice of initial condition; at t = 10 and t = 20 it shows maximal structure, while for larger t it approaches a flat distribution again, in agreement with (13).
The distribution of the first mode also show a similar behavior. Of interest is the imaginary part. Its density is σ(u; t) ≡ dxdyP (x, y; t)δ(sin(x)e −y − u) .
(37) We present in Fig. 9 histograms for σ(u; t), obtained from the numerical solution of the FPE; for the limiting distribution P (x, y; ∞) = 1/(4π cosh 2 (y) we can evaluate (37) analytically:  Fig. 9 shows first the development of an asymmetric structure with two maxima, whereas for larger t one sees clearly the approach to the symmetric analytic result (38).

IV. ILLUSTRATION OF THE EFFECT OF THE BOUNDARY TERMS IN A REGULARIZED MODEL
In the preceding section we described how the boundary terms accumulate in the Langevin (and Fokker-Plank) evolution, spoiling the proof of convergence such that the process would lead to wrong results.
Here we want to explicitly see the effect of those terms by considering a "regularization" of the model using a damping term in the action, S R = s 2 x 2 , which leads to a modification of the drift by K R (z) = −s z (a similar regularization has been used in [13]; we thank J. Drut and A. C. Loheac for making us aware of this). The philosophy of this regularization is very similar to that of dynamical stabilization [14]. In both cases, and different from modifications using symmetries, such as in the gauge cooling paradigm, the dynamics is really changed, but in a way intended to be controllable.
For s = 0 we regain the original model Eq. (11) (including its problems) while for s > 0 we should observe an interplay between the original tendency to build boundary terms and their damping in the modified model, allowing us to estimate the effect of these terms. This particular modification leads to loss of periodicity in x which becomes noncompact at s > 0. The CLE process was allowed to drift unbounded in the full z plane and the exact integral was correspondingly done in the infinite interval. The following plots show O 1 = Im e iz . We see from Fig. 10 that the regularization stabilises the expectation values in the CLE evolution. When the non-regularized data show a plateau at the correct value for intermediary t the regularization extends this plateau into the asymptotics (β = 0.1 case). When a plateau is missing the regularization still stabilises the EV but at a value shifted from the correct one (β = 0.5 case), since now a larger s is needed to counteract the boundary terms.
Note that an alternative regularization of the process itself is to modify only the imaginary drift by a damping term K R,y = −s y. This leads to similar results (see Fig. 4, here from the FPE evolution), and has the advantage that periodicity in x is preserved. We preferred the action variant, however, also since it allows us to obtain exact correct results for the regularized model by simple numerical integration.
In Fig. 11 we show the s dependence in CLE for The plots suggest an extrapolation toward the exact expectation value (EV) for s → 0, however this might not be simply linear, but depend on the particular regularization, β, etc. Therefore we mean this discussion not yet as a direct cure but mainly as illustration of the effects of the boundary terms on the EV's. For k = 1, e.g., these effects can be estimated from the distance between the CLE regularized data and the exact values from the numerical integration of the regularized model: As can be seen from the figure at small s these effects are still present while gradually vanishing with increasing s.

V. CONCLUSIONS
We have in great detail analyzed a simple example in which the CL fails, establishing very explicitly that the failure is due to boundary terms spoiling the correctness argument, as argued already long ago [1,2]. The absence of such boundary terms requires that the product of observable, drift force and probability distribution (OKP ) goes to zero in the noncompect (imaginary) directions. The relation between boundary terms and 'skirts', i. e. decay of distributions was addressed in Subsection III C, making clear that possible skirts in the distribution of the product OKP and not just KP are relevant, so the general validity of the criterion proposed by [12] and not involving the observables is not obvious.
Generally the τ −dependent boundary term (A16) cannot be estimated in a realistic (lattice) calculation. Fortunately, however, the considerations in this paper suggest that relevant for the correctness of the asymptotic (large t) EV's is the boundary term at τ = 0, B k (t, 0), as defined in (27). This term appears to maximize B k (t, τ ) and it stabilizes at large t; it is accessible in principle to online monitoring using the CLE alone, and may provide a correctness criterion for the EV's obtained in the CL simulation.
As remarked before, uncovering the boundary term requiress a certain processing of the data obtained in the simulation, implying essentially sampling first at a fixed value of a quantity specifying the boundary in the non-compact directions (in a lattice gauge theory for instance the unitarity norm or some other related quantity) before taking the other limits. This however does not require a separate simulation. We also consider the time evolution of the complex density ρ(x; t) by the complex Fokker-Planck equation where now the complex Fokker-Planck operator L T c is The initial conditions for (24) and (A1) are required to be consistent, i.e.
The crucial point is that one can now, under conditions to be spelled out below, show that dxO(x)ρ(x; t) = dxdyO(x + iy)P (x, y; t) .
By our choice of initial conditions, (A4) holds for t = 0. For t > 0 we consider F O (t, τ ) defined in Eq. (34), which interpolates between the two sides of (A4): We call the solution of Eq. (20) the "L c evolved" or "correctly evolved" observable.
The interpolating property follows from see Section III A, Eqs. (22) and (23). The first equality is obvious, the second one follows by integration by parts in x; because of periodicity there are no boundary terms. (A4) would follow if we could prove This would again follow using integration by parts, provided there are no boundary terms. For the term ∂ 2 x of both L T and L c this is obvious because of periodicity, so we can drop these terms, obtaining In [1] we argued that O(x + iy; τ ) is holomorphic for any τ , i.e. it obeys the Cauchy Riemann equations This allows us to write the second term of the right hand side of (A12) as (A14) Again the part involving ∂ x can be canceled against the corresponding term in the first term of (A14) using integration by parts in x, so we only have to consider We have to interprete this as a the limit Y → ∞ of the integral restricted to |y| ≤ Y . For finite Y (A15), using integration by parts, this is given by the boundary term Evaluating this term at τ = 0 leads then for our model to (27) where we can then take the large t limit to obtain for our model 28. This form of the boundary term makes clear that correctness requires sufficient decay of the products K y P O.
Notice that if we take the t → ∞ limit directly in (A15) the first term vanishes by stationarity and the second one leads to the 'Correctness Conditions' (CC) defined in [2] and is approximately zero by the SDE. Hence it might appear that the boundary term vanishes and we might erroneously conclude correctness of the results. Therefore the CC, while expressing convergence and being necessary for correctness, are not sufficient.

Appendix B: The correct evolution
We rewrite the Langevin operator L c in the basis of Fourier modes: or equivalently, for a general observable So L c is represented on the Fourier coefficients by the sparse infinite matrix with elements It is easy to compute numerically the action of exp(tL c ) on observables of the form O k = exp(ikx); cutting off the modes at |k| ≥ K with K = 50 and K = 150, and for t = 50, gave identical results, with only the constant mode surviving. Its value agrees to at least 5 digits with i.e. the correct expectation value.
We also checked, using Mathematica, that the eigenvalues of the truncated matrix (L c ) kl have negative real part except for the unique zero eigenvalue corresponding to a k ∝ δ k0 . All nonzero eigenvalues are real and doubly degenerate. The one with the smallest modulus determines the approach to the infinite time limit; it depends only weakly on β, e. g.
Remark: It is easy to show that by a similarity transformation L c can be transformed into the dissipative operator Dissipativity means −H −H * ≤ 0, which is obvious. For such operators general theorems guarantee that the spectrum is contained in the left half of the complex plane (see for instance [15]). It is also not hard to see that there is exactly one vector with eigenvalue zero.
We solved the Fokker-Planck equation on an x-y-grid with parameters dt = 10 −6 , dx = 0.005 = dy, a cutoff in y-direction of Y = 5 was found to be sufficient (compare (28), tanh(5) ≈ tanh(∞)), and a cutoff in x-direction of X = 3.14, which is due to the 2π periodicity of the problem. Boundary conditions in x and y were both chosen to be periodic. Initial condition were chosen according to (19), however the δ-function was smeared out slighty to avoid numerical issues; so we actually used P (x, y; 0) = 1 2π 2πσ 2 y e − y 2 2σ 2 where we chose σ y = 0.1. Note that using this discretization it is hard to resolve the higher modes. This can be done more easily when solving the Fokker-Planck equation in Fourier space, where it is given by P (k, y; t + dt) = −k 2 P (k, y; t) − iβ 2 sinh(y) (k − P (k + , y; t) + k + P (k − , y; t)) + iβ 4dy cosh(y) (P (k + , y + ; t) − P (k − , y + ; t)) + iβ 4dy cosh(y) (−P (k + , y − ; t) + P (k − , y − ; t)) , where k ± = k ± 1 and similarly for y. Here we chose dt = 0.5 × 10 −5 , k ∈ {−19, . . . , 20}, dy = √ dt, Y ≈ 2.8 with antiperiodic boundary conditions in k for the imaginary part of P (k, y; t) and periodic boundary conditions for the real part in k and for y. After t ∼ 30 or so the result strongly depends on the choice of discretization. Hence we use the k-y discretization to resolve the plateaus in the higher modes and the x-y-discretization for everything else. The solution to the Fokker-Planck equation shows that for β = 0.1 the evolution initially follows the correct evolution. This is suggested by Fig. 7 and 5. By looking at P (x, y; t) in Fig. 2 and 8 and the histogram of the first mode in Fig. 9, one can see that initially nontrivial structures occur. Those die out and everything approaches the asymptotic solution, which yields the wrong results. This strengthens the argument that until t ∼ 20 or so CLE yields the correct solution but then the occurence of boundary terms leads to wrong convergence.