The Ostrogradsky instability can be overcome by quantum physics

In theories with higher time derivatives, the Hamiltonian analysis of Ostrogradsky predicts an instability. However, this Hamiltonian treatment does not correspond to the way that these theories are treated in quantum field theory, and the instability may be avoided in at least some cases. We present a very simple model which illustrates these features.


INTRODUCTION
In 1850, Ostrogradsky analysed Lagrangians which contained higher time derivatives and showed that such theories are classically unstable [1,2]. Classical instability need not imply quantum instability. A counter-example is the theory of the Dirac equation, where the classical Hamiltonian has unbounded negative energies while the quantized Dirac field is stable with positive energies. For higher derivative theories there remains much debate about stability, positive energies, unitarity and causality. We here present a simple model representative of a class of theories and show that the Ostrogradsky instability is not present, that the massive excitation carries positive energy, and that the classical limit is normal. We have elsewhere demonstrated unitarity for this class of theories [3], and have also discussed causality [4][5][6]. We also review these elements below.
The work on quantum field theories with higher derivatives goes back to Lee and Wick [7][8][9] in the 1960s and our discussion incorporates elements of this past work. Much of the present interest arises from quadratic gravity, a renormalizable theory where the Lagrangian contains squares of the curvatures as well as the Einstein term linear in the curvature [10][11][12][13][14][15][16][17][18][19][20][21][22][23]. Because the curvature tensors are second order in the derivatives of the metric, this becomes a theory with four derivatives in the Lagrangian. Quadratic gravity falls in the category which we are discussing here. However, it is a far more complicated theory. We hope that our simple model here illustrates the key features for this class of theories.

THE MODEL
Consider a "normal" theory (i.e. without higher derivatives) of a complex scalar field χ coupled to a real scalar φ with the Lagrangian […] with […]. The relative masses here are not particularly important, but we will present results for $m^2 < m_\chi^2$. While there is no symmetry which can force m = 0, we can nevertheless envision a situation where the renormalized mass of the φ field is negligibly small. This then would be a scalar model for a charged field χ interacting with a scalar photon φ. The exchange of the φ would yield a Coulomb-like potential. If the mass were zero, the φ field would satisfy the classical wave equation. So we invite the reader to consider this as a scalar model for QED, or perhaps as a scalar model for the gravitational interaction of a massive χ field. Note that as a matter of writing style, we will refer to theories with only two time derivatives as "normal", so that we do not have to regularly specify that there are only two derivatives.
Into this bucolic setting we now introduce a troublesome term with higher derivatives […] with […]. Here M is a large mass, very much larger than m and $m_\chi$. We invite the reader to consider M as the Planck mass, far beyond the range of ordinary experiments. We know from work on effective field theories that if we treat this new term as a perturbation, it would have a negligible effect at low energies, and would not change the classical wave equation if m = 0. However, if we treat this as a fundamental theory, the analysis of Ostrogradsky would say that the theory has an instability which renders the theory non-viable even at low energy. This is the simple model which we wish to analyze.
The first path integral, over a(x), is just the original normal theory, with φ(x) replaced by a(x). The second path integral, over η(x), is the complex conjugate of a normal massive theory. The remnant of the original higher derivative term is the −i instead of +i in the second path integral.
We defer any interpretation to later. For now, let us just calculate the path integral over η. This is a Gaussian integral and is perfectly well defined. The result is simply the complex conjugate of the usual Gaussian integral. To see this explicitly, we add a real infinitesimal factor of $-\epsilon \int d^4x\, \phi^2$ to the exponent to make the result well behaved for large fields. Then we complete the square using […] with […]. This propagator is the complex conjugate of the usual Feynman propagator, changing the sign in the numerator and also the sign of the iε term in the denominator. The integral over η yields […]. At low energy, the interaction becomes local and we obtain […]. This is just a shift in the quartic interaction of the χ field, with […]. The minus sign in the new contribution is the remnant of the use of exp(−iS) in the path integral. However, for a large mass M, this will not change the sign of λ. The low energy limit of this theory, quantized using path integrals, is then perfectly normal. The resulting classical theory for small or vanishing m is then also unchanged. We colloquially refer to the classical limit as taking ħ → 0. However, ħ is in fact a fixed constant, and the classical regime is that with kinematics such that quantum effects are unimportant. We will see that at high energy and short wavelengths quantum effects are crucial. In this theory the classical limit involves wavelengths much larger than the Compton wavelength of the χ field, much like in usual QED.
The most appropriate interpretation of the η path integral is as the time-reversed version of a regular path integral. Time-reversal is an anti-unitary operation, involving complex conjugation. The Lagrangian itself is time-reversal invariant but in the path integral exp(iS) changes to exp(−iS). Within the path integral, this change is manifest most importantly in the iε in the propagators. These define the arrow of causality [4,5], telling us what is the past lightcone and what is the future. We will see that when we decompose the propagator into time ordered factors, the usual iε tells us that positive energies propagate forward in time. Changing the sign on iε leads to propagation of positive energies backwards in time. This is described explicitly in the following section.
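The claim that the η integral is simply the complex conjugate of the usual Gaussian integral can be illustrated in a one-dimensional toy version. The sketch below is not the field-theory calculation: it replaces the action by a single variable x with a hypothetical coefficient `a`, uses a finite damping `eps` in place of the infinitesimal convergence factor, and compares the integral with exp(+i a x²) against the one with exp(−i a x²).

```python
import cmath
import math

def gaussian_closed_form(a, eps, sign):
    """Closed form of the integral of exp(sign*i*a*x^2 - eps*x^2) over the real line (eps > 0)."""
    return cmath.sqrt(math.pi / (eps - sign * 1j * a))

def gaussian_numeric(a, eps, sign, half_width=12.0, steps=24000):
    """Midpoint-rule estimate of the same integral on [-half_width, half_width]."""
    dx = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * dx
        total += cmath.exp((sign * 1j * a - eps) * x * x)
    return total * dx

# Toy values (assumptions for illustration only).
a, eps = 1.0, 0.5
usual = gaussian_numeric(a, eps, +1)       # analogue of exp(+iS)
conjugated = gaussian_numeric(a, eps, -1)  # analogue of exp(-iS)

print("exp(+i a x^2):", usual)
print("exp(-i a x^2):", conjugated)
print("difference from complex conjugate:", abs(conjugated - usual.conjugate()))
```

The damping term is what makes both integrals well defined; flipping the sign of i leaves the magnitude unchanged and only conjugates the result.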

HIGH ENERGY
In this section, we show how the coupling to the χ fields makes the heavy particle decay, that positive energy is needed to excite this resonance and we further demonstrate the "backwards in time" behavior of the resonance.
While one often starts the analysis of a theory in the free-field limit with no interactions, here it is important to include the effect of interactions in order to properly understand the spectrum of the theory. In this regard, it is more similar to the analysis of the electroweak theory, where the interaction with the Higgs boson is included from the start in order to get the spectrum correct. In our case here, the coupling to the χ fields is required to provide information on the decay width which is crucial for understanding the spectrum.
Consider the φ propagator in the original basis, before any field redefinitions have been performed. Including the vacuum polarization, this has the form […]. The one loop vacuum polarization has a divergent piece which goes into the renormalization of $m^2$. As noted above, for convenience we will choose the renormalized value of $m^2$ to vanish, in which case the finite part of the vacuum polarization is […]. Beyond $q^2 = 4m_\chi^2$ there will be an imaginary part of the vacuum polarization, which for our purposes is the most important feature. At high $q^2$, where we apply this, we have the result […] such that […]. With this result we can look for the high mass pole. It is found at […], where the real part of the mass is found to be […] to first order in $g^2$. In the neighborhood of this pole we use $q^2 = \bar{M}^2 + (q^2 - \bar{M}^2)$ to find the approximate form […]. The important thing to notice here is that there are two minus sign differences from a normal resonance. The −i in the numerator and the −iγ in the denominator are both of opposite sign from usual resonances. These combined sign differences will lead to the eventual identification of this as the time-reversed version of a usual propagator.
We can see that this propagator corresponds to exponential decay rather than exponential growth by writing it in time ordered form, with $x_0 = t$. The poles in the complex $q_0$ plane are shown in Fig. 1. There is the massless pole at […]. There are massive poles at […] or […], with $E_q = \sqrt{\mathbf{q}^2 + m_r^2}$. When t > 0, we close the contour in the lower half plane. This yields the forward propagator […], which shows the decaying exponential for the massive term, with the identification […]. The term describing propagation backwards in time is obtained for t < 0 by closing in the upper half plane, with the result […]. Again we see exponential decay. The other notable feature is that the direction of energy flow is reversed for the high mass resonance. Whereas the normal massless pole propagates positive energy forward in time, the high mass resonance propagates it backwards in time.
We have elsewhere proposed calling the high mass resonance in this type of theory a Merlin mode [4], named after the wizard in the Arthurian tales who ages backwards in time. This distinguishes it from the more generic phrasing of "ghost", which is applied to any field with the minus sign in the numerator of the propagator. For example, Faddeev-Popov ghosts have a negative sign in the numerator but carry the usual iε in the denominator. Here there is the extra change of sign in the denominator, −iγ, which is crucial in making this propagator the time-reversed version of a usual resonance propagator. This interpretation is reinforced by calculating the Green function with retarded boundary conditions. The loop integrals for the χ fields going into the vacuum polarization need to be calculated using the in-in formalism, as in Ref. [24]. The result is the same functional dependence, but with a different iε prescription, such that the logarithm is […]. This shifts the location of the poles to the positions indicated in Fig. 2. For t > 0 we pick up the usual massless poles […]. However, even with these boundary conditions, the Merlin resonance gives a contribution for t < 0, […].
This also contains decaying exponentials. If we choose to use this as a Green function giving the response to an external source, it would correspond to the propagation of the effect backwards in time. This is related to the microcausality violation on scales of order of the resonance width. The width corresponds to the decay into two on-shell χ particles. These carry positive energy, so that the resonance also corresponds to positive energy. By the same reasoning, we can also see that this resonance requires positive energy in order to be produced. It is seen as an s-channel resonance in χχ → χχ. The amplitude for the process is […]. When squared, $|\mathcal{M}|^2$ has the same form as a usual resonance, so this yields the characteristic Breit-Wigner shape. The incoming χ fields carry positive energy and one needs a large positive energy to produce the resonance.
With higher derivative theories, there is no guarantee that all quantization methods will yield the same result. The equivalence of various approaches to quantization has been demonstrated only for normal theories. We have used path integral quantization because it is exceptionally clear in this case. However, there are four canonical quantization schemes which we know of which also yield positive energies for the ghost field [7,25,26,27]. Each requires some modification to traditional canonical quantization. The earliest was due to Lee and Wick in the 1960s, where they proposed a higher derivative theory for a finite version of QED [7][8][9]. Their approach was to treat the Pauli-Villars regulator as a dynamical field. The minus sign between the normal propagator and the Pauli-Villars field then becomes the essential complication. They used what they called an "indefinite metric" scheme, which modifies the canonical commutation relations. The result is a massive field with positive energy.
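The contour argument above can be illustrated numerically. The sketch below assumes a resonance denominator of the schematic form q0² − E² ∓ iγ (the upper sign for the Merlin case, the lower for a normal resonance), finds the two poles in the q0 plane, selects the one enclosed when the contour is closed in the lower half plane (t > 0) or upper half plane (t < 0), and evaluates the factor exp(−i q0 t) it contributes. The numbers `E`, `gamma` and `t` are arbitrary toy values, not taken from the text.

```python
import cmath

def q0_poles(E, gamma, merlin):
    """Zeros in q0 of q0^2 - E^2 - i*gamma (merlin=True) or q0^2 - E^2 + i*gamma (merlin=False)."""
    shift = 1j * gamma if merlin else -1j * gamma
    root = cmath.sqrt(E * E + shift)  # principal square root, positive real part
    return root, -root

def picked_pole(E, gamma, t, merlin):
    """Pole enclosed by the contour: lower half plane for t > 0, upper half plane for t < 0."""
    poles = q0_poles(E, gamma, merlin)
    if t > 0:
        return next(p for p in poles if p.imag < 0)
    return next(p for p in poles if p.imag > 0)

def time_factor(q0, t):
    """Time dependence exp(-i q0 t) contributed by a pole at q0."""
    return cmath.exp(-1j * q0 * t)

E, gamma = 10.0, 4.0  # toy values (assumptions)

normal_fwd = picked_pole(E, gamma, +3.0, merlin=False)
merlin_fwd = picked_pole(E, gamma, +3.0, merlin=True)
merlin_bwd = picked_pole(E, gamma, -3.0, merlin=True)

for name, pole, t in [("normal, t>0", normal_fwd, 3.0),
                      ("Merlin, t>0", merlin_fwd, 3.0),
                      ("Merlin, t<0", merlin_bwd, -3.0)]:
    print(name, "Re q0 =", round(pole.real, 4), "|exp(-i q0 t)| =", round(abs(time_factor(pole, t)), 4))
```

In every case the magnitude is below one, i.e. the contribution decays away from t = 0. The sign of Re q0 shows the reversal described in the text: the normal resonance carries the positive-frequency pole for t > 0, while in the Merlin case the positive-frequency pole is the one picked up for t < 0.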
While our path integral analysis does not rely on the specifics of any canonical quantization scheme, the fact that such schemes exist is welcome.
The high energy structure of this theory is intrinsically quantum. The decay width is crucial for understanding the nature of this resonance. Again we note that while we often refer to the classical world as taking ħ → 0, in nature ħ is a constant and the width is a quantum effect. While the exact magnitude of the width is not important, one cannot neglect its effect. Taking the ħ → 0 version of the propagator functions is not physically sensible.

WHAT WOULD OSTROGRADSKY SAY?
The basic point to be noticed is that the Ostrogradsky construction has no resemblance to quantization via path integrals.
Ostrogradsky's analysis of the higher derivative Lagrangian of Eq. 4 starts by noting that with two extra time derivatives there are extra degrees of freedom associated with the Lagrangian, and this requires two canonical coordinates and the two associated canonical momenta. His choices for the coordinates are […] and for the momenta […]. The Hamiltonian is formed by […]. In writing this Hamiltonian, we must eliminate $\ddot{\phi}$ in terms of the canonical coordinates and momenta. This is accomplished by using the second line of Eq. 34 to write […]. The resulting Hamiltonian is […]. The initial choices of coordinates are then compatible with the Hamilton equations […], and with some effort the Hamilton equations for the momenta can be shown to be equivalent to the Euler-Lagrange equations. The Ostrogradsky instability is seen in the first term of the Hamiltonian of Eq. 37. The canonical momentum $\pi_1$ appears linearly, and there is no other factor of $\pi_1$ in the remainder of the Hamiltonian. This implies that the Hamiltonian is not positive definite, and there is no barrier to making the Hamiltonian negative. One does not need energies of order M in order to trigger the instability in this analysis.
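The structure of the Ostrogradsky construction can be seen in a one-dimensional mechanical analogue. The following is a sketch under stated assumptions, not the field-theory expressions of this section: the toy Lagrangian, the variable names $Q_i$, $P_i$ and the overall normalization are chosen here for illustration, with a single coordinate $q(t)$ standing in for the field.

```latex
% Toy Lagrangian (assumed for illustration): one coordinate q(t), heavy scale M
\[
  L = \tfrac{1}{2}\dot{q}^{2} - \frac{1}{2M^{2}}\,\ddot{q}^{2}
\]
% Ostrogradsky coordinates and momenta
\[
  Q_{1} = q, \qquad Q_{2} = \dot{q}, \qquad
  P_{1} = \frac{\partial L}{\partial \dot{q}} - \frac{d}{dt}\frac{\partial L}{\partial \ddot{q}}
        = \dot{q} + \frac{1}{M^{2}}\,\dddot{q}, \qquad
  P_{2} = \frac{\partial L}{\partial \ddot{q}} = -\frac{1}{M^{2}}\,\ddot{q}
\]
% Eliminate \ddot{q} using the definition of P_2
\[
  \ddot{q} = -M^{2} P_{2}
\]
% Hamiltonian H = P_1 \dot{q} + P_2 \ddot{q} - L
\[
  H = P_{1} Q_{2} - \tfrac{1}{2} M^{2} P_{2}^{2} - \tfrac{1}{2} Q_{2}^{2}
\]
% Consistency with Hamilton's equations for the coordinates
\[
  \dot{Q}_{1} = \frac{\partial H}{\partial P_{1}} = Q_{2}, \qquad
  \dot{Q}_{2} = \frac{\partial H}{\partial P_{2}} = -M^{2} P_{2} = \ddot{q}
\]
```

The momentum $P_1$ enters only through the term $P_1 Q_2$, so $H$ can be made arbitrarily negative at fixed $Q_2$ and $P_2$. This is the linear dependence on the first momentum that the text identifies as the origin of the instability.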
To emphasize that the Ostrogradsky construction is not the one relevant for quantum physics, we present the following heuristic version of Hamiltonian quantization, related to the indefinite metric quantization schemes of Refs. [7,25,26]. We emphasize in advance that this presentation does not do justice to the care taken by those authors, but it does capture how quantization is different from the Ostrogradsky method. If one starts with the separated form for the Lagrangian given in Eq. 8 we would define the η canonical momentum by […], which has the opposite sign from usual. Imposing the equal-time quantization conditions then actually implies the negative of the usual rule, i.e. […].
(We note that, much like in the path integral analysis, this is the complex conjugate of the usual relation.) To solve this with the usual field decomposition, one would then apply the negative of the usual commutator for the creation operators, i.e. […].
If we do this, then the η Hamiltonian which emerges from the Lagrangian of Eq. 8, i.e. […], actually has positive energy states defined from $|p\rangle = a^\dagger |0\rangle$. While Refs. [7,25,26] provide more analysis to show convincingly that such constructions are sensible, they do involve changing the commutation relations. As far as Ostrogradsky is concerned, however, the main thing to note is that the choice of coordinates and momenta is different. The distinction is in the use of Hamilton's equations. Ostrogradsky has chosen the coordinates to reproduce Hamilton's equations. Quantum physics does not require these. The quantum choice of coordinates is made to produce positive energy states (i.e. quanta). The two choices lead to different Hamiltonians. We have seen that the Ostrogradsky assignment of $\phi_i$ and $\pi_i$ is not equivalent to the path integral construction of the quantum theory, or even to the canonical methods such as the Lee-Wick indefinite metric quantization. Quantum physics does not use the Ostrogradsky Hamiltonian as its starting point.
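How a sign-flipped Hamiltonian together with a sign-flipped commutator produces positive energy can be seen in a single-mode caricature. This is a sketch only: the single frequency $\omega$, the dropped zero-point constant and the normalization are assumptions made here, not the mode expansion of the η field.

```latex
% Single mode with the negative of the usual commutator (assumed normalization)
\[
  [\,a, a^{\dagger}\,] = -1, \qquad a\,|0\rangle = 0, \qquad H = -\,\omega\, a^{\dagger} a
\]
% Energy of the one-quantum state
\[
  H\, a^{\dagger}|0\rangle
  = -\,\omega\, a^{\dagger}\,\bigl(a\, a^{\dagger}\bigr)|0\rangle
  = -\,\omega\, a^{\dagger}\,\bigl(a^{\dagger} a - 1\bigr)|0\rangle
  = +\,\omega\, a^{\dagger}|0\rangle
\]
% Norm of the one-quantum state
\[
  \langle 0|\, a\, a^{\dagger}\,|0\rangle
  = \langle 0|\,\bigl(a^{\dagger} a - 1\bigr)\,|0\rangle = -1
\]
```

The two minus signs cancel in the energy eigenvalue, while the negative norm of the state is the price paid. That negative norm is what the phrase "indefinite metric" refers to.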

CAUSALITY AND UNITARITY
Despite our analysis of the disconnect between Ostrogradsky and quantum physics, we know that something has to go wrong in higher derivative theories. Axiomatic field theorists tell us that propagators cannot fall faster than $1/k^2$. This follows from the Källén-Lehmann representation [28,29] […], where ρ(s) is a positive definite spectral function. If ρ is never negative, the high energy limit has the form […]. If ρ is positive definite, this can never vanish. In higher derivative theories the propagator falls asymptotically as $1/k^4$. Therefore at least one of the axioms which goes into this theorem must be violated. For our simple theory, the defect is in microcausality. It has been known since the time of Lee-Wick and Coleman [30] that these theories violate causality. For a clear modern exposition, see the work of Grinstein, O'Connell and Wise [31]. The violation is evident from the factor of −iγ in the propagator for the Merlin mode, and from the interpretation of this mode as propagating backwards in time. We have written enough about this feature elsewhere [4][5][6] that we do not need to repeat that analysis here.
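The conflict between the spectral representation and a $1/k^4$ falloff can be made explicit at tree level. The following sketch assumes one common normalization of the spectral integral and a free, massless higher derivative propagator of the form $1/(q^2 - q^4/M^2)$. Both are assumptions made here for illustration and may differ from the conventions used in the text by factors and signs.

```latex
% Spectral representation (assumed normalization)
\[
  D(q^{2}) = \int_{0}^{\infty} ds\, \frac{\rho(s)}{q^{2} - s + i\epsilon}
  \;\xrightarrow[\;q^{2}\to\infty\;]{}\;
  \frac{1}{q^{2}} \int_{0}^{\infty} ds\, \rho(s)
\]
% Tree-level higher derivative propagator, split by partial fractions
\[
  \frac{1}{q^{2} - q^{4}/M^{2}}
  = \frac{1}{q^{2}} - \frac{1}{q^{2} - M^{2}}
  = \frac{-M^{2}}{q^{2}\,(q^{2} - M^{2})}
  \;\xrightarrow[\;q^{2}\to\infty\;]{}\; -\frac{M^{2}}{q^{4}}
\]
% Equivalent spectral weight
\[
  \rho(s) = \delta(s) - \delta(s - M^{2}), \qquad \int_{0}^{\infty} ds\, \rho(s) = 0
\]
```

The $1/q^2$ term cancels only because the heavy pole enters with negative weight, which a positive definite ρ(s) forbids. The faster falloff therefore requires giving up one of the assumptions behind the representation.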
However, unitarity survives intact. We have presented a formal proof of this elsewhere [3] (see also Lee and Wick [7]). However the rationale is quite simple to understand. Veltman [32] has shown that the states which appear in the unitarity sum are only the stable states of the theory. The unstable states of the theory are not included in the asymptotic spectrum. With the heavy unstable Merlin mode, the result is the same. The unitarity sum includes only the decay products, which in this case are the χ fields. One does not include the Merlin mode in the unitarity sum, and hence one is not bothered by the unusual minus signs which appear in the analysis of this state.
While there is no need to repeat the formal proof here, there is a simple and explicit example of how unitarity is manifest in a way which reflects on our treatment of the spectrum above. The s-channel reaction χχ → χχ excites the Merlin resonance. The S-wave partial wave amplitude is […], where D(s) is the propagator. Including the fact that near the resonance the width is given by […], this then has the near-resonance form […], which satisfies elastic unitarity […]. Again it is the correlation between the two sign changes characteristic of the Merlin mode which allows unitarity to be satisfied. The asymptotic states here are the χ fields, as in the Veltman analysis.
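The role of the two correlated sign changes can be checked numerically with a schematic resonance amplitude. The sketch below assumes a normalization in which elastic unitarity reads Im T = |T|², and a near-resonance form T(s) = ± MΓ / (s − M² ± iMΓ) with independently chosen signs. The function names, the normalization and the toy values of `M` and `Gamma` are assumptions made here for illustration, not the expressions of the text.

```python
def partial_wave(s, M, Gamma, numerator_sign, width_sign):
    """Schematic amplitude T(s) = numerator_sign*M*Gamma / (s - M^2 + width_sign*i*M*Gamma)."""
    return numerator_sign * M * Gamma / (s - M * M + width_sign * 1j * M * Gamma)

def unitarity_defect(T):
    """Im T - |T|^2, which vanishes when elastic unitarity holds in this normalization."""
    return T.imag - abs(T) ** 2

M, Gamma = 5.0, 0.4  # toy values (assumptions)
s_values = [20.0, 24.0, 25.0, 26.0, 30.0]

cases = {
    "normal resonance (-, +i)": (-1, +1),
    "both signs flipped (+, -i)": (+1, -1),
    "only numerator flipped (+, +i)": (+1, +1),
}

defects = {}
for name, (num_sign, width_sign) in cases.items():
    defects[name] = max(abs(unitarity_defect(partial_wave(s, M, Gamma, num_sign, width_sign)))
                        for s in s_values)
    print(f"{name}: max |Im T - |T|^2| = {defects[name]:.3e}")
```

With both signs flipped together, the defect vanishes just as for a normal resonance, and |T|² has the same Breit-Wigner shape in the two cases. Flipping only the numerator sign makes Im T negative and violates the relation.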
However, the existence of the Merlin mode may require further changes to field theory practices at higher orders. For example, Lee and Wick first showed [7], and we confirmed in our analysis, that at the two loop order one needs a modification of the contour integral in order to reproduce the discontinuity which is calculated using the Cutkosky rules. The work of Grinstein, O'Connell and Wise [31] contains an explicit example of how this Lee-Wick contour works. At higher order there may be further modifications, or potential problems [33]. It remains for future investigations to better understand the field theory of these theories.

SUMMARY
We have displayed a simple higher derivative model whose path integral quantization avoids the Ostrogradsky instability. The main features are the positive energy of the states, and the decay of the ghost field which removes it from the asymptotic spectrum.
The toy model has features which tell us where to look for problems in higher derivative theories of a type which include both the toy model and quadratic gravity. For this class of theories, the problems are not negative energies, nor the Ostrogradsky instability, nor the classical limit. However, there still must be some problem, because such theories do not satisfy all the properties of standard quantum field theories. In this analysis, the problem is microcausality. The heavy ghost field becomes a Merlin mode with differing signs in the numerator and the denominator of the propagator. These two minus signs indicate that the propagation is the T-reversal of a usual resonance. This leads to "dueling arrows of causality", which appears to be the most unusual feature of such theories. If the Merlin particle is heavy enough, and its lifetime small enough, this appears to be compatible with experiment. Further work is needed to better understand the full quantum field theory of these theories. Perhaps a lattice simulation would be useful in order to provide a non-perturbative study. While higher derivative theories have unusual features, perhaps some of them can still lead to reasonable physical theories.
When working at low energy, heavy fields can be integrated out and we only need to work with those degrees of freedom which are active at the low energy scale. The resulting effective Lagrangian can be expanded in a derivative expansion, so that the low energy world always contains higher derivative interactions describing effects from the full theory which are suppressed by powers of the heavy masses. In this sense, all of our theories are effective field theories with higher derivative corrections at low enough energy. A simple example is the effective Lagrangian for the photon at energies well below the electron mass, having integrated out the electron via the vacuum polarization diagram. The result is […]. The second interaction yields a contribution to the Lamb shift.
It is important to emphasize that higher derivatives in the effective Lagrangian do not cause any instability problems. Indeed the QED case is the "experimental" verification of this, as QED is perfectly stable at low energies. The Ostrogradsky analysis is not relevant for effective field theories. One quantizes the effective field theory using only the lowest order Lagrangian, and treats the higher order operator as a perturbative interaction. If one were to try to extrapolate the effective field theory beyond its range of validity, one would find that other physics is needed to describe accurately the higher energy theory. In the QED example, the form of the vacuum polarization changes to a logarithmic function at higher energy.
With this in mind, we can note that there is a variant on our simple model which is much closer to the modern application in quadratic gravity. This involves a derivative interaction, mimicking how gravity couples to matter proportional to the energy-momentum tensor, […] where κ is a coupling constant. Now the light φ field is protected from acquiring a mass by the shift symmetry φ → φ + c, and there will be a simple classical limit without having to tune the mass to zero. The analysis of the path integral of this model proceeds similarly to the presentation above, but now the effective low energy interaction from integrating out the heavy field η is proportional to $\kappa^2 (\Box \chi^\dagger \chi)^2 / M^2$. Treated as an effective field theory, this suppressed interaction also does not upset the stability of the theory.