Field master equation theory of the self-excited Hawkes process

A field theoretical framework is developed for the Hawkes self-excited point process with arbitrary memory kernels. We derive the corresponding master equation for the general Hawkes process as a field theory of probability density functionals. The Langevin dynamics of the field variables is given by stochastic partial differential equations that are Markovian. This is in contrast to the Hawkes process, which is non-Markovian (in general) by construction as a result of its (long) memory kernel. For the case of a memory kernel decaying as a single exponential, we find the exact time-dependent and steady state solutions for the probability density function (PDF) of the Hawkes intensities, using the Laplace representation of the Master equation. For memory kernels represented as arbitrary sums of exponential (discrete and continuous sums), we derive the exact solutions of the Lagrange-Charpit equations for the hyperbolic master equations in the Laplace representation in the steady state, close to the critical point $n=1$ of the Hawkes process, where $n$ is the branching ratio. The critical condition of the original Hawkes process is found to correspond to a transcritical bifurcation in the Lagrange-Charpit equations associated with the master equations. We predict a power law scaling of the PDF of the intensities in an intermediate asymptotics regime, which crosses over to an asymptotic exponential function beyond a characteristics intensity that diverges as the critical condition is approached ($n \to 1$). The exponent of the PDF is non-universal and a function of the background intensity $\nu_0$ of the Hawkes intensity and of the parameter $\alpha = n \langle \tau \rangle$, where $\langle \tau \rangle$ is the first-order moment of the distribution of time scales of the memory function of the Hawkes process. Our theoretical predictions are confirmed by numerical simulations.


I. INTRODUCTION
The self-excited conditional Poisson process introduced by Hawkes [1][2][3] has progressively been adopted as a useful first-order model of intermittent processes with time (and space) clustering, such as those occurring in seismicity and financial markets.The Hawkes process was first used and extended in statistical seismology and remains probably the most successful parsimonious description of earthquake statistics [4][5][6][7][8][9].More recently, the Hawkes model has known a burst of interest in finance (see e.g.[10] for a short review) as it was realised that some of the stochastic processes in financial markets can be well represented by this class of models [11], for which the triggering and branching processes capture the herding nature of market participants (be they due to psychological or rational imitation of human traders or as a result of machine learning and adapting).In field of financial economics, the Hawkes process has been successfully involved in issues as diverse as estimating the volatility at the level of transaction data, estimating the market stability [11][12][13], accounting for systemic risk contagion, devising optimal execution strategies or capturing the dynamics of the full order book [14].Another domain of intense use of the Hawkes model and its many variations is found in the field of social dynamics on the Internet, including instant messaging and blogging such on Twitter [15] as well as the dynamics of book sales [16], video views [17], success of movies [18], and so on.
The present article, together with the joint-submission Letter [19], can be considered as a sequel complementing a series of papers devoted to the analysis of various statistical properties of the Hawkes process [20][21][22][23][24].These papers have extended the general theory of point processes [25] to obtain general results on the distributions of total number of events, total number of generations, and so on, in the limit of large time windows.Here, we consider the opposite limit of very small time windows, and characterise the distribution of "intensities", where the intensity ν(t) of the Hawkes process at time t is defined as the probability per unit time that an event occurs (more precisely, ν(t)dt is the probability that an event occurs between t and t + dt).We propose a novel natural formulation of the Hawkes process in the form of a field theory of probability density functionals taking the form of a field master equation.This formulation is found to be ideally suited to investigate the hitherto ignored properties of the distribution Hawkes intensities, which we analyse in depth in a series of increasingly sophisticated forms of the memory kernel characterising how past events influence the triggering of future events.The Hawkes process is the simplest self-excited point process, which describes with a linear intensity function ν how past events influence the triggering of future events.Its structure is particularly well-suited to address the general and important question occurring in many complex systems of disentangling the exogenous from the endogenous sources of observed activity.It has been and continues to be a very useful model in geophysical, social and financial systems.
The Hawkes process, as any other point process, deals with events ("points" along the time axis).The theory of point processes indeed considers events as being characterised by a time of occurrence but vanishing duration (the duration of events is very small compared to the inter-event times).Thus, to a given event i is associated a time t i of occurrence.In the case where one deals with spatial point processes, the event has also a position r i .And a "mark" m i can be included to describe the event's size, or its "fertility", i.e. the average number of events it can trigger directly.
The stochastic dynamics of the Hawkes process is defined as follows.Let us introduce a state variable ν, called the intensity.By convention, we denote stochastic variables with a hat symbol, such as Â, to distinguish them from the non-stochastic real numbers A, corresponding for instance to a specific realisation of the random variable.The intensity ν is a statistical measure of the frequency of events per unit time (i.e., a shock occurs during [t, t + dt) with the probability of νdt).In the Hawkes process, the intensity satisfies the following stochastic sum equation (see Fig. 1): where ν 0 is the background intensity, { ti } i represent the time series of events, and N (t) is the number of events during the interval [0, t) (called "counting process").One often refers to ν(t) as a conditional intensity in the sense that, conditional on the realised sequence of N (t) = k (with k ≥ 0) events, the probability that the (k + 1)th event occurs during [t, t + dt), such that tk+1 ∈ [t, t + dt), is given by ν(t)dt.The pulse (or memory) kernel h(t) represents the non-Markovian influence of a given event, and is non-negative definite.
The integral of h(t) defines the branching ratio The parameter n is a very fundamental quantity, which is the average number of events of first generation ("daughters") triggered by a given event [8,25].This definition results from the fact that the Hawkes model, as a consequence of the linear structure of its intensity (1), can be mapped exactly onto a branching process, making unambiguous the concept of generations: more precisely, a given realisation of the Hawkes process can be represented by the set of all possible tree combinations, each of them weighted by a certain probability derived from the intensity function [26].
The branching ratio is the control parameter separating three different regimes: (i) n < 1: subcritical; (ii) n = 1: critical and (iii) n > 1: super-critical or explosive (with a finite probability).The branching ratio n can be shown to be also the fraction of events that are endogenous, i.e., that have been triggered by previous events [27].
We now formulate the master equation for the model ( 1) and provide its asymptotic solution around the critical point.
B. The single exponential kernel case

Mapping to Markovian dynamics
Before developing the general framework for arbitrary memory kernel h(t), we consider the simplest case of an exponential memory kernel: The decay time τ quantifies how long an event can typically trigger events in the future.And we make explicit the branching ratio n from the normalisation of the exponential.This special case is Markovian as discussed in Refs.[28,29], because the lack of memory of exponential distributions ensures that the number of events after time t defines a Markov process in continuous time [30].
Let us now show that this single exponential case (3) can be mapped onto a stochastic differential equation (SDE) driven by a state-dependent Markovian Poisson noise.By decomposing the intensity as let us consider the Langevin dynamics with a state-dependent Poisson noise ξP ν with intensity given by ν = ẑ + ν 0 and initial condition ẑ(0) = 0.The introduction of ẑ is similar to the trick proposed in [31] for an efficient estimation of the maximum likelihood of the Hawkes process.By expressing the state-dependent Poisson noise as we obtain the formal solution of equation ( 5) This solution shows that the SDE ( 5) is equivalent to the Hawkes process (1).Equation ( 5) together with ( 6) is therefore a short hand notation for for the probabilistic time evolution during [t, t + dt) (see Fig. 2a for a schematic representation).Note that the event probability explicitly depends on ν(t), which reflects the endogenous nature of the Hawkes process.

Master equation
By introducing equation ( 5) together with (6), we have transformed a non-Markovian point process into a Markovian SDE.This allows us to derive the corresponding master equation for the probability density function (PDF) P t (z) of the excess intensity ẑ (4), with the boundary condition P t (z)dz is thus the probability that ẑ(t) takes a value in the interval ẑ(t) ∈ [z, z + dz) at time t.
The master equation ( 9) is derived as follows.Let us consider an arbitrary function f (ẑ).Using (8), its time evolution during [t, t + dt) is given by as schematically illustrated in Fig. 2b.By taking the ensemble average of both sides, we obtain This result (12) is obtained by (i) using the identity (ii) by performing a partial integration of the first term in the right-hand side of Eq. (12), and (iii) by introducing the change of variable z → z − n/τ for the second term.Since ( 12) is an identity holding for arbitrary f (z), the integrants of the lelf-hand-side and right-hand-side must be equal for arbitrary f (z), which yields the master equation (9).Note that the above derivation of the master equation is not restricted to the exponential shape of the memory kernel.We are going to use the same derivation in the more complex examples discussed below.
C. Discrete sum of exponential kernels

Mapping to Markovian dynamics
The above formulation can be readily generalized to the case of a memory kernel expressed as a discrete sum of exponential functions: In this case, each coefficient n k quantifies the contribution of the k-th exponential with memory length τ k to the branching ratio n = K k=1 n k (see definition (2)).We note that this representation ( 14) is quite general, as it can approximate well the case of a power-law kernel with cut-off up to a constant [32].
Harris suggested the intuitive notion that it is possible to map this case to a Markovian dynamics if the state of the system at time t is made to include the list of the ages of all events [33].The problem is that this conceptual approach is unworkable in practice due to the exorbitant size of the required information.By introducing an auxiliary age pyramid process, Ref. [34] identified some key components to add to the Hawkes process and its intensity to make the dynamics Markovian.Here, in order to map model (1) onto a Markovian stochastic process, we propose a more straightforward approach, which generalised the previous case of a single exponential memory function.We decompose the intensity into a sum of K excess intensities {z k } K k=1 as follows: Each excess intensity ẑk is the solution of a Langevin equation driven by a state-dependent Markovian Poisson shot noise Note that the same state-dependent Poisson noise ξP ν (t) defined by expression (6) acts on the Langevin equation for each excess intensity {ẑ k } k=1,...,K .In other words, each shock event impacts simultaneous the trajectories for all excess intensities {ẑ k } k=1,...,K (see the vertical broken line in Fig. 3a and 3b and the resulting trajectory of ν(t) in Fig. 3c).

Master equation
As the set of SDEs for ẑ := (ẑ 1 , ẑ2 , . . ., ẑK ) T are standard Markovian stochastic processes, we obtain the corresponding master equation: The jump-size vector is given by h := (n 1 /τ 1 , n 2 /τ 2 , . . ., n K /τ K ) T .The PDF P t (z) obeys the following boundary condition on the boundary ∂R K + := {z|z k = 0 for some k}.This equation ( 17) can be derived following the procedure used for the single exponential case that led us to the master equation (9) (see Appenedix A 1 for an explicit derivation).

Laplace representation of the master equation
The master equation (17) takes a simplified form under the Laplace representation, where the Laplace transformation in the K dimensional space is defined by with volume element dz := K k=1 dz k .The wave vector s := (s 1 , . . ., s K ) T is the conjugate of the excess intensity vector z := (z 1 , . . ., z K ) T .
The Laplace representation of the master equation ( 17) is then given by Then, the Laplace representation (19) of P t (z), which is the solution of (21), allows us to obtain the Laplace representation Qt (s) of the intensity PDF P t (ν) according to Qt (s) := L 1 [P t (ν); s] = e −s(ν0+ K k=1 ẑk ) = e −ν0s Pt s = (s, s, . . ., ) T .
D. General kernels

Mapping to Markovian dynamics
The above formulation can be generalized to general forms of the memory kernel.Let us decompose the kernel as a continuous superposition of exponential kernels, where we have introduced the set of continuous time scale τ ∈ R + := [0, ∞).With definition (2), the branching ratio is now given by Thus, the function n(τ ) quantifies the contribution of the τ -th exponential with memory length τ to the branching ratio.We can then interpret n(τ )/n as a normalised distribution of time scales present in the memory kernel of the Hawkes process.As we show below, an important condition for solvability will be the existence of its first-order moment This condition (25) means that n(τ ) should decay faster than 1/τ 2 at large τ 's.Hence, the representation (23) implies that the memory kernel has to decay at large times faster than 1/t 2 .This covers situations where the variance of the time scales embedded in the memory kernel diverges.But this excluded the cases h(t) ∼ 1/t 1+θ with 0 < θ < 1 that are relevant to the Omori law for earthquakes [16,17] and to the response to social shocks [20].This case 0 < θ < 1 for which α diverges needs to be treated separately and this is beyond the content of the present work.
We then decompose the intensity of the Hawkes process as a continuous sum of excess intensities ẑt (τ ) Each excess intensities ẑt (τ ) is the solution of a stochastic partial differential equation (SPDE) where, as for the previous case of a discrete sum of exponentials, the same state-dependent Poisson noise ξP ν (t) defined by expression (6) acts on the Langevin equation for each excess intensity ẑt (τ ).The set of SDE (27) expresses the fact that the continuous field of excess intensity {ẑ t (τ )} τ ∈R+ tends to relax to zero, but they are intermittently simultaneously shocked by the shared shot noise term ξP ν , with a τ -dependent jump size n(τ )/τ .

Field master equation
The master equation corresponding to the SDE ( 27) can be derived by following the same procedure presented for the simple exponential case and for the discrete sum of exponentials.There is however a technical difference since the state of the system is now specified by the continuous field variable {ẑ t (τ )} τ ∈R+ .Thus, the probability density function is replaced with the probability density functional P [{ẑ t (τ ) = z(τ )} τ ∈R+ ] = P t [{z(τ )} τ ∈R+ ].In other words, the probability that the system state is in the state specified by {z(τ )} τ ∈R+ at time t is characterized by P t [{z(τ )} τ ∈R+ ]Dz with functional integral volume element Dz.
The presence of a continuous field variable leads to several technical issues, such as in the correct application of the Laplace transform.The functional Laplace transformation L path of an arbitrary functional f [z] is defined by a functional integration (i.e., a path integral): This allows us to define the Laplace representation of the probability density functional by for an arbitrary nonnegative function {s(τ )} τ ∈R+ .
As the natural extension of Eq. ( 17), the master equation for the probability density functional is given by with the boundary condition where the boundary of the function space TABLE I: Examples of various memory kernel h(t) and corresponding n(τ ) defined in expression (23).

Laplace representation of the master equation
In the functional Laplace representation (29), the master equation ( 30) takes the following simple first-order functional differential equation

General formulation
All the above forms of the memory kernel can be unified by remarking that the variable transformation ( 23) is equivalent to a Laplace transform, since it can be rewritten as This allows us to reformulate the several examples discussed above in a unified way presented in Table I.

III. SOLUTION
In section II, we have derived the master equations and their Laplace representations for the Hawkes processes with arbitrary memory kernels.Remarkably, the Laplace representations are first-order partial (functional) differential equations.Because first-order partial (functional) differential equations can be formally solved by the method of characteristics (see Appendix C for a brief review), various analytical properties of the Hawkes process can be studied in details.
In this section, we present novel properties of the Hawkes process unearthed from the solution of the master equations by the method of characteristics.In particular, we focus on the behavior of the PDF of the steady-state intensity near the critical point n = 1.Under the condition of the existence of the first-order moment (25), an asymptotic analysis of the master equations shows that the PDF P ss (ν) := lim t→∞ P t (ν) exhibits a power-law behavior with a non-universal exponent: for large ν, up to an exponential truncation, which is pushed towards ν → ∞ as n → 1.As the tail exponent is smaller than 1, the steady-state PDF P ss (ν) is not renormalizable without the exponential cutoff.However, the characteristic scale of the exponential tail diverges as the system approaches the critical point n = 1, and the power-law tail (34) can be observed over many orders of magnitude of the intensity for near-critical systems, as we illustrate below.
The parameter α = n τ entering in the expression of tail exponent in expression (34) has been defined by expression (25).Since ν 0 is the background intensity of the Hawkes intensity as defined in (1), the exponent of P ss (ν) depends on ν 0 α = nν 0 τ , which is n times the average number of background events (or immigrants) occurring during a time equal to the average time scale τ of the memory kernel.Thus, the larger the memory τ , the larger the background intensity ν 0 and the larger the branching ratio n, the smaller is the exponent 1 − 2ν 0 α.Note that 1 − 2ν 0 α can even turn negative for ν 0 α > 1/2, which corresponds to a non-monotonous PDF P ss (ν), which first grows according to the power law (34) before decaying exponentially at very large ν's.
In simple terms, the PDF (34) describes the distribution of the number νdt of events in the limit of infinitely small time windows [t, t + dt].We should contrast this limit to the other previously studied limit of infinitely large and finite but very large time windows.Standard results of branching processes (of which the Hawkes model is a subset) give the total number of events generated by a given triggering event (see Ref. [21] for a detailed derivation).In equation (1), this corresponds to counting all the events over an infinitely large time window that are triggered by a single source event ν 0 = δ(t) occurring at the origin of time.Ref. [22] has studied the distribution of "seismic rates" in the limit of large time windows which, in our current formulation, corresponds to the distribution of N (t) := t+T t ν(τ )dτ , in the limit of large T 's.The corresponding probability density distributions are totally different from (34), which corresponds to the other limit T → 0.
We derive our main result (34) first for the single exponential form of the memory kernel, then for the discrete sum of exponentials and then for the general case.

A. Single exponential kernel
As the first example, we focus on the single exponential kernel (3).While this special case is analytically tractable without the need to refer to the master equation approach [29], we nevertheless derive its exact solution via the master equation approach, because the methodology will be readily generalized to the more complex cases.

Steady state solution
Let us first study the steady solution of the PDF P ss (ν).By setting K = 1 in Eq. ( 21), we obtain the expression of the Laplace transform of the steady state Pss (s) := ∞ 0 dνe −sν P ss (z) of the master equation ( 9) in the form of a first-order ordinary differential equation By solving this equation, we obtain the exact steady solution below the critical point n < 1, log Qss (s) = −sν 0 + log Pss (s) = − ν 0 τ s 0 sds e −ns/τ − 1 + s/τ (36) with the renormalization condition a. Near the critical point.Let us evaluate the asymptotic behavior of Qss (s) for large ν by assuming that the system is in the near-critical state, such that By performing an expansion in the small parameter ε, we obtain an asymptotic formula for small s (large ν), which implies a power-law behavior with a non-universal exponent, up to an exponential truncation: for large ν.This is a special case of expression (34) obtained for n → 1 and τ = τ .It is remarkable that the power-law exponent is less than one, and thus the PDF is not renormalizable without the exponential truncation.This means that the power-law scaling actually corresponds to an intermediate asymptotics of the PDF, according to the classification of Barenblatt [35].In addition, while this scaling can be regarded as a heavy "tail" for 2ν 0 τ < 1, the exponent can be negative when 2ν 0 τ > 1 (i.e., the PDF is a power-law increasing function until the exponential tail takes over and ensure the normalisation of the PDF).The characteristic scale of the exponential truncation is defined by which diverges as the system approaches to the critical condition n = 1.This means that (i) if the system is in a near-critical state ε ≪ 1 and (ii) the background intensity is sufficiently small ν 0 < 1/(2τ ), one can actually observe the power-law intermediate asymptotics for a wide range ν ≪ ν cut = O(ε −1 ), up to the exponential truncation.b.Numerical verification.We now present numerical confirmations of our theoretical prediction, in particular for the intermediate asymptotics as shown in Fig. 4.There is a well-developed literature on the numerical simulation of Hawkes process [36,37].We have used an established simulation package for Python called "tick" (version 0.6.0.0) for 32-threads parallel computing.The total simulation time was 10 4 seconds and the sampling time interval was 0.001 second.We note that the initial 10% of the sampled trajectories were discorded for initialization.
For small background intensity ν 0 ≪ 1/τ , we obtain an approximate universal exponent −1.For 2ν 0 τ < 1, we observe a decaying power law of exponent 1−2ν 0 τ , while we observe a growing power law for 2ν 0 τ > 1.The power-law intermediate asymptotics is truncated by the exponential function, as predicted and also discussed in Ref. [31] (albeit with the error of missing the 1 in the exponent and thus failing to describe the intermediate asymptotics), ensuring the normalization of the PDF of the Hawkes intensities.

Time-dependent solution
We now present the exact solution of the time-dependent master equation.In the Laplace representation, the dynamics of the PDF of the intensities is given by the following first-order PDE, This equation can be solved by the method of characteristics (see Appendix C for a brief review).The corresponding Lagrange-Charpit equations are given by These equations can be solved explicitly, C 1 and C 2 are constants of integration and s 0 is a positive constant chosen to satisfy several convenient properties discussed below.
a. Summary of the properties of F .We present several analytical properties of F (s) (see Appendix D for their proof): (α1) F (s) is a monotonically increasing function by choosing s 0 > 0 appropriately.(α2) The inverse function F −1 (s) can be defined uniquely.
In addition, for the sub-critical case n < 1, the following properties hold true: (α3) s 0 can be set to any positive value.
b. Regularization of F (s).In the following, we assume the sub-critical condition n < 1.It is useful to decompose F (s) into regular and singular parts: where the regular part is well-defined even for s 0 → 0. This expression is useful since the divergent factor in F (s) can be renormalized into the integral constant C 1 , such that We then take the formal limit s 0 → 0 and use the following regularized expression for the sub-critical condition n < 1, where the s 0 -dependence is removed as the result of the renormalization.The regular part has no singularity at s ≃ 0, and reads F R (s) ≃ −n 2 s/{2τ (1 − n)}.c.Explicit solution.Building on the above, we now provide the solution of the master equation (42).According to the method of characteristics (see Appendix.C for a brief review), the general solution is given by with an arbitrary function H(•).The time-dependent solution log Qt (s) = log Pt (s) − ν 0 s is thus given by log Qt (s The function H(•) is determined by the initial condition.
Let us assume that the initial PDF and its Laplace representation are given by P t=0 (ν) and Pt=0 (s), respectively.Then, we obtain or equivalently, Note that the time-dependent solution (50) is consistent with the steady solution ( 36) since lim x→+∞ H(x) = 0 (see Appendix.E 1 for the proof).We also note that, from the time-dependent solution (50), we can derive the dynamics of the intensity ν(t) for finite t as with the initial condition ν(0) = ν ini (see Appendix.E 2 for the derivation).This expression (54) shows that the mean intensity converges at long times t → +∞ to ν(t) → ν 0 /(1 − n), which is a well-known result [8,25].Expression (54) also shows that an initial impulse decays exponentially with a renormalised time decay τ /(1 − n), which is also consistent with previous reports [38].This diverging time scale τ /(1 − n), as n → 1, reflects the occurrence of all the generations of triggered events that renormalise the "bare" memory function into a "dressed" memory kernel with much longer memory.d.Asymptotic relaxation dynamics for large t.The time-dependent asymptotic solution is given for large t by log Pt (s) assuming that t ≫ F (s) for a given s.As a corollary of this formula, an asymptotic prediction for the distribution conditional on the initial intensity is given by the following formula for large t.Note that asymptotic convergence of these formula is not uniform in terms of s; indeed, the convergence of the Laplace representation for large s is slower than that for small s.

Another derivation of the power law exponent: linear stability analysis of the Lagrange-Charpit equation
We have presented both the steady state and time-dependent solutions of the master equations, based on exact or asymptotic methods.While these formulations are already clear, here we revisit the power-law bahavior (40) of the steady state PDF, and present another derivation based on the linear stability analysis of the Lagrange-Charpit equation, which has the advantage of being generalisable to memory kernels defined as superposition of exponential functions.Indeed, while the derivation based on the exact solution (36) is clear and powerful, it is not easy to extend this kind of calculation to general cases, such as superposition of exponential kernels.In contrast, the derivation that we now present can be extended to arbitrary forms of the memory kernel of the Hawkes processes, as will be shown later.Moreover, we have found additional distinct derivations of (40) and we refer the interested reader to Appendix.B.
While the steady state master equation ( 35) is a ordinary differential equation which can be solved exactly, let us consider its corresponding Lagrange-Charpit equations, where we introduce the parameter l of the characteristic curve.These equations can be regarded as describing a "dynamical system" in terms of the auxiliary "time" l.This formulation is useful because the well-developed theory of dynamical systems is applicable even to more general cases as shown later.a. Sub-critical condition n < 1.Let us first focus on the sub-critical case n < 1 and consider the expansion of equations (57) around s = 0, which leads to The corresponding flow of this effective dynamical system s(l) along the s axis as a function of "time" l is illustrated in Fig. 5. Near the critical condition ǫ = 1 − n ≪ 1, this "dynamical system" has two fixed points V (s) = 0 at The former is a stable attractor whereas the latter is an unstable repeller (see Fig. 5a).Remarkably, the critical condition n = 1 for the Hawkes process corresponds to the condition of a transcritical bifurcation (i.e., the repeller merges with the attractor; see Fig. 5b) for the "dynamical system" described by the Lagrange-Charpit equations.This picture is useful because it can be straightforwardly generalized to more general memory kernels h(t), as shown later.
Let us neglect the sub-leading contribution to obtain the general solution as with constants of integration l 0 and C. In the following, we set the initial "time" (i.e., the initial point on the characteristic curve) as l 0 = 0. We then obtain with constant of integration C.This constant is fixed by the condition of normalization of the PDF, given by log Pss = 0 for s = 0, which imposes C = 0. We thus obtain log Qss (s which is consistent with the asymptotic mean intensity in the steady state (see the long time limit of Eq. ( 54)).b.At criticality n = 1.For n = 1, the lowest-order contribution in the Lagrange-Charpit equation is given by with constant of integration l 0 .In the following, we set l 0 = 0 as the initial point on the characteristic curve.We then obtain log Pss = ν 0 τ dl s(l) with the constant of integration C. The constant is an "divergent" constant since it has to compensate the diverging logarithm to ensure that log P ss (s = 0) = 0.This "divergent" constant appears as a result of neglecting the ultra-violet (UV) cutoff for small s (which corresponds to neglecting the exponential tail of the PDF of intensities).By ignoring the divergent constant C, we obtain the intermediate asymptotics, which recovers the leading power law intermediate asymptotic (40), which is a special case of the general solution (34).

B. Double exponential kernel
We now consider the case where the memory function ( 14) is made of K = 2 exponential functions.Since the Laplace representation of the master equation is still a first-order partial differential equation, its solution can be formally obtained by the method of characteristics (see Appendix.C for a short review).Unfortunately, the timedependent Lagrange Charpit equation cannot be exactly solved in explicit form anymore.We therefore focus on the showing that s = 0 is a stable attractor.(b) Critical case with (τ1, τ2, n1, n2) = (1, 3, 0.3, 0.7), showing that the e1 direction is marginal in terms of the linear stability analysis (i.e., the repeller merges with the attractor, which corresponds to a transcritical bifurcation in dynamical systems).
steady state solution of the master equation, with a special focus on the regime close to the critical point.We develop the stability analysis of the Lagrange-Charpit equations following the same approach as in Sec.III A 3.
Let us start from the Lagrange-Charpit equations, which are given by and l is the auxiliary "time" parameterising the position on the characteristic curve.Let us develop the stability analysis around s = 0 (i.e. for large ν's) for this pseudo dynamical system.a. Sub-critical case n < 1.Assuming n := n 1 + n 2 < 1, let us first consider the linearized dynamics of system (66) as with Regarding this system as a dynamical system with the auxiliary "time" l, its qualitative dynamics can be illustrated by its phase space depicted in Fig. 6.In the subcritical case n < 1, the origin s = (0, 0) is "attractive" since all the eigenvalues of H are positive (Fig. 6a).
Let us introduce the eigenvalues λ 1 , λ 2 and eigenvectors e 1 , e 2 of H, such that Because all eigenvalues are real (see Appendix.F 1 for the proof), we denote λ 1 ≤ λ 2 .The determinant of H is given by This means that the zero eigenvalue λ 1 = 0 appears at the critical point n = 1.Below the critical point n < 1, all the eigenvalues are positive (λ 1 , λ 2 > 0).For n < 1, the dynamics can be rewritten as with constants of integration l 0 and C 1 .We can assume l 0 = 0 as the initial point of the characteristic curve without loss of generality.Integrating the second equation in (67), we obtain The general solution is given by with a function H determined by the initial condition on the characteristic curve.Let us introduce This means that the solution is given by the following form: Because of the renormalization of the PDF, the following relation must hold for any path in the (s 1 , s 2 ) space ending on the origin (limit s → 0).Let us consider the specific limit such that s1 → 0 with s2 = x −1 (s 1 ) λ2/λ1 for an arbitrary positive x: Since the left-hand side (LHS) is zero for any x, the function H(•) must be identically zero.With Φ := log Pss as defined in (66), this leads to log Pss (s) = −νKH −1 s.
By substituting with the special s = (s 1 = s, s 2 = s) T , we obtain log Qss (s) = −ν 0 s + Φ t=∞ s(1, 1) for small s, which recovers expression (62) derived above.b.Critical case n = 1.In this case, the eigenvalues and eigenvectors of H are given by This means that the eigenvalue matrix and its inverse matrix are respectively given by This value of α is the special case for two exponentials of the general definition (25).Accordingly, let us introduce We then obtain at the leading linear order in expansions in powers of x and y.Since the first linear term is zero in the dynamics of x, corresponding to a transcritical bifurcation for the Lagrange-Charpit equations (66), we need to take into account the second order term in x, namely where we have dropped terms of the order y 2 , xy and x 2 y.We then obtain the dynamical equations at the transcritical bifurcation (see Fig. 6b) to leading-order whose solutions are given by with constants of integration l 0 and C 1 .We can assume l 0 = 0 as the initial point on the characteristic curve.
Remarkably, only the contribution along the x axis is dominant for the large l limit (i.e., |x| ≫ |y| for l → ∞), which corresponds to the asymptotic limit s → 0. We then obtain with constant of integration C 2 .The general solution is given by with a function H, which is determined by the initial condition.Considering that the solution is given by the following form: Because we have neglected the UV cutoff for small s, there is an artificial divergent term −2ν 0 α log |x| for small x.
Except for this divergent term, Φ(s) must be constant for s → 0. The function with the choice of x = 2λ 2 α/ log(z/y) for any positive constant z.Therefore, we obtain the steady solution log Pss (s) for small x and y, by ignoring the UV cutoff and the constant contribution.This recovers the power law formula of the intermediate asymptotics of the PDF of the Hawkes intensities: log Qss (s) := −ν 0 s + log Pss (s, s) with α = τ 1 n 1 + τ 2 n 2 as defined in (81).c.Numerical verification.We have numerically confirmed our theoretical prediction (93), a special case of (34) for a memory kernel with two exponentials, as shown in Fig. 7.The main properties are the same as those shown in Fig. 4, implying that our prediction is verified for memory kernels with one and two exponentials.

C. Discrete superposition of exponential kernels
We now consider the case where the memory kernel is the sum of an arbitrary finite number K of exponentials according to expression (14).Our treatment follows the method presented for the case K = 2.
The corresponding Lagrange-Charpit equations read: The derivation of the PDF of the Hawkes intensities boils down to a stability analysis of these equations around s = 0 in the neighbourhood of the critical condition n = 1.a. Sub-critical case n < 1.We linearize the Lagrange-Charpit equations to obtain with Considering that all eigenvalues {λ k } k=1,...,K of H are real (see Appendix.F 1 for its proof), we order them according to λ i < λ j for i < j.We denote the corresponding eigenvectors as {e k } k=1,...,K .The matrix H can thus be diagonalised as follows The critical case n = 1 corresponds to the existence of a zero eigenvalue.Therefore, at the critical point, the determinant of H is zero (see Appendix.F 2 for the derivation of the explicit form of its determinant): Following calculations similar those presented in to Sec.III B, we obtain where the inverse matrix H −1 is explicitly given in Appendix.F 3. We finally obtain log Qss (s) = −ν 0 s + Φ(s(1, . . ., 1) again recovering (79) and (62) derived above.b.Critical case n = 1.At the critical point, the smallest eigenvalue of H is zero (λ 1 = 0).By direct substitution, its corresponding eigenvector is as seen from We now introduce a new set of variables (i.e., representation based on the eigenvectors) The linearized Lagrange-Charpit equations are given by Similarly to Eq. ( 86), the leading-order contribution comes from the x 1 direction because |x 1 | ≫ |x j | for j ≥ 2 in the asymptotic limit l → ∞.We thus neglect other contribution by assuming x j ∼ 0 for j ≥ 1.It is therefore necessary to take the second-order contribution along the x 1 direction, We note that x 1 is given by where α := K k=1 τ k n k , which is a special case for a discrete sum of exponentials of the general definition (25).Taking the derivative of (106) and using equation (94), we obtain This means that g 1 is a correct representation.Note that the value of α given by ( 25) ensures consistency with the following identify: where represents some unspecified value.Since the contribution of x 2 , . . .x K can be ignored for the description of the leading behavior along x 1 , let us set We thus obtain the second-order contribution along the x 1 axis by ignoring nonlinear contribution from x 2 , . . ., x K : With calculations that follow step by step those in Sec.III B, we obtain log Qss (s) This recovers the power law formula of the intermediate asymptotics of the PDF of the Hawkes intensities given by (34).

D. General case
We are now prepared to study the general case where the memory kernel of the Hawkes process is a continuous superposition of exponential functions (23).Introducing the steady state cumulant functional and from the master equation in its functional Laplace representation Eq. (32), we obtain the following first-order functional differential equation in the steady state, The corresponding Lagrange-Charpit equations are the following partial-integro equations, where l is the curvilinear parameter indexing the position along the characteristic curve.We now perform the stability analysis of this equation (114) in the neighbourhood of s = 0 close to the critical condition n = 1.a. Sub-critical case n < 1.We linearize the Lagrange-Charpit equation ( 114) to obtain with Let us introduce the eigenvalues λ ≥ λ min and eigenfunctions e(τ ; λ), satisfying Appendix G 1 shows that all the eigenvalues are real.The inverse matrix of H(τ, τ ′ ), denoted by H −1 (τ, τ ′ ), can be explicitly obtained as shown in Appendix G 2. Since the inverse matrix H −1 (τ, τ ′ ) has a singularity at n = 1, the critical condition of this Hawkes process is given by n = 1 as expected.Using calculations that are analogous to those in Sec.III B, we obtain from which we state that log Qss (s where 1(τ ) is an indicator function defined by 1(τ ) = 1 for any τ .This recovers (62), ( 79) and (100) derived above.
b. Critical case n = 1.At criticality, the smallest eigenvalue vanishes: λ min = 0. Indeed, we obtain the zero eigenfunction which can be checked by direct substitution: We now introduce a set of variables to obtain a new representation based on the eigenfunctions, with the inverse matrix e −1 (λ; τ ) satisfying We assume the existence of the inverse matrix, which is equivalent to the assumption that the set of all eigenfunctions is complete.H(τ, τ ′ ) can be diagonalized: We then obtain the linearized Lagrange-Charpit equations, The dominant contribution comes from the vanishing eigenvalue.We therefore focus on x(0) by setting x(λ) = 0 for λ > 0. We then form the expansion The explicit representation of x(0) is given by where α is defined by expression (25).Expression (127) can be checked to be valid by direct substitution since, from Eq. (114), we have showing that the first-order contribution is actually null in this representation.The parameter α (25) has the property to ensure the consistency with the following identity: Since we ignore the contribution from x(λ) except for λ = 0, let us set x(λ) = 0 for λ > 0, which yields where we have written x := x(0).We then obtain the second-order contribution along the x(0) axis, From calculations mimicking those in Sec.III B, we obtain log Qss (s) ≃ −2ν 0 α log |s| ⇐⇒ P (ν) This recovers the power law formula of the intermediate asymptotics of the PDF of the Hawkes intensities given by (34).

IV. CONCLUSION
We have presented an analytical framework of the Hawkes process for an arbitrary memory kernel, based on the master equation governing the behavior of auxiliary field variables.We have derived systematically the corresponding functional master equation for the auxiliary field variables.While the Hawkes point process is non-Markovian by construction, the introduction of auxiliary field variables provides a formulation in terms of linear stochastic partial differential equations that are Markovian.For the case of a memory kernel decaying as a single exponential, we presented the exact time-dependent and steady state solutions for the probability density function (PDF) of the Hawkes intensities, using the Laplace representation of the Master equation.For memory kernels represented as arbitrary sums of exponential (discrete and continuous sums), we derived the asymptotic solutions of the Lagrange-Charpit equations for the hyperbolic master equations in the Laplace representation in the steady state, close to the critical point n = 1 of the Hawkes process, where n is the branching ratio.Our theory predicts a power law scaling of the PDF of the intensities in an intermediate asymptotics regime, which crosses over to an asymptotic exponential function beyond a characteristics intensity that diverges as the critical condition is approached (n → 1).The exponent of the PDF is non-universal and a function of the background intensity ν 0 of the Hawkes intensity and of the parameter α = n τ , where τ is the first-order moment of the distribution of time scales of the memory function of the Hawkes process.We found that, the larger the memory τ , the larger the background intensity ν 0 and the larger the branching ratio n, the smaller is the exponent 1 − 2ν 0 α of the PDF of Hawkes intensities.
This work provides the basic analytical tools to analyse Hawkes processes from a different angle than hitherto developed and will be useful to study more general and complex models derived from the Hawkes process.For instance, it is straightforward to extend our treatment to the case where each event has a mark quantifying its impact or "fertility", thus defining the more general Hawkes process with intensity ν(t) = ν 0 + N (t) i=1 ρi h(t − ti ) with independent and identically distributed random numbers { ρi } i .Our framework is also well-suited to nonlinear generalisations of the Hawkes process, for instance with the intensity taking the form ν(t) = g(ω(t)) > 0, where the auxiliary variable ω is given by ω(t) = ω 0 + N(t) i=1 ρi h(t − ti ) and where the times { ti } i of the events are determined from the intensity ν(t).In this nonlinear version, the positivity of ω(t) and ρ i are not anymore required.This nonlinear Hawkes process is more complex than the linear Hawkes process but our framework can be applied to derive its most important analytical properties [40].We note that such nonlinear Hawkes process include several models that have been proposed in the past, with applications to explain the multifractal properties of earthquake seismicity [41] and of financial volatility [42].
Given the dynamical equations ( 16) for the excess intensities {ẑ k , k = 1, ..., K}, which are short hand notations for the dynamics given by equation (8), for an arbitrary function f (ẑ), its stochastic time evolution therefore reads (A1) with jump size vector h and Hawkes intensity ν, defined by Taking the ensemble average of both sides of (A1) and after partial integration of the left-hand side, we get After partial integration of the right-hand side and making the change of variable z + h → z, we obtain Since this is an identify for an arbitrary f (z), we obtain Eq. ( 17).We derive the corresponding Laplace representation (21) as follows: Let us apply the Laplace transform to both sides of Eq. ( 17), The left-hand side is given by For the right-hand side, let us consider the following two relations: where we have used the partial integration on the second line and have used the boundary condition (18) on the third line, and where we have applied the change of variable z − h → z on the second line.By applying these two relations to the right-hand side of Eq. (A5), we obtain Eq. ( 21)

Kramers-Moyal apparoch
We can derive relation (40) using the Kramers-Moyal expansion of the master equation ( 9).Let us consider the expansion By truncating the series at the second order, we obtain the Fokker-Planck equation at the critical point n = 1 in the steady state: for z → ∞.We thus obtain an asymptotic formula This solution is consistent with the truncation of the Kramers-Moyal series at the second order, which consists in removing negligible higher order terms.For large l ≥ 3, indeed, we obtain Appendix C: Elementary summary of the method of characteristics The method of characteristics is a standard method to solve first-order PDEs.Here we focus on linear first-order PDEs that are relevant to the derivation of the PDF of Hawkes intensities.Let us consider the following PDE: a(x, y, z) ∂z(x, y) ∂x + b(x, y, z) ∂z(x, y) ∂y = c(x, y, z).

(C1)
According to the method of characteristics, we consider the corresponding Lagrange-Charpit equations: with the parameter l encoding the position along the characteristic curves.These equations are equivalent to an invariant form in terms of l dx a(x, y, z) Let us write their formal solutions as C 1 = F 1 (x, y, z) and C 2 = F 2 (x, y, z) with constants of integration C 1 and C 2 .
The general solution of the original PDE (C1) is given by with an arbitrary function φ(C 1 , C 2 ), which is determined by the initial or boundary condition of the PDE (C1).This method can be readily generalized to systems with many variables.
This means that the argument x(t, s) = t − F (s) shows the divergence From Eq. (E4), the inverse function shows the asymptotic behavior for large x: By substituting the relation (E5), we obtain We now assume the initial condition ν(0) = ν ini , or equivalently log Pt=0 (s) = −ν ini s.From Eq. (50), we thus obtain the relaxation dynamics for the tail s ≃ 0, which means that the average of ν(t) is given by ν 3. Derivation of the asymptotic formula (55) Let us derive the asymptotic relaxation formula (55) for sufficiently large t, satisfying t ≫ F (s) (E10) for a given s.Under such a condition, the asymptotic relation for the inverse function F −1 (−x(t, s)) is available as Eq.(E6) with x(t, s) = t − F (s).We then obtain for sufficiently large t.By substituting this into Eq.(50), we obtain Eq. (55).
Appendix F: Proofs of mathematical properties of H (96) Here, we summarize the proofs of the main mathematical properties of H (96) for arbitrary values of K.
1. Proof of that its eigenvalues are real All eigenvalues of H are real numbers for the following reasons.H can be symmetrized as H, defined by Indeed, by representing all the matrices by their elements H := ( Hij ), H := (H ij ), and We therefore obtain implying that any eigenvalue of H is the same as that of H.Because H is a symmetric matrix, all the eigenvalues of H are real.Therefore, all the eigenvalues of H are also real.

Determinant
Here, we derive the determinant det H for arbitrary values of K. Let us recall the following identities, showing the invariance of determinants: which implies or equivalently in the representation by matrix elements.As a check of the above calculation, we can directly confirm the following relation, defining the inverse matrix: The inverse matrix has a singularity at n = 1, which corresponds to the critical regime of the Hawkes process.

Inverse matrix
The inverse matrix of H(τ, τ ′ ) is given by Indeed, we verify that The inverse matrix has a singularity at n = 1, corresponding to the critical regime of the Hawkes process.

FIG. 4 :
FIG.4: Numerical evaluation of the steady state PDF of the intensity ν for the following parameter sets near the critical point: n = 0.999 (blue bars) and n = 0.99 (red bars).The theoretical power law is shown by the green straight line.(a) Background intensity ν0 = 0.01, relaxation time τ = 1, leading to the power-law exponent 0.98.(b) ν0 = 0.2, τ = 1, leading to the power-law exponent 0.6.(c) ν0 = 1.0, τ = 1, leading to the negative (i.e.growing) power-law exponent −1.0.For all simulations, the sampling time interval and total sampling time are dt = 0.001 and Ttot = 10000 with the initial condition ẑ(0) = 0.The initial 10% of the sample trajectories were discarded from the statistics.

−n 1
constants {n k } k .Using these relations, the determinant of H is given by det H = det /τ 1 + 1/τ