The statistical physics of discovering exogenous and endogenous factors in a chain of events

Event occurrence is not only subject to the environmental changes, but is also facilitated by the events that have occurred in a system. Here, we develop a method for estimating such extrinsic and intrinsic factors from a single series of event-occurrence times. The analysis is performed using a model that combines the inhomogeneous Poisson process and the Hawkes process, which represent exogenous fluctuations and endogenous chain-reaction mechanisms, respectively. The model is fit to a given dataset by minimizing the free energy, for which statistical physics and a path-integral method are utilized. Because the process of event occurrence is stochastic, parameter estimation is inevitably accompanied by errors, and it can ultimately occur that exogenous and endogenous factors cannot be captured even with the best estimator. We obtained four regimes categorized according to whether respective factors are detected. By applying the analytical method to real time series of debate in a social-networking service, we have observed that the estimated exogenous and endogenous factors are close to the first comments and the follow-up comments, respectively. This method is general and applicable to a variety of data, and we have provided an application program, by which anyone can analyze any series of event times.


I. INTRODUCTION
It is widely seen that events occur one after another in a chain by self-excitation mechanisms, as in the communication of diseases [1][2][3][4][5], crimes and conflict [6][7][8][9], seismic dynamics [10,11], scientometrics [12], tweets [13][14][15], financial events [16][17][18][19], and neuronal firing [20][21][22][23]. Generally, event occurrence is subject not only to such intrinsic self-excitation mechanisms, but also environmental changes that take place independently of events that have occurred in a system. For instance, contagious diseases may spread from one individual to another, but the occurrence rate may also be subject to environmental factors such as temperature change [24]. When considering how to facilitate or attenuate occurrence activity, we need to know the potential contributions of endogenous and exogenous factors [25][26][27]. When the government reviews possible intervention in the economy, it must be able to project the degree of the uncontrollable chain reaction causing economic fluctuations such as falling stock prices, and hopefully to calculate the possible impact of external intervention.
It has been suggested that the temporal profile of changes in the tweeting rate may provide information as to whether such changes have been caused endogenously or exogenously [13], but opinions have been divided over the issue as for whether gross activity contains enough information. Several works have addressed the estimation problem by analyzing precise time series of event occurrence using machine-learning algorithms [14,15,[28][29][30][31][32][33][34].
Recently, we have developed an analytical method based on a generalized linear model (GLM) equipped with a self-exciting mechanism and analyzed a real time series of tweets to determine the contributions of endogenous and exogenous factors [35]. One problem with the GLM is its strong nonlinearity, such that previous events multiply each other and extrinsic and intrinsic factors are mixed up as a result.
In the present study, we consider analyzing a series of occurrence times via the "inhomogeneous Hawkes process," a linear superposition of the inhomogeneous Poisson and Hawkes processes [36], which represent exogenous fluctuations and endogenous chain-reaction mechanisms, respectively. We devise the "Hawkes decoder," a method of fitting the inhomogeneous Hawkes process to a given dataset. We also develop a path-integral method for computing free energy analytically. Given an eventgeneration process, the path-integral method can determine the parameters of the model representing the degree of an internal chain reaction and the impact of environmental changes.
This estimation is inevitably accompanied by errors, because the event-generation process itself is stochastic. The estimation errors can be assessed by applying the decoder to synthetic data generated by simulating the inhomogeneous Hawkes process and comparing the estimated and original parameters. It may occur that exogenous fluctuation and/or endogenous self-excitation are not captured, even by a Hawkes decoder that can represent the original process. Using a path-integral method, we construct phase diagrams in which the results are categorized into four qualitatively different regimes according to whether or not their respective factors are detected. Then, we apply the Hawkes decoder to a time series of debate in a social-networking service (SNS) to estimate the relative contributions of exogenous and endogenous factors. We also examine the model's capability of predicting the amount of follow-up comments induced by first comments in the SNS. We also provide an application program, by which anyone can analyze any sequence of event times.

II. METHODS
A. Generating a series of events with the inhomogeneous Hawkes process We define the inhomogeneous Hawkes process in terms of the intensity or the occurrence rate λ(t) given by where the first term ν(t) on the right-hand-side represents the inhomogeneous Poisson process such that events are derived from a time-varying occurrence rate given exogenously, whereas the second term represents a selfexcitation effect in terms of the Hawkes process such that the occurrence of each event modifies the probability of future events endogenously ( Fig. 1 (a)). Here, α is the reproduction ratio representing the average number of events induced by a single event, t i is the occurrence time of a past (ith) event, and φ(t) is a kernel representing the time course of the self-excitation, satisfying the causality φ(t) = 0 for t < 0 and the normalization ∞ 0 φ(t)dt = 1. A basic procedure for putting the rate-model process into practice is to divide the time axis into small time intervals of δt, and then to repeat the Bernoulli process in which an event is either present (with a probability of λ(t)δt ≪ 1) or absent (with a probability of 1 − λ(t)δt) at each time bin ( Fig. 1(b)). The probability of having no event for the first N intervals and finally having an event at the N + 1st interval is given by N i=1 (1 − λ(iδt)δt)λ(nδt)δt. By taking the limit of the infinitesimal interval, the probability density of having an inter-event interval t ≡ nδt is given by (2) Accordingly, the probability that events occur at times {t i } ≡ {t 1 , . . . , t n } in a period [0, T ] is obtained as [37,38] B. The Hawkes decoder: a method of estimating endogenous and exogenous factors We wish to estimate the exogenous stimulus ν(t) and the degree of self-excitation α that have contributed to Events System Endogenous self-excitation Exogenous stimulus Here, we assume that the external input ν(t) is modulated slowly in time. This may be represented by a prior distribution that penalizes the large gradient: where γ is a hyperparameter that specifies the smoothness or flatness of ν(t) and Z(γ) is the normalization constant where D{ν(t)} represents functional integration over all possible paths of external inputs ν(t). Given a time se-ries of events {t i }, the posterior distribution of {ν(t)} is obtained through Bayes' theorem as where p α ({t i }|{ν(t)}) is the conditional probability that events occur at times {t i }, which is given by Eq. (3). Here we have denoted the self-exciting parameter α and the external input ν(t) explicitly as conditions, because they are used to specify the underlying rate λ(t) in Eq. (1). The denominator p α,γ ({t i }) is the marginal likelihood obtained by the marginalization integral: The parameter α and hyperparameter γ are determined by maximizing the marginal likelihood: With the selected parameter and hyperparameter, the likely external input is obtained as the maximum a posteriori (MAP) estimate, such that the posterior distribution (6) is maximized: With the estimatedα, {ν(t)}, and the given series of occurrence times {t i }, we can estimate the underlying rate asλ The estimated exogenous rate {ν(t)} depends largely upon the selected hyperparameterγ. With a largeγ, the estimated rate fluctuates largely in time. If the exogenous fluctuation is small, it may occur that the estimatedγ vanishes, in which case the estimated rateν(t) becomes constant. Another parameterα represents the estimated level of self-excitation, and it might also occur that the reproduction ratioα vanishes. Thus, there are four regimes categorized according to the combination of whetherγ = 0 (homogeneous) orγ = 0 (inhomogeneous) and whetherα = 0 (self-excitation undetected) orα = 0 (self-exciting detected). The nicknames of the four regimes are given in Table I In the previous section, we showed the procedure for fitting the inhomogeneous Hawkes process to a given series of occurrence times. Nevertheless, the parameter andαγ interpretation 0 0 "Poisson" 0 finite "Exo" finite 0 "Endo" finite finite "Exo+Endo" TABLE I. Four regimes categorized according to the selected parameter α and hyperparameter γ. "Poisson": Neither endogenous self-excitation nor exogenous fluctuation is detected, {α = 0,γ = 0}, and accordingly the process is considered to be close to the homogeneous Poisson process. "Exo": Only temporal variation of the exogenous origin is detected, {α = 0,γ = 0}, and the process is close to the inhomogeneous Poisson process. "Endo": Only endogenous self-excitation is detected, {α = 0,γ = 0}. "Exo+Endo": Both exogenous and endogenous factors are detected, {α = 0,γ = 0}.
hyperparameter {α,γ} can be determined even without a series of events, if the event-generating process is given. In this section, we perform this estimation analytically using the path-integral method.
The marginalization in Eq. (7) can be represented as a path-integral: a classical orbit that satisfies the Euler-Lagrange equation is the MAP solution {ν(t)}, given by Eq. (9). The path-integral in Eq. (11) can be performed analytically by expanding the action functional up to the quadratic order in the deviation from the classical orbit: where δ 2 Sν (t) [δν(t)] denotes an action integral of the deviation around the classical orbitν(t), defined as The marginal likelihood is obtained as where represents a "quantum effect." The "free energy" is the negative logarithm of the marginal likelihood: By decomposing the original exogenous input into a mean and fluctuation and expanding the action integral up to the quadratic order of the fluctuation, the free energy for a long series of events can be obtained analytically in terms of the mean input and the autocorrelation of fluctuation. The derivation of the free energy and the obtained formula are summarized in Appendix B.

Event-generation model
We shall examine the Hawkes decoder by applying it to a series of events generated by the inhomogeneous Hawkes process of a given reproduction ratio α * and exogenous input {ν * (t)}: where the self-excitation kernel is given as φ * (t) = 1/τ * s exp(−t/τ * s ) for t > 0 and φ * (t) = 0, otherwise. In particular, we examine the case where ν * (t) is fluctuating according to the Ornstein-Uhlenbeck process (OUP) (Fig. 2): where ξ(t) is a Gaussian white noise satisfying ξ(t)ξ(t + u) = δ(u). Accordingly, the OUP is characterized by an autocorrelation function with a time constant τ * e : The inhomogeneous Poisson process with the fluctuating rate Eq. (19) is also called the doubly stochastic Poisson process. Thus, the current event-generation is a linear combination of the Hawkes process in Eq. (18) and the doubly stochastic Poisson process characterized by the autocorrelation Eq. (20). The important dimensionless parameters of the model are the strength of endogenous self-excitation α * and the exogenous fluctuation σ * 2 τ * e /µ * .

The Hawkes decoder with a correct self-excitation kernel
First, we analyze the synthetic data using the Hawkes decoder. We consider here the case in which the model Exogenous input ν * (t) generated by the Ornstein-Uhlenbeck process (OUP) of given parameters µ * , σ * , and τ * e .
The y-axis (α * = 0) in Fig. 3 (a) represents the situation by which events are generated by the inhomogeneous Poisson process in which the rate ν * (t) fluctuates with a standard deviation σ * around the mean rate µ * in a timescale of τ * e . Because the event-generation process is stochastic in nature, the profile of the underlying rate ν * (t) cannot be captured precisely from a series of occurrence times, and it might even occur that the model abandons to capture a fluctuation if the amplitude of the original rate fluctuation is too small. Thisγ = 0 occurs if estimating a fluctuating rate (with γ = 0) is likely to produces a larger error than indicating a constant rate (with γ = 0). We found that the model suggests a constant rate if the original rate fluctuation is small (Appendix C 1): We have considered minimizing the mean integrated squared error between the underlying rate and the histogram of a given series of event times and discovered that the optimal bin size may diverge when the underlying fluctuation is small [39]. For the doubly stochastic Poisson process Eq. (19), the condition in which the bin size diverges is identical to inequality (21) with α * = 0. This implies that the condition for the rate fluctuation FIG. 3. Phase diagrams of the "Hawkes decoder" and the "Bayesian rate estimator" for the doubly stochastic inhomogeneous Hawkes process characterized by an amplitude of the self-excitation α * and exogenous fluctuation σ * 2 τ * e /µ * . (a) The Hawkes decoder. The parameters α and γ are selected by minimizing the free energy F (α, γ). The "Poisson," "Endo," and "Exo+Endo" regimes are defined by {α = 0,γ = 0}, {α = 0,γ = 0}, and {α = 0,γ = 0}, respectively. The phase boundaries are computed for the case of τ * e ≫ τ * s . (b) The sample event series, the original rate λ * (t), and the rateλ(t) estimated by the Hawkes decoder. (c) The Bayesian rate estimator. The parameter α is chosen to be 0 and the hyperparameter γ is selected by minimizing the free energy Fp(γ) = F (α = 0, γ). The "Exo" regime is defined by {α = 0,γ = 0}. (d) The rateν(t) was estimated by the Bayesian rate estimator for the same event series analyzed in (b).   being unknowable is independent of the estimation methods.

The Bayesian rate estimator
Now we analyze the identical dataset using the "Bayesian rate estimator," which estimates the fluctuating rate ν(t) by fitting the inhomogeneous Poisson process to the data. In this case, we compute the marginal likelihood by setting α = 0 in Eq. (7), and obtain the free energy as In this case, the process of estimating {ν(t)} is identical to the rate estimation method that we have developed previously [40]. Figure 3 (c) depicts a phase diagram of the Bayesian rate estimator for the same data examined with the Hawkes decoder in Fig. 3 (a). Here we obtain two different regimes categorized according to whetherγ = 0 ("Poisson") orγ = 0 ("Exo").
The x-axis of the diagram (σ * 2 τ * e /µ * = 0) in Fig. 3 (c) represents the data generated by the (homogeneous) Hawkes process. Even if the exogenous fluctuation is absent, an apparent occurrence of events may exhibit the large fluctuation due to the self-excitation mechanism. We found that the Bayesian rate estimator may suggest a fluctuating rateν(t) orγ = 0 if the original self-excitation is greater than the critical value (Appendix C 2): The critical reproduction ratio α c is identical to that obtained for the principled histogram method, such that the selected bin size diverges above this reproduction ratio α c [41,42]. As shown in Fig. 3 (c), the Bayesian rate estimator cannot capture the rate fluctuation if the endogenous selfexcitation α * or exogenous fluctuation σ * 2 τ * e /µ * is small. In Fig. 3 (d), we also show the ratesν(t) estimated by the Bayesian rate estimator for the same series of events analyzed by the Hawkes decoder in Fig. 3 (b).

The Hawkes decoder with an incorrect self-excitation kernel
Even if event occurrence is facilitated by the selfexcitation mechanism, the self-excitation information is generally not available. We consider the case in which the Hawkes decoder assumes a kernel that is different from that of the original process. Here in particular, we analyze the case in which the timescale τ s of the self-exciting kernel φ(t) = 1/τ s exp(−t/τ s ) is different from the timescale τ * s of the original kernel φ * (t) = 1/τ * s exp(−t/τ * s ), and see how the phase diagram is modified by choosing an incorrect self-excitation kernel (here we examine the case of τ * e ≫ τ * s ). Figure 4 (c) represents the phase diagram obtained with a correct kernel, τ s = τ * s . If the timescale of the decoder's kernel τ s is longer than the original, τ s > τ * s , the "Exo+Endo" regime reduces and the "Exo" regime emerges from the right-hand side of the phase diagram ( Fig. 4 (b)). If the timescale of the decoder's kernel is even longer, the "Exo" regime dominates, and the selfexcitation is not detected. In the limit of τ s ≫ τ * s , its phase diagram is identical to that of the Bayesian rate estimator (Fig. 4 (a)).
In the case where the timescale of the decoder's kernel is shorter than the original, τ s < τ * s , the Hawkes regime expands (Fig. 4 (d)) from the regime obtained with a correct kernel (Fig. 4 (c)). In this case, the reproduction ratio α is estimated to be smaller than the original (α < α * ). In the limit of τ s ≪ τ * s , the estimated reproduction ratio becomes infinitesimal, and accordingly the "Endo" and "Exo+Endo" regimes turn into "Poisson" and "Exo" regimes, respectively (Appendix C 3). In this limit, the phase diagram is also identical to that of the Bayesian rate estimator (Fig. 4 (e)).

The case of slow self-excitation
So far we have examined the cases in which an environmental factor changes slowly compared to the selfexcitation (τ * e ≫ τ * s ), and shown that the Hawkes decoder can estimate the contributions of exogenous and endogenous factors (Fig. 5 (a)). If the timescale of exogenous fluctuation τ * e is comparable or even shorter than that of self-excitation τ * s , however, it is difficult to separate the exogenous and endogenous contributions to the data.
If the timescale of the external fluctuation τ * e is relatively close to that of the self-exciation τ * s , the "Exo+Endo" regime decreases ( Fig. 5 (b)). If the timescale of the extrinsic fluctuation is comparable to that of self-excitation τ * e = τ * s , we may not discriminate between the exogenous and endogenous self-excitation components and obtain only the "Endo" regime ( Fig. 5  (c)). Contrary, if the timescale of the external fluctuation τ * e is shorter than that of the self-exciation τ * s , the decoder may yield the "Exo" regime (Figs. 5 (d) and (e)). Figure 6 depicts the full view of the transition of the phase diagram, consisting of a variety of situations.

Time series of comments submitted to an SNS
Finally, we analyze real-world data, namely the time series of comments talking about a given subject on a particular Reddit forum, either in original posts or in comments upon them. Data were collected through the public API for the keyword "coronavirus" in the subreddit "r/worldnews" for a period between January 19th    and February 19th, 2020. The dataset contains 223,545 comments in response to 5,341 submissions. Figure 7 (a) displays the changes in commenting activity over a month, indicating a strong burst of comments occurring at many times, presumably induced by the news. When considering topic-related content, the first comments on original posts might be interpreted as a direct consequence of exogenous influences, because they were likely to be induced by real-world events. The other follow-up comments may be considered as endogenous activity that was induced within a forum.
We applied the Hawkes decoder to the time series of times of all the comments (whether first comments or follow-up comments) to estimate the exogenous and endogenous factors that have worked to generate events. Due to the large size of the observation window, we selected several 1-week datasets from the full time series. When applying to the Empirical Bayes method, we also selected the timescale of the self-exciting kernel so that the likelihood is maximized. By analyzing several 1-week time series, the timescale of the kernel was selected in a range between 1,300 and 3,000 sec, implying that commenting may typically be done in about half an hour. This is in contrast to our previous GLM analysis of the tweets that contain a hashtag related to "bitcoin," in which case we used a kernel of the timescale 60 seconds or 1 min. This short response-time is presumably because clicking the retweet button can be done quickly, in contrast to the submission in Reddit, for which one needs some time to organize a comment.
The results of the decoding analysis for two segments are shown in Fig. 7 (b). The rates of total submissions and the original posts and direct comments were shown using 30-min binning. The exogenous rate estimated by the Hawkes decoder was superimposed upon the figure.
Here we plotted the 95% range of ν(t) estimated from the posterior distribution pα ,γ ({ν(t)}|{t i }) computed by Eq. (6) with the parameterα and hyperparameterγ determined by Eq. (8). We observe that the estimated exogenous component ν(t) exhibits good agreement with the rate of first comments. The reproduction ratioα was estimated as 0.84 and 0.81, and the timescale τ was selected as 1,644 and 2,456 sec for these two segments.

Predicting chain-reactions induced by external influence
While the Hawkes decoder can discover exogenous and endogenous factors in a single event series, its validity cannot be proven rigorously as long as the information about these origins is unavailable. However, it may be possible to use a model for predicting chain-reactions that are indirectly induced by environmental changes.
We analyzed the commenting data in the Reddit. By considering the occurrence rate of first comments as the external influence ν(t), we simulated the Hawkes process to estimate the entire occurrence rate λ(t). To do this, we estimated the reproduction ratio α and the timescale of the self-excitation kernel τ s by fitting the Hawkes decoder to the data of the preceding one week. With the series of events generated by this procedure, we constructed a time histogram of the occurrence rate with a bin size of 30 min. By repeating this procedure 1,000 times for the same ν(t), we obtained 1,000 different time histograms, with which we can obtain the distribution of the number of comments. Figure 7 (c) depicts two examples of commenting activity for 1 week, with the distribution of occurrence rates λ(t) predicted from the first comments. The rate of all comments that have occurred in practice is plotted on the distribution, exhibiting good predictive performance.

IV. DISCUSSION
The Hawkes process has attracted attention as a mathematical model that may describe the self-excitation mechanisms generating a chain of events, and there have been many attempts to fit the model to given event times.
Secondly, potential environmental changes were taken into account by introducing temporal fluctuation to the background rate. There are several attempts to estimate the background rate nonparametrically, including an Expectation-Maximization (EM) algorithm [49], kernel-density estimation and differential-entropy minimization [59], and local-maximum-likelihood estimation [60]. There have also been methods for estimating the background rate parametrically [27,61,62].
Our analytical method can be categorized into the latter group of studies; we have developed a method of estimating both exogenous and endogenous factors by fitting a linear superposition of the inhomogeneous Poisson process and the Hawkes process to an observed sequence of event times. With synthetic data generated by the inhomogeneous Hawkes process, we have confirmed that the method works properly for estimating the original parameters.
Here in particular, we have found that there are cases in which even the best decoding method cannot capture the extrinsic fluctuation and/or intrinsic self-excitation. Regarding the extrinsic fluctuation, a principled estimator may assume a constant environment if the extrinsic fluctuation is too small: even though the decoding method can estimate the fluctuating rate from a given dataset, the estimation error may become larger than that obtained by assuming a constant rate. Regarding the intrinsic self-excitation, the model cannot separate the self-excitation from environmental fluctuation if the timescale of excitation is similar to or larger than that of environmental fluctuation. We have summarized the undetectability conditions up into phase diagrams categorizing four regimes according to whether or not the exogenous fluctuation and endogenous self-excitation are respectively detected.
We also devised the Hawkes decoder to estimate the exogenous and endogenous factors from a given series of events. By applying it to real time series of debate on Reddit, we have observed that the first comments and the follow-up comments map closely to the estimated exogenous and endogenous reactions, respectively.
While the Hawkes decoder can estimate the contributions of exogenous and endogenous factors in a single event series, it is often the case that information about the origin is unavailable. In such a case, the action of dividing the underlying causes into exogenous and endogenous categories might be regarded as a subjective interpretation. However, the estimation of respective contributions might be useful if it correctly predicts the impact of external factors upon the resulting occurrence. For the real time series of debate on Reddit, we considered the occurrence rate of first comments as the exogenous influence, and simulated the Hawkes process to "predict" the number of follow-up comments. We confirmed that the prediction exhibited a good agreement with the number of follow-up comments that occur in practice.
Our method is general and applicable to a variety of data. We have provided an application program, with which anyone can analyze any series of event times.

Appendix A: Numerical method
We reformulate the Hawkes decoder model, defined by Eqs. (3) and (4), as a state-space model, based on which a numerical method for selecting the parameter and hyperparameter {α,γ} and a time-dependent exogenous stimulusν(t) are developed.

State-space representation
We use the exponential function φ(t) = (1/τ s ) exp(−t/τ s ) for the self-excitation kernel. From Eqs. (1) and (2), the probability density of the inter-event interval t i − t i−t is expressed as where ν + (t) = max(0, ν(t)), which ensures nonnegativity of the exogenous stimulus, and R i is computed for i = 2, . . . , n as with an initial value R 1 . We approximate the exogenous stimulus as being piecewise constant, which is valid under the condition that the timescale of the exogenous fluctuation is sufficiently large enough compared with the mean inter-event interval. Under this approximation, and letting y i = t i − t i−1 be the ith interevent interval, Eq. (A1) becomes ), (A4) which we consider the conditional density of y i given ν i .
The transition density of ν i associated with the prior distribution (4) is given by (A5) Combined with an initial density p(ν 1 ), Eqs. (A4) and (A5) define a state-space model [63,64], for which the empirical Bayes method can be implemented by the recursive Bayesian algorithm.

Recursive Bayesian algorithm
For notational simplicity, let Y i = {y 1 , . . . , y i } be the observations up to time t i . By the Bayes' theorem, the posterior distribution of ν i , given the observations up to the current time, is expressed as where p α,γ (ν i |Y i−1 ) on the right-hand-side is computed using the posterior distribution from the last iteration, p α,γ (ν i−1 |Y i−1 ), as (A7) Thus, starting with the initial distribution p α,γ (ν 1 |Y 0 ) = p(ν 1 ), the posterior distributions (A6) and (A7) are recursively computed for i = 1, . . . , n. Once we obtain these distributions, the posterior distribution of ν i , given the whole observation Y n , is computed using the following recursive equation, for i = n − 1, . . . , 1 in backward. We obtain the MAP estimate of the exogenous stimulus {ν i }, such that p α,γ (ν i |Y n ) (for i = 1, . . . , n) is maximized.
For the state-space model (A4) and (A5), we introduce a Gaussian approximation in the posterior distribution (A6) at each point in time, providing a simple algorithm that is computationally tractable [65][66][67]. Let ν i−1|i−1 and q i−1|i−1 be the (approximate) mean and variance for p α,γ (ν i−1 |Y i−1 ). Under a Gaussian approximation in p α,γ (ν i−1 |Y i−1 ), the posterior distribution (A7) is also a Gaussian whose mean and variance are, respectively, computed as To make a Gaussian approximation in Eq. (A6), let l(ν i ) = log p α (T i |ν i )p α,γ (ν i |T 1:i−1 ) be the log posterior distribution (from which we omit the normalization constant). Expanding the log posterior distribution about its maximum point up to the second-order yields l(ν i ) ≈ l(ν i|i ) +l(ν i|i )(ν i − ν i|i ) 2 /2, where ν i|i is obtained as a root of the equationl(ν i ) = 0, Thus the posterior distribution (A6) is approximated to a Gaussian with mean ν i|i and variance: The Gaussian approximations in p α,γ (ν i |Y i ) and p α,γ (ν i |Y i−1 ) result in a Gaussian for Eq. (A8) as well. Let ν i|n and q i|n be the mean and variance of p α,γ (ν i |Y n ). We then obtain the recursive equation corresponding to Eq. (A8): Eqs. (A9)-(A14) comprise the recursive estimation of {ν i }. The algorithm is summarized as follows.
1. Set initial values ν 1|0 , q 1|0 and R 1 . The resulting {ν i|n } provides the MAP estimate of the exogenous rate. Note that we may use a diffuse (noninformative) prior for the initial values (q 1|0 → ∞), which results in ν 1|1 = 1/y 1 − α/τ s R 1 and q 1|1 = 1/y 2 1 , leaving out the dependency of the initial values upon the estimation. In our analysis, we used R 1 = τ s /y, where y = m i=1 y i /m is an average of the observations over a given range (we have chosen m = 100), to remove the initial non-stationary part of the estimation.
To select the parameter and hyperparameter {α,γ}, we consider a factorization of the marginal likelihood, Since p α,γ (ν i |Y i−1 ) on the right-hand-side is approximated by a Gaussian with mean ν i|i−1 and variance q i|i−1 , the integral may be approximated by the Gauss-Hermite quadrature, where the weights {w l } and evaluation points {x l } are chosen according to a quadrature rule [68]. The parameter and hyperparameter {α,γ} are selected by maximizing Eq. (A15) numerically. It should be noted that the time constant of the self-excitation kernel,τ s , may also be selected by maximizing Eq. (A15) with respect to τ s , as well.
Appendix B: Derivation of the free energy

Representation of intensity
First, we represent the intensity (18) of the inhomogeneous Hawkes process in terms of the mean behavior and the fluctuations. We consider decomposing a series of events into the mean and fluctuation as where ξ(t) is the white noise (the "derivative" of the martingale [50]), whose ensemble characteristics satisfy ξ(t) = 0 and ξ(t)ξ(t ′ ) = λ * (t) δ(t − t ′ ). Using this decomposition, Eq. (18) can be represented as By applying the Laplace transformation, this equation is solved as where Ψ * (t) is an "effective self-exciting kernel" whose Laplace transform is given by where φ * (s) is the Laplace transform of the self-excitation kernel φ * (t). The first and second terms on the righthand-side of Eq. (B3) represent the average behavior and the third term represents the fluctuation around the averaged behavior. Representing the original exogenous input (19) as ν * (t) = µ * + σ * ζ(t), where ζ(t) is the normalized fluctuation whose autocorrelation function is given by ζ(t)ζ(t + u) = exp(−|u|/τ * e ), Eq. (B3) is expressed as where In the same manner, by decomposing the exogenous rate in the decoder into the mean and fluctuation, ν(t) = (1−α)Λ * +x(t), the decoder's intensity (1) is represented as where Here, Ψ(t) represents the effective self-exciting kernel of the decoder, whose Laplace transform is given by where φ(s) is the Laplace transform of the self-excitation kernel φ(t) of the decoder.
The path-integral for the marginal likelihood (11) is carried out by changing the variable from {ν(t)} to {x(t)}: (B13) where the Lagrangian is expressed using Eqs. (B1), (B5) and (B9) as Using this Lagrangian, the path-integral (B13) is evaluated as Eq. (15). We derive the contributions of the MAP solution and the "quantum effect" to the path-integral below.

Contribution of the MAP solution
Expanding the Lagrangian Eq. (B14) in terms of x(t), and ignoring o(σ * 2 /µ * ) and the irrelevant terms yields for which a solution of the Euler-Lagrange equation (13) is obtained aŝ For the Lagrangian (B15), the action integral (12) is given as Here, the first term on the right-hand-side is a boundary effect, which is negligible in comparison to the bulk contribution given the long event series T ≫ 1. Substituting Eq. (B16) into this formula and assuming ergodicity, we obtain the contribution of the MAP solution as

Contribution of the quantum effect
The quantum effect (16) is given by the ratio of functional determinants of the second order differential operators associated with the Lagrangian (B15) [69,70], The Gelfand-Yaglom method allows us to calculate the functional determinants by an initial-value problem for the corresponding differential operator [71], Then, the Gelfand-Yaglom reads (B20) By solving the differential equations, the quantum contribution is obtained as

The Hawkes decoder with an incorrect self-excitation kernel
We analyze the free energy for the Hawkes decoder with an incorrect self-excitation kernel (τ s = τ * s ) in the limit of τ s /τ * e ≪ 1 while τ * s /τ * e remains finite (which includes the cases of τ * e ≫ τ * s and τ s ≪ τ * s ). The free energy (B29) is asymptotically expanded with respect to τ s /τ * e as where F p (α, γ) represents a collection of constant terms that satisfy F p (α = 0, γ) = F p (γ). From Eq. (C7) we obtainα = O(τ s /τ * e ), i.e., the estimated reproduction ratio becomes infinitesimal. In this limit, the free energy is identical to that of the Bayesian rate decoder, F p (γ).