Poisson Kalman filter for disease surveillance

An optimal filter for Poisson observations is developed as a variant of the traditional Kalman filter. Poisson distributions are characteristic of infectious diseases, which model the number of patients recorded as presenting each day to a health care system. We develop both a linear and a nonlinear (extended) filter. The methods are applied to a case study of neonatal sepsis and postinfectious hydrocephalus in Africa, using parameters estimated from publicly available data. Our approach is applicable to a broad range of disease dynamics, including both noncommunicable and the inherent nonlinearities of communicable infectious diseases and epidemics such as from COVID-19.


I. INTRODUCTION
There has been significant recent interest in the model-based control of disease, specifically using prevention and treatment as methods of control [1][2][3][4][5].Such model-based frameworks have been instrumental in our understanding of the dynamics and control of infectious diseases [6] and strategies for global public health policies [7].Successful applications of mathematical modeling and control depend on accurate determination of system states, where data assimilation methods such as Kalman filters [8] play a crucial role in constraining the model with available data.
Although Kalman filters typically assume Gaussian distributed observations that have direct functional relationships to the state variables, this is unlikely to be suitable for diseases with low rates of occurrence, as is the case in the early or late stages of the spread of most infectious diseases.While a Poisson distribution with a large rate constant can be well approximated by a Gaussian of the same mean and variance, the approximation breaks down when the rates of occurrences are much smaller [9].Even more importantly, the variance of a Poisson observation changes along with its mean, whereas the mean and variance of a Gaussian are decoupled, and often the variance is assumed to be constant (or at least unrelated to the mean) in the Kalman filtering context.
In this article we argue that the standard application of Kalman filtering methods is poorly matched to data available during disease surveillance.In particular, the assumption of Gaussian noise-perturbed observations is a clear source of inaccuracy when used to model the arrival of patients at medical facilities.We develop a variant of the Kalman filter that assumes Poisson observations and show how to modify the traditional Kalman equations to produce an optimal filter.
For one-dimensional systems, an optimal filter has been previously designed for Poisson observations [10], but has not been generalized to multivariate systems.Moreover, in order to summarize the true distribution of the state x k at time step k given the Poisson observations in Ref. [10], a very large number of variables needed to be stored and recalculated at each step.In fact, the number of variables needed also grows very quickly with k (compared to the Kalman filter where the number of variables tracked is constant in k).Another alternative would be to use a fully Bayesian approach such as a particle filter designed with the Poisson likelihood function.Such an approach would be guaranteed to estimate the true posterior given a sufficiently large ensemble, but such large ensembles often result in high computational complexity.Instead, we propose a filter which is very similar to the Kalman filter, but is adapted to the unique statistics of the Poisson observations.The proposed approach maintains the simplicity and computational efficiency of a Kalman filter by only tracking the mean and covariance of the estimated state.A related linear filter called the generalized Kalman filter was introduced in Ref. [11], which employed a fixed observation noise covariance matrix that is optimal among all linear filters that are fixed in time.In contrast, we will derive the optimal time-varying linear filter and we will use the state estimate to update the observation noise covariance matrix dynamically.In fact, we will show that the optimal linear filter for Poisson observations is almost identical to the standard Kalman filter except that the observation noise covariance matrix depends on the state estimate.
In Sec.II we first show that by choosing an appropriate observation map, the standard Kalman filter gives an unbiased estimator for Poisson observations.This justifies using a Kalman filter in the disease modeling context, as long as the observation map is well chosen.We then show how to modify the Kalman equations to produce an optimal linear filter in the sense of minimizing the expected squared errors.We prove the optimality of this choice in the Appendix.While the optimal filter nominally requires knowledge of the true state, we show empirically that using the filter estimate of the state gives near-optimal performance.We call this approach the Poisson Kalman filter (PKF).
Recently, Li et al. [12] assimilated Poisson observations to carry out modeling of the coronavirus (COVID-19) epidemic.Their modifications to the traditional Kalman filter are in the same spirit as those proposed here, in that the observation noise covariance matrix V is designed to vary with the data.In this article, we derive the Kalman equations that lead to the optimal linear filter and prove that the optimal choice for linear dynamics is to set V k to vary proportionally to the number of predicted cases.Nonlinear extensions of the Kalman filter follow standard strategies of generalizing the linear formulas (e.g., the extended and ensemble Kalman filters [13]).We develop a nonlinear extended PKF (EPKF) in Sec.V suitable for contagious infectious disease.
We should note that an extended Kalman filter was previously developed in Ref. [14] for point processes where the observation increments are conditionally Poisson given a stochastic hidden variable.Related approaches were applied to crime statistics in Ref.
[15] and neuronal signals in Ref. [16].In Refs.[15,16] they assumed that the Poisson rates are functions of a hidden state variable x k that evolves according to a Markov model x k + 1 = x k + noise.In our approach we allow a larger class of stochastic models x k + 1 = f x k + noise with nontrivial dynamics.Moreover, the filters in Refs.[15,16] were derived as a Gaussian approximation to a Bayesian posterior, which leads to a nonlinear filter in Ref. [15].Instead, we take a different approach by deriving an optimal linear filter for Poisson observations.Our case studies start from compartmental models which are built on the standard susceptible-infected-recovered (SIR) model and its variants [6].The SIR model tracks three variables which represent three populations, susceptible S, infected I, and recovered R.
A key feature for communicable disease is that the rate of increase of the infected is proportional to the product of the susceptible and infected populations SI, a nonlinear interaction term that is motivated by the contagious nature of the diseases being modeled.
In Sec.III below we introduce an SIR model for noncommunicable diseases and in Secs.III and IV show how to apply the Poisson Kalman filter to track the model from example data from two endemic diseases affecting childhood health in Africa: neonatal sepsis (NS) and postinfectious hydrocephalus (PIH).Although many of these infections are noncommunicable, acquired during birth or from the environment afterward, there is new evidence supporting a role for communicable viruses [17].In this work we show how the use of the PKF and EPKF fills the need for a computational framework that embodies the interdependent dynamics of NS and PIH.In Sec.VI we discuss future directions both for a more detailed study of NS and PIH and for further extensions of the filtering for infectious disease epidemics.

II. DATA ASSIMILATION FROM POISSON OBSERVATIONS
Estimating the current state of a dynamical system is a critical challenge when applying compartmental modeling to disease forecasting and control.Data assimilation is a method of estimating the state from a time series of noisy observations.In particular, for a linear system x k + 1 = f x k , the Kalman filter [18] gives the optimal state estimate (minimal variance) and also quantifies the uncertainty in the estimate.However, the Kalman filter was designed for engineering applications where the observations are assumed to have a direct functional relationship to the state variables, except perturbed by Gaussian noise.
There are at least three reasons why this assumption fails for typical disease surveillance.First, counts of individuals with a disease are by definition non-negative, contradicting the Gaussian model for uncertainty.Second, the size of the Gaussian noise is decoupled from the population count, being the same magnitude for low populations as for large populations.Finally, in order for the population to be the observed variable, one would have to make a survey, at each time step k, of the entire population to directly observe I k , the number of infected at time k.Since this is an unrealistic proposal, the filtering method needs to be adapted to the type of observations that are practical for disease surveillance.We will refer to this modification of the Kalman filter by the name Poisson Kalman filter, which we show to be unbiased and optimal among all linear filters.
We operate under that assumption that the disease population cannot be measured directly.In fact, a reasonable model for observations of disease cases, for example, those presenting at a hospital, is a Poisson process, whose rate is proportional to the infected population.Assume that at time step k, the number of new infected patients I k will be approximated by a Poisson random variable with rate λ k, I = c I I k , where c I is a proportionality constant.
In a typical filtering problem we would assume that we are given direct observations y k of the form B x k + v k , where v k are random variables representing observation noise.
However, in the Poisson observation context, we instead observe a pair of independent Poisson random variables with rates given by the components of B x k .We will denote this type of observation by meaning that y k i is Poisson with rate B x k i , where i corresponds to the i-th component of the vector.To be more precise we assume that, conditional on B x k , the components y k i are independent Poisson random variables with the density function The above conditional density makes it clear that y k and x k are not independent.
In the case of direct observations, one typically assumes that y k splits into a sum of two terms, the first of which has a deterministic dependence on x k and the second of which is independent of x k .However, for Poisson observations this splitting is not possible.Despite this irreconcilable dependence between y and x , the following lemma shows that if we appropriately center y , namely, y − E[ y | x ], the result is not correlated with x .

Lemma 1.
Let λ be an arbitrary random variable and let z be a Poisson random variable with rate λ so that the conditional density of where the second equality follows from the inner expectation being conditioned on λ and the third follows from the linearity of the expectation.
Lemma 1 turns out to be the key to deriving an optimal linear filter for Poisson observations.While Poisson observations are a more realistic model for the type of data available in disease modeling, we now must design a filter which can assimilate these data and produce estimates of the state variable x k .

A. Poisson Kalman filter
A linear filter produces an estimate x k of the true state x k of the form where A 1 and A 2 are matrices.This is a more restricted class of filters, but we will be able to show that our filter is unbiased, meaning E x k = x k , and is the optimal linear filter in the sense of giving the minimal squared error.
The PKF assumes a model of the form where b k is a known deterministic forcing term and ω k is dynamical noise with mean zero E ω k = 0 and known covariance matrix E ω k ω k ⊤ = W .We also assume that the ω k are independent of x k , y k , and all other ω ℓ for ℓ ≠ k.The PKF also assumes that model F and observation matrices B are known.We note that the dynamics F and observation matrix B can also be allowed to change at each step (nonautonomous), but to simplify the notation we assume they are constant.
Like the standard Kalman filter, the PKF is a two-step filter, meaning that it breaks down the estimation of x k + from x k − 1 + into a forecast step and an assimilation step.In the forecast step we apply the model to our current estimate x k − 1 + to produce the forecast and in the assimilation step we assimilate the new observation by It is easy to see that this is a linear filter with A 1 = I − K k B F and A 2 = K k .The filter is defined by the choice of the matrix K k , which is called the gain matrix.Our first result is that any filter of the form (4) is unbiased.
-If we assume that E x 0 + = x 0 , then for any choice of gain matrices K k the two-step filter defined by ( 3) and ( 4) is unbiased, meaning The proof of Theorem 1 is straightforward and can be found in the Appendix 1.The gain matrix is determined by a secondary set of computations which track the covariance matrix P k + for the estimate x k + .The covariance matrix is also evolved according to a two-step evolution starting with a forecast step which allows us to calculate the optimal gain matrix and then we can complete the assimilation step While it may seem that P k is only really necessary in order to compute the gain matrix K k , the matrix P k also gives an error estimate for the state estimate.
The final component that is required is the V k matrix in the formula for the optimal gain.In the standard Kalman filter, V k is the covariance matrix for the observation noise.However, in the PKF the variance of the observations is equal to B x k (meaning var y k i = B x k i .So intuitively we would expect to use V k = diag B x k .The next theorem states that this yields the optimal linear filter. Theorem 2.-Among all linear filters, the filter given by ( 3) and ( 4) with gain matrix K k given by ( 5), where V k = diag Bx k , is optimal in the sense of a minimal sum of squared errors.Alternatively, where The proof of Theorem 2 is closely related to Lemma 1 and can be found in the Appendix 2. Unfortunately, the optimal filter is not accessible since it requires access to the true state x k in order to define the optimal gain matrix.Instead, since x k is an unbiased estimator (for any gain matrix) we approximate the optimal filter by using V k = diag Bx k − .We call this approximation the Poisson Kalman filter.

B. PKF equations
The discrete-time Poisson Kalman filter algorithm is given below.In order to connect with the potential optimal control applications we include the control term If there is no control this term can be dropped.We also allow all the matrices to vary with time.
(1) In the dynamical system where δ k − j is the Kronecker delta function such that When the state is close to zero the Gaussian noise may move the system into negative values, so at each step we take the maximum of each component and zero.In all the comparisons below, we also apply this maximum to the Kalman filter and extended Kalman filter simulations.Note that diag B k x k is the true variance of the Poisson observation y k .
However, in the filter below we set V k = diag B k x k − since this is the best available estimate.
We now summarize the steps required to obtain the PKF estimates.
(2) At initialization (3) For the prior estimation (forecast step) (4) For the posterior estimation (assimilation step) Notice that before the diagonal matrix V k is formed, we first take the maximum of the diagonal entries and a constant γ.This is necessary because when the diagonal entries of V k are too close to zero the filter can become numerically unstable.The constant γ should be chosen to be small relative to the average value of the B k x k , and in all our numerical experiments we set γ = 0.1.Finally, we note that in practice the initial estimates x 0 + and P 0 + are often not available.However, the effect of these initial estimates on the accuracy of the state estimates decays to zero exponentially as k ∞, and often P 0 + is simply chosen to be a multiple of the identity matrix.

III. AN SIR MODEL FOR NONCONTAGIOUS DISEASE IN A RESTRICTED POPULATION
Severe systemic bacterial infection in the neonatal period, neonatal sepsis, accounts for an estimated 680 000-750 000 neonatal deaths per year worldwide [19], more than childhood deaths from malaria and HIV combined [20].The most common brain disorder in childhood is hydrocephalus, and the largest single cause of hydrocephalus in the world is as a sequela of NS [21], accounting for an estimated 160 000 yearly cases of postinfectious hydrocephalus in infancy [22].The microbial agents responsible for this enormous loss of human life have been poorly characterized [23], although next-generation molecular methods show promise to improve the identification of causal agents [17].Both NS and PIH occur disproportionately in the developing world, and most of the PIH cases will die in childhood without adequate treatment, substantially compounding the effective mortality due to NS and its tremendous burdens on societies [24,25].
We expect a natural application of the PKF will be to SIR modeling.Consider a discretetime SIR model for neonatal sepsis with three classes: S k is the susceptible population at time k, I k the infected population, and R k the recovered population.(Later, in Sec.IV, the model will be expanded to include a postinfectious hydrocephalic class.)Since there are many unmodeled factors which affect the adult population and the feedback of neonatal infection into the birth rate takes place on a relatively long timescale, we do not include the adult population in the model.Thus, S k , I k , and R k represent neonatal and infant populations.Since we are modeling neonatal infections, the susceptible and infected classes are neonatal and S k + I k represents the neonatal population.The recovered class R k will track those that recover from sepsis for a period of time that can be chosen by the modeler as will be described below.The model is summarized in the diagram in Fig. 1.
Modeling only the neonatal/infant populations requires several deviations from the standard SIR model.First, the birth rate is not proportional to any of the model populations and is instead a forcing b k which introduces new population into the susceptible class at each time step.Moreover, there are now three ways to leave the susceptible class: (i) a neonatal mortality rate d, due to factors other than infection (this will affect the two neonatal classes S k and I k ), (ii) an infection rate a, which feeds into the infected class, and (iii) a "grow-up" rate g S , which signifies no longer being susceptible to neonatal infection.The model is Notice that the g S rate removes neonates from the model entirely, so effectively the grow-up rate g S will control the length of time that we consider to be "neonatal."Given a time period T S for susceptibility, we set g S = 1/T S , which makes the simplifying assumption that the susceptible population is always equally distributed across different ages.The grow-up rate g R controls the length of time that infants in the recovered class are tracked, so g R = 1/T R , where T R is the amount of time we track the recovered class.The two parameters g S and g R control the two timescales for susceptibility and recovery (which will become more significant later when we consider the longer timescale possibility of developing hydrocephalus) and c is the rate of recovery from infection.
With the state variable , the matrix form of the evolution is where If the birth rate is assumed to be constant b k ≡ b, the steady-state populations can be explicitly solved.Setting S ∞ ≡ S k + 1 = S k in the susceptible population in Eq. ( 14), we can solve for S ∞ = b d + a + g S .Substituting this for S ∞ = S k in Eq. ( 15) and setting I ∞ ≡ I k + 1 = I k in Eq. ( 15), we can solve for I ∞ and similarly we can solve for R ∞ giving steady-state solutions These steady-state solutions have important public health implications on the timescale where the birth rate is approximately constant.First, S ∞ determines the scale of public health improvement if susceptibility can be reduced (prevention).Second, I ∞ determines the resources needed to meet the average infection burden.

Case study: Neonatal sepsis in Uganda
Publicly available statistics can be used to approximate parameters for NS in Uganda during the time frame 2014-2015.We consider a discrete time step (the time between steps k and k + 1) of one day and a neonatal period of T S = 28 days.From [26] we find a 2015 birth rate of 1 665 000 per year for Uganda, which for a daily model yields b ≈ 4562.
Using 2014 statistics for neonatal sepsis in sub-Saharan Africa, we find a neonatal mortality rate of 29 per 1000 with 17%-29% attributable to sepsis [25].For simplicity, we assume that the neonatal mortality rate of 29 per 1000 can be divided into 7 attributable to sepsis (approximately equal to 23% of neonatal mortality, the midpoint of the 17%-29% range) and 22 attributable to other causes.
Since we assume the neonatal period is applies only to the infected class (whereas d applies to both the susceptible and infected classes and thus is a rate for the entire neonatal population).That is, d I represents the daily rate of mortality due to sepsis as a percentage of the population that has sepsis (rather than 7/1000/T S , which is the daily rate as a percentage of the entire population).So before we can determine d I , we first must determine the infection rate a. Infection rate estimates can vary widely based on methodology ( [25] quotes a range of 5.5-170 per 1000 live births).
Based on the estimate of one of the authors (S.J.S.), who is a physician conducting medical research on these infants in Uganda, there is a range of 30-60 per 1000 live births in that nation.Conservatively assuming 30 per 1000, we take a = 30/1000/T S as a daily rate of infection.Now the constant d I can be determined.We stated above that 7 of the 1000 will die from sepsis, meaning that 7 of the 30 who get sepsis will die from it.Thus, we find that d I = 7/30/T S is the daily rate of death due to sepsis among those that already have sepsis.
This immediately gives us the recovery rate: 7 of the 30 who get sepsis will die from sepsis and 30(22/1000) will die from nonsepsis causes.so in fact c is chosen to ensure that all of the infected classes leave within the neonatal day period.
The infant mortality rate m 2 , which covers mortality of the first year after birth, infancy, or T i , can also be derived from data.Consider a tracking time for the recovered population of this first year minus the neonatal period T R = T i − T S (we assume that the recovered population is entirely outside the 28 day neonatal period).For the death rate in the recovered class we start with the infant mortality rate of 77 per 1000 (in the first year [25]) and subtract the 29 per 1000 neonatal mortality rate to find d R = 48/1000/T R .
We summarize the inputs to the model in Fig. 2 and then compute the parameters d, d I , d R , c, T R , g S , and g R by where s is the fraction of neonatal mortality due to sepsis.

IV. THE SUSCEPTIBLE-INFECTED-RECOVERED-HYDROCEPHALIC MODEL: MODELING THE HYDROCEPHALIC POPULATION
We now turn to a model that specifically links neonatal infection and PIH.The essential idea is that those that have recovered from sepsis are now susceptible to developing hydrocephalus.The constant ℎ represents the rate at which recovered infants move from the recovered class R k to a new hydrocephalic class H k , leading to the equations The susceptible-infected-recovered-hydrocephalic (SIRH) system is summarized in the diagram in Fig. 3.
The hydrocephalic class is subject to an additional mortality rate due to hydrocephalus d H , which requires recalibrating the recovered rate d R so that it does not include deaths due to hydrocephalus.We set d R = m 2 − m 1 − pd H T R /T R , where m 2 is the infant mortality rate, m 1 is the neonatal mortality rate, p is the rate of PIH in the total population under consideration (discussed in Sec.IVA below), and d H T R is the rate of death of those who develop PIH during infancy (d H is the daily rate and T R is the remainder of the infancy period).Finally, we note that the steady-state value of the recovered class changes from the SIR model due to the rate ℎ, and the new steady state along with the hydrocephalic steady state are given by We now return to our case study of modeling PIH in Uganda.

A. Case study: Infant hydrocephalus in Uganda
The first parameter to consider is ℎ, the rate of developing postinfectious hydrocephalus.In Ref. [25] it was reported that the incidence of PIH is 3-5 per 1000 live births.We will take the low estimate of 3 per 1000 setting p = 3/1000, since it will be shown to be more consistent with other statistics below.Recall that above we estimated that for 1000 live births there are 30 cases of sepsis and 22.34 of those recover.Since only recovered sepsis cases can develop PIH, this implies a rate of developing hydrocephalus of ℎ = 3/22.34/TR .
The death rate due to hydrocephalus is highly dependent upon treatment.The untreated death rate is estimated at 50%, while treatment can reduce this to 25%.We assume an overall death rate of 33% [27] and we set Finally, we recalibrate the death rate for those recovering from sepsis by removing the deaths due to hydrocephalus (since those are accounted for in the H k variable).So we set The results shown in Fig. 4 predict a steady state of approximately 10 000 ongoing cases of PIH with an annual PIH incidence of approximately 4000 per year 365 × ℎ × R ∞ and annual deaths due to PIH of approximately 3300, consistent with existing estimates [24].

B. PKF simulations
Using the SIRH model described in Sec.IV, we evaluate the performance of the PKF when the observations follow a random Poisson distribution with known rates λ k, I = c I I k and λ k, H = c H H k , which represent the number of infants with sepsis and the number of infants with hydrocephalus that show up at the hospital.Our observations can be written as where and we start the system at the equilibrium values.
The simulation in Fig. 5 was run with system noise W = diag(144, 1, 1, 10) × 10 7 and the constant daily birth rate b = 4562, while setting the sepsis and hydrocephalus proportionality constants as c I = 0.2/T S and c H = 0.6/T R , respectively.The idea behind these values is that if 20% of total sepsis cases seek care over the entire T S period of sepsis susceptibility, then the daily rate of arrivals would be 0.2/T S multiplied by the number of true sepsis cases (20% was chosen purely for purposes of simulation).In Fig. 5 we see that the PKF (red dashed curves) gave good estimates of the observed variables, namely, the infected and hydrocephalic populations.The PKF also obtains information about the unobserved variables, namely, the susceptible and recovered populations at least on a slow timescale; however, the fast timescale information about the unobserved variables seems limited.
Figure 5 also compares the PKF to a Kalman filter (blue dotted curves) which was given the optimal fixed observation noise covariance matrix V const = diag(Bx), where x is the time average of the state variables.The disadvantage of the fixed gain is that when the number of infected or hydrocephalic is large the variance of the observations will be larger than the average value.This means that the Kalman filter will underestimate the observation variance and use an oversized gain.This is shown in Fig. 5, where the Kalman filter estimates closely follow the observations when the number of infected or hydrocephalic is large.The PKF dynamically adjusts the observation covariance matrix based on the state estimate in order to prevent this.This is further shown in Fig. 6, which compares the root mean square error (RMSE) for the PKF and the Kalman filter for various levels of system noise.Figure 6 also compares the PKF, which uses the filter estimate to determine V k , to an oracle PKF, which uses the true state for V k , and we see that their performances are almost identical even at high noise levels.In this case (with a linear model), the Kalman filter and PKF have similar performances for the unobserved variables, which seems to indicate that they are relying more on the stability of the model rather than correlations with the observed variables.Of course this is reliant on using the optimal V const matrix in the KF.Moreover, in the context of disease surveillance filtering the observed variables is critical to account for over/under reporting in producing a clean data set, and for these variables the PKF has a significant advantage.
Finally, we note that the PKF has the largest advantage at high noise levels.This is because the SIRH system is a stable linear system, so noise is the only unstable component of the dynamics.In the absence of noise, no filter would be necessary since all trajectories would converge to the equilibrium regardless of observations.This suggests that a generalized PKF (such as the extended PKF considered below) would have an advantage for nonlinear dynamics with unstable directions even in the absence of system noise.

V. EXTENDED POISSON KALMAN FILTER FOR CONTAGIOUS DISEASE
So far we have considered a linear model for NS, which is sufficient for noncontagious infections.However, contagious disease models typically contain a nonlinearity that models the contagious spread.In order to broaden the applicability of the PKF we now show that it also offers improvements for these nonlinear models by using a standard approach to extend the Kalman equations to nonlinear dynamics.Moreover, because there are potential mechanisms for contagious infections contributing to NS [17], modeling these infections requires a nonlinear system.As in the classical SIR model, we assume that the contagious spread will be simultaneously proportional to the both the number of susceptible cases and the number of infected and so we model the number of contagious cases at time k as βS k I k , where β is infectivity.Introducing this term to the SIRH model, we have The state of the nonlinear model is x k + 1 = f k x k , where x k = S k , I k , R k , H k .When the birth rate is constant b k = b we can write f k = f and the system can be considered autonomous, but we also allow nonautonomous dynamics as long as each f k is known.This model is of significant interest since estimating the a and β parameters from data would help determine the role of contagious spread in NS.
A standard method for lifting the Kalman filter to the nonlinear setting is the extended Kalman filter (EKF) [28].The EKF uses the nonlinear dynamics to produce the forecast + and a linear approximation to the dynamics is used for forecasting the To define F k the EKF linearizes the dynamics around the current state estimate, setting F k = Df k x k + .This approximates the nonlinear dynamics as a nonautonomous linear system for the purposes of forecasting the covariance estimates.In the example below we apply the EKF using Since the PKF is also based on the Kalman equations, we can use this same idea to extend the PKF to nonlinear systems, which we call the extended PKF.
In Fig. 7 we simulate the system (24) with β = 10 −6 initialized at the noncontagious equilibrium found in Sec.IV (all other parameters are the same as in Sec.IV).This simulates the introduction of a contagious source of disease to a system that had stabilized at the noncontagious equilibrium.To demonstrate the advantage of the EPKF over the non-Poisson version, we assume that the EKF is given the optimal fixed gain for the noncontagious equilibrium.Figure 7 shows that as the number of infected and hydrocephalic cases increase the EKF estimate becomes very noisy, since it is underestimating the observation variance and as a result follows the observations too closely.This shows how the EPKF is able to automatically adapt to the new equilibrium.Figures 7(a)-7(d) show the simulation with the standard observation rate c I = 0.2/T S and c H = 0.6/T R .To demonstrate the ability of the EPKF to assimilate at very low observation rates, we repeated the experiment after reducing the observation rates by a factor of 1000, and these results are shown in Figs.
7(e)-7(h).In this case, due to the low observation rates, of the 1000 days shown in Fig. 7, 624 days had zero infected reported and 908 days had zero hydrocephalic reported.Despite these large numbers of zeros, the EPKF effectively assimilates the available information.
Another critical aspect of filter performance is the accuracy of the uncertainty quantification provided by the covariance matrix.In Figs.8(c) and 8(d) we compare the true signal (black solid curve) to the two-standard-deviation regions around the EPKF and EKF estimates.These regions are generated by adding and subtracting twice the square root of the diagonal entry of the covariance matrix estimate at each time step.An accurate uncertainty quantification would imply that the truth only leaves the region for around 10 of the 200 time steps shown.Notice that the EKF significantly underestimates the variance, meaning that it is overconfident in its estimator, because the true signal leaves the region much more than expected.The EPKF gives a more accurate uncertainty quantification, perhaps slightly overestimating the uncertainty in the low-observation-rate case since the true signal never leaves the region.
We also compare the EPKF to the standard EKF using the optimal fixed gain for the contagious equilibrium, starting from the contagious equilibrium.Figure 8 shows that the EPKF has the largest advantage at high system noise levels (as in Fig. 6) due to the absence of unstable directions in the deterministic dynamics near equilibrium.This suggests that the EPKF would have an even more significant advantage for chaotic systems.Also, as in Fig. 6, we compare the empirical EPKF, which uses the state estimate to determine the observation variance, to an oracle version of the EPKF that uses the true state to determine the observation variance, and again the performance is very similar.
Finally, we note that two closely related alternative approaches to applying the Kalman filter to nonlinear dynamics are the ensemble Kalman filter (EnKF) [28] and ensemble adjustment Kalman filter (EAKF) [29].Both of these methods use an ensemble forecast instead of linearizing the dynamics to estimate P k + 1 − .Since the EnKF and EAKF are also based on the Kalman formulas, the PKF method can be applied just as easily to these methods.A potentially significant issue for most ensemble methods is that ensemble members are typically not guaranteed to be positive, leading to negative populations in the ensemble members.One possible method to address this would be to log-transform the model; however, this has the downside of introducing stronger nonlinearities into the model.A related idea would be to replace the observations with the variance-stabilizing transformation y k + 1/4, which is approximately Gaussian with mean Bx k and constant variance 1/4.Both these approaches simplify the statistics at the cost of introducing additional nonlinearity into the model or observation function, and exploring these tradeoffs is an interesting subject for future investigations.
This approach is closely related to the recent work of [12] applied to COVID-19 with Poisson observations.There an EAKF was used with a heuristically chosen observation covariance V k which was proportional to the square of the observations.In fact, Li et al.
[12] also suggested an alternative of using V k proportional to the observations.Our analysis of the PKF shows that in fact the optimal choice for linear dynamics is to set V k equal to the predicted observations, as we propose in the EPKF.In fact, our analysis of a nonlinear contagious model in this section suggests that Li et al. [12] were very close to the optimal approach.

VI. CONCLUSION
The mathematical methods of filtering and control originated with linear models, direct observations, and Gaussian noise.However, these assumptions may not be appropriate in the context of disease modeling, where the observations of cases of communicable and noncommunicable diseases often present as Poisson processes.Unfortunately, the customary Kalman filter is not well suited to assimilate such Poisson occurrences and estimate the true number of underlying cases.The Poisson Kalman filter is an optimal filter for such surveillance.
The linear PKF is a very general filter suitable for a broad range of noncommunicable disease observations where the nonlinear interaction of susceptible and diseased individuals is not an inherent component of disease initiation (including noninfectious disease such as diabetes or stroke).We extended our findings to encompass the nonlinear interactions of susceptible and infected individuals typical of contagious disease through an extended PKF or EPKF.
We also created an SIRH compartmental model that can be used in the surveillance of neonatal sepsis and postinfectious hydrocephalus, endemic diseases that cause tremendous numbers of yearly global deaths in the developing world.In particular, we incorporated both the noncommunicable and communicable dynamics that have been observed in these infant infections.
Additionally, our case study of sepsis and hydrocephalus suggests many promising directions for future development.If a more careful tracking of cases is desired, the neonatal and infancy periods can be segmented into multiple stages.For example, it is well known that the infections that are acquired perinatally from the mother, so-called early-onset sepsis, are manifest within the days of the first week of life.Infections during the subsequent weeks of the neonatal period (first four weeks) are environmentally acquired and are typically a very different spectrum of organisms.Therefore, S (0i) could represent susceptible at 0 to i days after birth and S (ij) could represent susceptible cases from i to j days after birth, with varying rates and risks from sepsis at different stages of development.Another critical factor in sepsis and hydrocephalus cases is environmental variables such as rainfall [30], which suggests that a full spatiotemporal model will be necessary to more fully represent these dynamics.Recent findings [17] demonstrate that more than one infection (co-infection) can be found in some of these infants (perhaps even a mixture of noncommunicable bacteria and communicable viruses), demonstrating that a mixed linear and nonlinear model would be required to represent such co-infections.A spatiotemporal model would allow the optimal control to consider multiple methods of control and determine ideal locations and times to apply each.
The recent coronavirus epidemic is one where Poisson dynamics is required in the modeling and data assimilation [12].In this article we derived the Kalman equations that lead to the optimal linear filter and proposed that the optimal choice is to set the observed covariance equal to the predicted observations as proposed in the nonlinear EPKF.
where the rate λ k = Bx k ⩾ 0. We note that the positive real rate λ k equals the expected value E z k and variance Var z k such that In our case, we assume that the state x is non-negative.We also assume that each Poisson random variable z k i has a rate that is only dependent on the corresponding state x k i such that E z k i = c i x k i where c i ∈ ℝ + are known deterministic parameters.A linear recursive estimator can be written as where K k is the optimal gain matrix to be determined.

Theorem 3.
The estimator of (A4) is an unbiased estimator of x k , that is, E x k − x k = 0.
Proof.-Calculating the estimation error mean, we write (A5) So since E z k = Bx k = Bx k − 1 and inductively we assume E ϵ x, k − 1 = 0, we have E ϵ x, k = 0. Therefore, (A4) is an unbiased estimator.
Note that the unbiased estimator property holds regardless of the value of the gain matrix K k .This implies that, on average, the state estimate x k will be equal to the true state x k , when measurements, which follow a Poisson distribution whose rate is dependent on the state, are taken.Moreover, we note that Theorem 3 holds whenever E z k = Bx k , so as long as the expected value of the observations is linear in the state, one could choose an appropriate B to have an unbiased estimator.We now turn to the construction of the optimal linear filter.
Among all linear filters, the filter given by the linear estimator of (A4) with gain matrix K k given by is optimal in the sense of minimal sum of squared errors when V k = diag Bx k .Alternatively, where Proof.-Using (A5), we solve for the estimation error covariance P k as (A6) Since the estimation error at time k − 1 given by ϵ since the expected value E ϵ x, k − 1 and E z k − Bx k − 1 are both zero.More generally, when x k − 1 = x k is a random variable, ϵ x, k − 1 may not be independent of the measurement z k ; however, the above expectation is still zero since where the second equality follows from the law of total expectation and Therefore, (A6) reduces to Using the fact that Bx = E z k , we rewrite (A7) as (A8) Recall that for a random variable Y with mean E[Y ], the i th central moment of Y , which is written as equals its variance when i = 2 (see Chap. 2 of [28]).Therefore, where V k ∈ ℝ m × m , which is written as V k = diag Bx k , is the covariance of z k .Substituting (A9) into (A8) gives Diagram of the SIR model for neonatal sepsis.Comparison of the RMSE for the (a) infected and (b) hydrocephalic populations of the PKF (red solid curve, optimal variable gain) and the Kalman filter (blue solid curve, optimal fixed gain) as function of the system noise.We also compare to an oracle PKF (black dashed curve) which is given the optimal choice of V k = diag B x k .System noise is quantified as a multiple of the base noise level W .The RMSE is averaged over 10 6 filter steps.(a) and (b) Comparison of the RMSE of the EPKF (red solid curve, optimal variable gain) and the EKF (blue solid curve, optimal fixed gain) as a function of the system noise.We also compare to an oracle EPKF (black dashed curve) which is given the optimal choice of V k = diag B x k .System noise is quantified as a multiple of the base noise level W .The RMSE is averaged over 10 6 filter steps.The two-standard-deviation regions around the EPKF and EKF estimates are shown using the variance estimated by the respective filters for (c) the standard observation rate and (d) the low rate from Fig. 7.

FIG. 2 .FIG. 3 .
FIG. 2.(a) Summary of the inputs to the model for infant sepsis, with values used in parentheses; the remaining parameters are computed using Eqs.(20).(b) Simulation of the model for infant sepsis in Uganda assuming constant birth rate and starting from the zero initial condition S 0 , I 0 , R 0 = (0, 0, 0).

FIG. 5 .
FIG. 5. Comparison of the PKF (optimal variable gain) and the Kalman filter (optimal fixed gain) for the SIRH model with Poisson observations of the infected and hydrocephalic populations.The true (a) susceptible S, (b) recovered R, (c) infected I, and (d) hydrocephalic H values (black) are compared to the PKF (red dashed curve) and Kalman filter (blue dotted curve) estimates.(c) and (d) also show the observations (green circles) rescaled by dividing by the constants c I and c H , respectively.(e) and (f) Expanded versions of (d), enlarged to show detail.When the number of cases is large, the KF estimate of H is

FIG. 7 .
FIG. 7.Comparison of the extended PKF (optimal variable gain) and the extended Kalman filter (optimal fixed gain for the noncontagious equilibrium) for the contagious SIRH model with Poisson observations of the infected and hydrocephalic populations.We compare the true S, I, R, and H values (black solid curve) to the PKF (red dashed curve) and Kalman filter (blue dotted curve) estimates.We show (a)-(d) a standard observation rate c I = 0.2/T S and c H = 0.6/T R and (e)-(h) a low observation rate c I = 0.0002/T S and c H = 0.0006/T R .The infected and hydrocephalic plots also show the observations (green circles) rescaled by dividing
T S days, we convert the neonatal mortality rate due to factors other than sepsis into a daily rate by setting d = 22/1000/T S .The daily neonatal mortality rate due to sepsis is then 7/1000/T S ; however, this is not d I because the d I variable The remaining 30 -7 -30(22/1000) = 22.34 will recover, establishing the recovery rate c = 22.34/30/T S .Note that The steady-state values for the model with these parameters are S ∞ = 121 422, I ∞ = 3643, and R ∞ = 31 152.We note that the steady-state number of infected shows consistency with reported values[26].The recovered class is now susceptible to developing PIH.