Physics-Informed Machine Learning for Power Grid Frequency Modeling

The operation of power systems is aﬀected by diverse technical, economic, and social factors. Social behavior determines load patterns, electricity markets regulate the generation, and weather-dependent renewables introduce power ﬂuctuations. Thus, power system dynamics must be regarded as a nonau-tonomous system whose parameters vary strongly with time. However, the external driving factors are usually only available on coarse scales and the actual dependencies of the dynamic system parameters are generally unknown. Here, we propose a physics-informed machine learning model that bridges the gap between large-scale drivers and short-term dynamics of the power system. Integrating stochastic diﬀeren-tial equations and artiﬁcial neural networks, we construct a probabilistic model of the power grid frequency dynamics in continental Europe. Its probabilistic prediction outperforms the daily average proﬁle, which is an important benchmark, on a time horizon of 15 min. Using the integrated model, we identify and explain the parameters of the dynamical system from the data, which reveal their strong time-dependence and their relation to external drivers such as wind power feed-in and fast generation ramps. Finally, we generate synthetic time series from the model, which successfully reproduce central characteristics of the grid frequency such as their heavy-tailed distribution. All in all, our work emphasizes the importance of modeling power system dynamics as a stochastic nonautonomous system with both intrinsic dynamics and external drivers.


I. INTRODUCTION
Mitigation of climate change requires a comprehensive transformation of our economy and lifestyle, in particular the way we generate and utilize electric power [1,2].Power plants based on fossil fuels must be replaced by renewable sources such as wind and solar power, which are volatile and uncertain [3].Various sectors are being integrated, for instance through electric heatpumps [4], introducing numerous new interdependencies and increasing system complexity.The electric power system is at the heart of this transformation.Hence, understanding risks and guaranteeing stability of the electric power system is critical amidst far-reaching challenges [5].
Power system operation is determined by various technical, economic, and social influences and perturbations.Power generation from renewable sources is essentially determined by the weather [6,7], while the dispatch of conventional power plants is determined on various electricity markets [8].Moreover, the load depends on the decisions and actions of millions of consumers [9].As the power grid does not store electric energy, generation and load must be balanced at all times.On long time scales of hours, this is achieved by trading on electricity markets [10].On short time scales of seconds and minutes, several layers of control reserves balance the grid, e.g., to counteract unforeseen perturbations and forecasting errors [11].The activation of these reserves is mainly controlled by the grid frequency, which directly monitors the power imbalance: a scarcity of generation leads to a drop of the frequency, which is easily monitored anywhere in the grid.The stability of this load-frequency control system is challenged by the energy transformation, as the effective inertia of the grid decreases, making the frequency more susceptible to perturbations [12].
The realistic modeling of frequency dynamics in largescale power systems is profitable but complex due to its nonautonomous character.Stochastic dynamical models have successfully reproduced central characteristics of frequency measurements such as their nonstandard distributions [13][14][15].Such models can be used to generate synthetic frequency time series, which are, for example, employed to optimize electric devices [16].Moreover, they can be used to explore dynamics under different operating conditions, e.g., with an increased wind power generation [17].However, multiple technical, economic, and social influences and perturbations shape power system dynamics, as explained above.As a consequence, the power system must be regarded as a stochastic nonautonomous dynamical system, which makes grid frequency modeling a daunting task.
In this context, the data-driven representation of external drivers can greatly facilitate realistic models, but data assimilation is challenging due to insufficient data sources.Integrating actual load time series can improve stochastic models of grid frequency dynamics in continental Europe [14].The assimilation of load and generation data enabled an accurate reproduction of grid frequency recordings for the Gran Canaria island [17].However, load and generation time series are typically only available at hourly time scales [18], while frequency dynamics happen at much smaller time scales, thus requiring a careful adoption of these external drivers.For large-scale power systems such as the continental European grid, the data are often incomplete with missing or unrealistic data points [19].The physical model is uncertain as, for example, control schemes vary among local control zones and detailed setups are not publicly available [20].Finally, important dynamical parameters such as the inertia cannot be calculated exactly due to scarce time series on the power plant level [21].
Overall, the dynamics of the power grid constitute a complex system: an intrinsic, potentially nonlinear dynamics [13,22] is enriched by external drivers [23] and stochastic influences.This complexity is somewhat simplified as some processes take effect on different time horizons, leading to a separation of time scales.The superposition of time-varying parameters on a longer time scale then leads to an effective superstatistics [24], observed in earlier contributions [25].
In this work, we propose physics-informed machine learning (PIML) to approach these challenges.Compared to numerical simulations, PIML models can perform better in solving inverse problems with noisy, insufficient data and imperfect physical models [26,27].Moreover, they can better generalize from small amounts of data than common machine learning methods [26].Notably, inverse problems are particularly important for power system control to estimate hidden states or dynamical parameters from measurements [28].The inverse problem of inferring system parameters from input-output data is known as system identification [29] and PIML models offer a promising tool for such applications [30].
In particular, we develop a PIML model for the loadfrequency dynamics of electric power systems, which includes proportional and integral controllers, stochastic noise, and external techno-economic driving factors.The internal dynamics is described by a set of stochastic differential equations, which admit a semianalytic solution.The external driving is manifested through specific system parameters, which depend on a variety of technoeconomic features such as the generation mix.This dependency is deduced via an artificial feed-forward neural network (FFNN), which is trained on data of the continental European power system in a maximum likelihood approach.Finally, we interpret our model with SHapley Additive exPlanation (SHAP) values [23,31] to extract the dependency between dynamical parameters and technoeconomic features.All in all, the model bridges the gap between the large-scale behavior of interdependent energy systems and markets and the short-term dynamics of the power system.
The article is organized as follows.In Sec.II we introduce the physics-informed machine learning model for the grid frequency dynamics and discuss its implementation.In Sec.III we present and evaluate three model applications: probabilistic prediction, system identification and explanation, and generation of synthetic time series.Finally, we discuss our results as well as possible future directions in Sec.IV.

II. AN INTEGRATED MODEL FOR POWER SYSTEM LOAD-FREQUENCY DYNAMICS
Here, we present the details of how we constructed a physics-informed model of the power grid frequency, including a stochastic description of short-term dynamics and the interaction with power system operation and markets on longer time scales.Furthermore, we introduce our approach on modeling these interactions with an artificial neural network prediction based on techno-economic features.The detailed implementation of our model pipeline, as well as all input data and the results are available from Zenodo [32,33].

A. Short-term dynamics and control of the grid frequency
Our starting point is a stochastic model for the dynamics of the grid frequency on short time scales between 1 s and 1 h [cf.Fig. 1(a)].The rate of change of the frequency at time t is determined by the balance of power generation and load as well as the load-frequency control system (details are provided in Appendix A 1). Denoting the deviation from the reference as where M is the aggregated inertia constant.The power imbalance on the right-hand side has been decomposed into three contributions.First, the term P im (t) denotes sustained power imbalances, for instance due to a recurring mismatch of the load and the scheduled generation of dispatchable power pants [cf.Fig. 1(b)].Their generation changes in a stepwise manner, as electricity is traded on the spot markets in blocks of 15, 30, or 60 min [8].In contrast, the load changes continuously, which leads to deterministic power imbalances [34].For time intervals of an hour, we can approximate the time dependence of these imbalances by an affine linear function where (t) is the Heaviside function.Parameters q 1 , . . ., q 4 model the power steps of scheduled generation after quarters of the hour t = 1 h, which is the shortest trading block.Parameter r represents the continuous drift of the load [cf.Fig. 1(c)].Notably, q n and r are strongly correlated as a rising load typically leads to an increase of scheduled generation.However, irregularities such as forecast errors can also lead to a sustained mismatch of load and scheduled generation.This requires an independent modeling of power steps q n and drift r.
Second, the term P control (t) denotes the balance of primary and secondary load-frequency control, which can be modeled by a proportional-integral law as with time constants τ and κ.Parameter τ quantifies the effective primary control time scale and κ can be interpreted as the intrinsic time scale of secondary control.In contrast, the effective time scale of secondary control is approximated by κ 2 /τ [13], as the frequency decays with this time constant in the overdamped case [14].We note that some simplifications are necessary to keep the model tractable.For instance, Eq. (3) neglects the existence of a small deadband in the proportional control law.Finally, we assume that the remaining power imbalance fluctuations P noise (t) are faster than 1 s and thus uncorrelated on the time scale of our short-term dynamical model.Therefore, we modeled such fluctuations as where ξ(t) is white noise with a standard normal distribution and D quantifies the strength of fast imbalance fluctuations.We used a Gaussian distribution because frequency increments on time scales of seconds are approximately Gaussian in continental Europe [35].
Because of the presence of noise, the equation of motion (1) must be interpreted as a stochastic differential equation (SDE) with the explicit form ( Note that this model only contains effective parameters that were rescaled by the inertia M .For example, the actual primary control strength is M /τ [Eq.( 3)].However, the whole model is invariant under a scaling of M (cf.Appendix A 2) such that it is only possible to estimate the ratio of parameters and the inertia.Applying Itô's calculus [36], the SDE can be recast into a Fokker-Planck equation (FPE) of the probability density function P(θ, ω; t): As we show in Appendix A 3, the FPE is solved by a multivariate Gaussian distribution with x = (θ , ω) and time-dependent parameters , if the parameters satisfy the ordinary differential equations ) ) These result from a different time evolution of generation G(t) and load L(t) due to the market-based dispatch of power generation, which is illustrated in this panel using self-engineered synthetic data.(c) We approximate the deterministic mismatch with a sawtooth function P(t).(d) The full probabilistic model P(ω, t) incorporates deterministic imbalances P im (t), additional stochastic imbalance fluctuations P noise (t), and the load-frequency control P control (t).The parameter model F NN predicts the model parameters, i.e., the imbalance and control parameters (color highlight), as well as the initial covariances, from N techno-economic features by using a feed-forward neural network (FFNN).The feature sets differ in the day-ahead and the ex post models, which are two different model types defined in Sec.II C.
Here, μ and σ are the mean and standard deviation of the angle and the frequency deviation, while σ θω represents their covariance.
In Appendix A 5, we provide a semianalytical solution for the moment equations (8), which enables a numerically tractable calculation of the likelihood P(ω; t).

B. Power system operation and interdependencies
The SDE for the frequency dynamics in Eq. ( 5) contains several parameters, describing the load-frequency control system (τ , κ) or the power imbalances (r, q 1,...,4 , D).The parameters are not constant, but change in time due to different operating conditions of the power system.
For instance, the market-based scheduling of dispatchable power plants causes deterministic power imbalances [34,37], which we represent in our model via the imbalance parameters q 1,...,4 and r [Figs.1(b) and 1(c)].As the load changes throughout the day, the imbalance parameters will also vary over time.Another example is the (effective) primary control time scale τ , which can change in time due to a variation of inertia M caused by a varying share of conventional power plants (cf.Sec.II A).In general, the operating conditions mainly change due to a different dispatch on electricity markets.The largest volume of electricity in Europe is still traded in hourly blocks [38].Therefore, we assume that our model parameters change on time scales of 1 h or longer, but are approximately constant within each hour.
We thus propose a model that links different temporal and technological scales, integrating the fast dynamics of the frequency-control system, the stochastic noise, and the impact of slowly changing operating conditions.In every hourly interval, the frequency is modeled by SDE (5).The system parameters τ , κ, D, r, q 1,...,4 are constant within the hour, but change from interval to interval in a stepwise manner.The operating conditions are characterized with different techno-economic features, as detailed below, and their impact on the model parameters is predicted with an FFNN that is trained such that the stochastic dynamics best fits the recorded time series.More precisely, the FFNN constitutes a parameter model F NN : X → ϑ, where X summarizes the values of techno-economic features [Fig.1(d)].Vector ϑ includes the system parameters τ , κ, D, r, q 1,...,4 as well as the initial covariances σ ωθ,0 , σ 2 ω,0 , and σ 2 θ,0 at time t = t i , while the initial means μ ω,0 and μ θ,0 are directly obtained from the data.Integrating the parameter model and the stochastic dynamical model, we obtained a PIML model that predicts a probability distribution P(ω; t) for 1 h from techno-economic features.

C. Techno-economic features affecting power system dynamics and operation
As input features for the PIML model, we used several operational time series of the continental European power system from Ref. [23].The data set includes 64 techno-economic features with an hourly resolution, which are based on data from the ENTSO-E Transparency Platform [18].
We created two different PIML models using subsets with N features from the original dataset.The day-ahead model offers a probabilistic day-ahead forecast of the grid frequency (Sec.III A).It uses only day-ahead available features (N = 13): day-ahead forecasts of load and renewable generation, day-ahead electricity prices and scheduled generation, as well as their respective ramps (time derivative of a feature) and the hour of the day (cf.Table I).
In contrast, the ex post model uses only ex post available features (N = 51): actual load and generation per type, their ramps and forecast errors (day-ahead minus actual), as well as day-ahead electricity prices (to keep a price feature in the input set).As these features describe the actual system operation, we used the ex post model for parameter inference (Sec.III B) and the generation of synthetic time series (Sec.III C), but we also compared it to the day-ahead forecasting model in Sec.III A. A full list of features is provided in Table I.Details on the preprocessing for the continental European power system are provided in Ref. [23].Notably, the number of available features can vary in other power systems, as, for example, certain generation types might not be installed.
The grid frequency recordings used in this work were taken from Ref. [39], which provides preprocessed frequency data from the German transmission system operator TransnetBW [40].

D. Artificial neural network model
The integrated model provides a probabilistic prediction of the power grid frequency for every hourly interval [t i , t i+1 ].The architecture of the model is depicted in Fig. 1(d).
As input, we used the techno-economic features where N depends on the model type (cf.Sec.II C).In the first step, each feature X k was normalized using functions η k (X k ) = (X k − X k )/σ k to improve numeric stability, where • denotes the average and σ k the standard deviation of the feature.
The normalized features were fed into an FFNN of N h hidden layers with N u units and activation functions ϕ.The last layer comprises a linear activation λ, as we aim to predict real-valued parameters ϑ.
The following layer rescales the output of the FFNN and implements several constraints.The rescaling was implemented to improve training efficiency and stability.After random initialization, the outputs u j of the FFNN typically have the same scale, but the physical parameters do not.Such a mismatch will yield large initial errors along certain parameter axes, leading to inhomogeneous loss landscapes that can make optimization inefficient and more difficult [41].This difficulty can be mitigated by a suitable rescaling.Furthermore, several output variables must respect physical constraints.For instance, the time constants τ and κ must be positive and respect the inequality κ ≥ 2τ to avoid an unphysical oscillation behavior of the solution.Rescaling and constraints were implemented with parameter-specific functions ν j (u j ), which are described in Appendix B.
After rescaling, the output ϑ(i) of the parameter model F NN was used to compute a probabilistic prediction of the grid frequency for the entire time interval [t i , t i+1 ] based on Eq. (7).Vector ϑ(i) contains the system parameters as well as the covariances at t = t i , while the means μ ω,0 and μ θ ,0 are directly taken from data.For training and forecasting applications, we used the actual value of the frequency μ ω,0 (i) = ω(t i ) and estimated μ θ ,0 (i) = t i t i −60 s ω(t ) dt .For the generation of synthetic time series, we predicted intervals sequentially in time and estimated μ ω,0 and μ θ ,0 from the preceding prediction and not from the data.

E. Training, testing, and interpretation
Our complete data set comprises 26 859 data points from 2015 to 2019.Each data point corresponds to one interval [t i , t i+1 ] and comprises a feature vector X(i) and a TABLE I. External features used in the ex post and day-ahead models.The data are available online [33], data acquisition and preprocessing are described in Ref. [23].

Ex post
Generation We quantified the ability of the model to predict the stochastic frequency dynamics by the negative loglikelihood.The model performance can vary with the prediction length t max ≤ t in each interval.Therefore, we define the loglikelihood for subsets I = [t i , t i + t max ]: with P(ω; t|ϑ) the marginal of PDF (7) evaluated at the measured data ω(t).During the training process, we utilized the full intervals with t max = t.During model evaluation, we also investigated the performance for different prediction lengths t max .Notably, the loglikelihood is a negatively oriented metric, i.e., smaller values represent a better performance.
To train the PIML model, we initialized the FFNN weights using the Glorot uniform initializer [42].Using data from 2015 to 2017, we trained the weights with stochastic gradient descent using the ADAM optimizer with a fixed learning rate [41].As a loss function, we chose the negative loglikelihood (9), summed over all hourly intervals in the training set.The model hyperparameters were optimized using random search on data from 2018 (as a validation set) and with parameter choices defined in Table II.In particular, we trained the model for 100 epochs and applied early stopping based on the validation loss.
Then, we retrained the best model on data from 2015 to 2018 and evaluated the performance in terms of the negative loglikelihood on data from 2019 as a test set.
We benchmarked the developed model by comparing its performance to the daily profile of the grid frequency, which is defined as follows.For a fixed time of the day t d , we collected all frequency values recorded on all days in the training set and calculated their average μ p (t d ) and the corresponding standard deviation σ p (t d ).Our daily profile model P p returns a normal distribution P p (ω; t) = N (μ p (t d ), σ p (t d )) based on the time of the day t d (t) of time step t.For example, the predicted mean μ p (t d ) for January 11, 2019 at 11:00 equals the average of frequency values at 11:00 over all days in the training set.In addition to the daily profile, we applied the constant model as a benchmark, which simply provides a normal distribution using  the global mean and variance of the whole frequency time series.Finally, we interpreted our parameter model F NN : X → ϑ with Shapely additive explanation (SHAP) values [31], which attribute the prediction of a single parameter ϑ j (i) to the impact of different features X k (i).Aggregating individual SHAP values offers a tool to inspect feature importance and dependencies extracted by the FFNN.In particular, we used KernelSHAP [43], which approximates SHAP values for any machine learning model.

III. MODEL APPLICATION AND EVALUATION
We demonstrate and evaluate three applications of our PIML model.First, it provides a probabilistic prediction of the grid frequency trajectory in each time interval, which we evaluate in terms of the performance and compare it to elementary benchmarks (Sec.III A).Second, the model infers time-dependent imbalance and control parameters based on the data, i.e., we can use it for system identification.We analyze their time dependence, compare our estimates with values from the literature, and explain their dependency on techno-economic features with SHAP values (Sec.III B).Third, our model provides a tool for generating synthetic frequency time series by drawing samples from the stochastic process.Such synthetic time series should reproduce central stochastic characteristics of the grid frequency, which we evaluate in Sec.III C.

A. Probabilistic prediction of the grid frequency
Our physics-informed model provides a probabilistic prediction for unseen samples of the grid frequency.Its performance depends both on the length of the prediction and on the available set of features (Fig. 2).III).In addition to the standard scaling used in the main model and a model with no scaling, we tested small variations of the scaling parameters (defined in Table IV) with ten random weight initializations for each combination of scaling parameters.The ensemble means (markers) and the data range (error bar) indicate the relation between the inferred parameters and the reference, which showed particularly large deviations if no scaling is applied.
The machine-learning model outperforms elementary benchmarks on a prediction horizon of 15 min [Fig.2(a)].The ex post model, defined in Sec.II C, yielded a lower median loss than the daily profile and the constant model for predicting the first t max = 15 min of each interval.Restricting the feature set to day-ahead available data (cf.Sec.II C) yielded a similar performance, which enables us to forecast future frequency deviations better than the daily profile.With growing prediction length t max , the performance increase of our PIML models (relative to the daily profile) deteriorated [Fig.2(b)].This is probably due to an accumulation of errors as the model starts to deviate from the initial condition, which was fixed by the data.
The prediction examples in Figs.2(c)-2(f) illustrate the strengths and limitations of our model.The intervals with the best model performance at 01:00 and 18:00 demonstrate how our model outperforms the daily profile in the first 15 min [Fig.2(c)] or even the full hour [Fig.2(e)].A remarkable aspect is observed when inspecting the intervals with the worst performance: as the physics-informed model fails to capture the dynamics, so does the daily profile, and both models offer a similar prediction [Figs.2(d) and 2(f)].In these cases, our PIML model reproduced a daily average behavior, possibly missing enough detailed information in the techno-economic feature set.
In the following sections, we explain dynamical parameters based on techno-economic features, among others.The ex post model better suits for explanation as it also includes actually measured features, such as forecast errors.Therefore, we only focus on the ex post model and predict the full interval (with t max = 1 h) in following sections.

Inference and variation of system parameters
In addition to probabilistic prediction, our PIML model provides a tool to infer dynamical system parameters from frequency measurements and techno-economic features (Fig. 3).In contrast to time-independent models [14,47], our parameter model F NN extracts time-dependent system parameters ϑ(i), which mirror the local dynamical TABLE III.Reference values of the system parameters ϑ ref j obtained in a time-independent model [14].We note that this study employed the actual grid frequency instead of the angular velocity.Thus, we rescaled the results according to D → 2π D, q 1 → 2π q 1 , r → 2π r, θ → 2πθ, while τ and κ stayed the same.Furthermore, this study did not directly provide a result for r, but its value was implicitly defined through the constraint P im (t) t = 0, which yields r = −2q/t max .
Parameter Value τ 120 s κ 183 s D 0.007 s −3/2 q 1 0.01 s −2 r 0.000 009 s −3 properties of load-frequency control (cf.Fig. 1).These parameters change in a stepwise manner for each hourly interval i (cf.Sec.II B).Note that we only estimated effective parameters that also contain the impact of the inertia (cf.Sec.II), which we discuss later in Sec.IV.When it comes to the deterministic imbalance parameters q 1,...,4 , we only discuss the results for q 1 , the power step at the beginning of the hour as an application example.
The inferred parameters strongly changed during the day, which illustrates the importance of time-dependent dynamical modeling [Figs.3(a)-3(e)].The daily profile of the primary control time scale showed variations of 37%, while the intrinsic secondary control time scale varied by 11%.The strength of high-frequency imbalance fluctuations D varied by 6% and the deterministic imbalance parameters q 1 and r changed by 80% and 120%.
The imbalance parameters showed distinct patterns that reveal physically meaningful impact factors on the grid frequency.We inferred upward power steps q 1 and negative drifts r in the morning around 06:00 and in the evening around 18:00, while the opposite behavior was estimated around noon and during the night.This successfully models the deterministic imbalances between scheduled generation and continuous load: around noon and during the night, the load is decreasing, thus causing downward power steps and positive drifts [cf.Figs.1(b) and 1(c)].
The inferred parameters agree well with estimates from the literature.Typical values of the system parameters were obtained in a time-independent model using the Kramer-Moyal expansion by Rydin Gorjão et al. [14] and are summarized in Table III.Figure 3(f) depicts the ratio between the time average of our absolute parameter estimates and the reference value from the literature ϑ ref j .Our model (referred to as "standard scaling") infers parameter values within the same order of magnitude as the reference values, i.e., the ratio to the reference is near one.
The adequate inference of dynamical parameters is greatly facilitated by our implementation of appropriate scaling steps.In our PIML model, the last model layer rescales the inferred parameter to a similar order of magnitude (see, Sec.II D).In particular, we used the scaling constants defined in Appendix B throughout the paper (referred to as "standard scaling").To assess the impact of the rescaling, we defined two other scaling configurations.If no scaling is applied (triangle markers), the estimated parameters deviated from the reference by multiple orders of magnitude.However, small variations of our standard scaling coefficients, which are defined in Appendix B, did not strongly change the parameter estimates (square markers).This indicates that scaling is important to obtain reasonable parameter estimates with fixed hyperparameters (number of epochs, learning rate, etc.), but the results seem to be independent of the exact choice of the scaling.As already discussed in Sec.II D, the rescaling probably leads to better results due to the large difference in scale between the model parameters, which renders the FFNN training inefficient and unstable [41].

Techno-economic drivers of dynamical system properties
Using SHAP values [31], we explain the dependencies between the techno-economic features and the dynamical parameters ϑ j , which the model extracted from the data (Fig. 4).Focusing on the deterministic mismatch parameters q 1 and r and the primary control time scale τ , we analyze feature importance quantified by the mean absolute SHAP value [Fig.4 The power step q 1 was mostly determined by generation and load ramps [Figs.4(a) and 4(b)].The load ramp strongly correlates with the direction of scheduled generation steps as electricity markets aim at covering the load.It thus had a positive impact on the step q 1 .In addition, fast generation types are important features.It is known that especially fast generation ramps drive the power step and thus the rate of change of frequency (RoCoF) at the beginning of the market intervals [23].Accordingly, the PIML model yielded a high positive impact of hydro power ramps, which are among the fastest in the European power system.
Drift r of the deterministic mismatch mirrors continuous changes of the load [Figs.1(b) and 1(c)].Consistently, load ramps obtained the highest feature importance for r with positive load ramps leading to negative slopes r [Figs.4(a) and 4(c)].Solar ramps were also ranked highly, but their dependency showed the opposite behavior.This mirrors the fact that in addition to the load, solar power also shapes the slow evolution of the deterministic mismatch P im (t) ∼ rt [37]: the load and aggregated solar power typically change slowly and continuously on a time scale of hours, with the load having a negative impact and the solar power having a positive impact on the power imbalance.This perfectly manifests in the opposite effects of load and solar ramps on the mismatch slope r, which were identified by our PIML model [Fig.4(c)].
The effective time scale of primary control τ was determined by load and generation ramps, wind power, and lignite power generation [Figs.4(a) and 4(d)].Most interestingly, increasing wind power generation led to a larger time scale of primary control.In an Ornstein-Uhlenbeck process, τ quantifies the time to revert back to the mean after a disturbance.A large wind power feed-in can cause large stochastic imbalances and thus effectively reduce the mean-reverting time τ .This would be consistent with our SHAP results and with previous studies that showed an increased variability of short-term frequency dynamics with increasing wind power feed-in [48].Notably, this dependency cannot be caused by the rescaling of dynamical parameters with the inertia.If the actual primary control strength M /τ was constant (cf.Sec.II), increasing wind power, and thus decreasing inertia M , would correspond to decreasing values of τ .However, we observed the opposite: the dependency showed increasing values of τ [Fig.4(d)], thus pointing to other causes such as an increased variability.

C. Generation of synthetic grid frequency time series
A third major application of probabilistic machine learning models is the generation of synthetic time series.Scenario generation, i.e., the generation of multiple synthetic samples from the model, is important for simulation or optimization models [49,50].Given a data set of technoeconomic features, synthetic time series are obtained as follows.For every interval i, we applied the FFNN to predict the system parameters ϑ(i).We then integrated the original SDE (5) using a standard Euler-Maruyama method.To ensure continuity, we used the final values of ω and θ from one interval as initial states for the following interval.As a test case, we generated a synthetic trajectory from April 06 to April 13 in our test set, for which the first hours are shown in Fig. 5(a).
Power grid frequency trajectories exhibit several highly characteristic stochastic properties [25]: the distribution of frequency deviations ω(t) is heavy tailed [Fig.5(b)], i.e., large deviations are much more likely than expected from conventional normal statistics.Furthermore, the autocorrelation function peaks at multiples of a quarter hour, the smallest interval of electricity trading in Europe, which are most strongly pronounced after one hour [Fig.5(d)].All these characteristic patterns were well reproduced by our PIML model.
A discrepancy of the data and model is observed in the statistics of the frequency increments ω T (t) = ω(t + T) − ω(t) for T = 1 s, as shown in Fig. 6.The increments are correlated for time scales of up to 8 s, which indicates correlated power fluctuations P noise (t).In contrast, the PIML model assumes Gaussian white noise [cf.Eq. ( 4)], and thus cannot reproduce the correlation of the increments for T = 1 s.At the same time, the PIML model strongly overestimates the variance of the increments.Increments with a longer time lag of T = 60 s are much better reproduced though.
We conclude that the assumption of uncorrelated power fluctuations P noise (t) is not fully justified.The model compensates for the missing correlations by increasing the noise strength, such that the frequency diffusion on longer timescales fits the data.We note, however, that the assumption of uncorrelated fluctuations is needed to derive the Fokker-Planck equation and its analytic solution.Going beyond this approximation would render model training close to impossible.
All in all, our results reveal important aspects of loadfrequency dynamics and control.The success of our model suggests that the non-normal statistics is a direct consequence of the nonautonomous character of the power system.We recall that a time-invariant linear Gaussian process always leads to a Gaussian distribution of the observables.The changing system parameters in the power grid naturally induce heavy tails in the frequency distribution, without the need for heavy-tailed power fluctuations or nonlinear evolution equations; cf. the discussion in Ref. [15,25].

IV. DISCUSSION
We have developed a model of power system operation that integrates both the internal system dynamics and the external techno-economic features.The integration has been achieved by the combination of an explicit simulation model in terms of stochastic differential equations, and an artificial neural network to link the external influences to the system parameters.We thus obtained a generic physics-informed machine learning model of power system dynamics and control.
Using grid frequency recordings from the continental European power grid as a test case, we demonstrated three applications of our physics-informed model.First, we provided a probabilistic prediction of the grid frequency in intervals of 1 h.In the first 15 min, our model outperformed the daily average profile of the grid frequency, which is already a good predictor in continental Europe [23].This was also possible when using only day-ahead available techno-economic features, thus providing the possibility to forecast grid frequency dynamics 15 min ahead.Previous grid frequency predictors only used historic frequency data as inputs [51][52][53].Approaches that integrated external features previously focused on aggregated frequency deviations [23,54], which is also a cause of the data quality.Techno-economic features are typically available only on aggregated time scales of 15 min or 1 h, while the frequency fluctuates on a time scale of seconds (and even shorter time scales).We bridge the gap between largescale techno-economic features and short-term frequency dynamics by using a physics-informed model.It connects the time-aggregated features with dynamical parameters of a stochastic process that well describes the short-term grid frequency fluctuations.Second, our model provides a tool for system identification and explanation.The model inferred the effective system parameters for every hour from frequency measurements and techno-economic input features.The parameters were rescaled by the inertia (cf.Sec.II), but the actual system parameters can be obtained by incorporating inertia time series, which however can only be approximated for large-scale power systems [21,55].The inferred parameters varied strongly in time, which indicates the importance of modeling the grid frequency as a nonautonomous system with time-dependent parameters.Explaining the inferred parameters with SHAP values further revealed their dependency on techno-economic drivers.For example, the primary control time scale increased with rising feed-in of wind power, which is harder to control and thus effectively leads to longer relaxation times.Our tool therefore extracts and explains physically meaningful system parameters and their time-dependent drivers.
Third, we used our model for the generation of synthetic grid frequency time series.The synthetic data well approximated the heavy-tailed distribution of frequency deviations and the recurrent patterns in its autocorrelation.In contrast to previous stochastic models [13][14][15], the synthetic time series also reproduced the actual frequency trajectory with its local time-dependent characteristics.Notably, the model requires very little system-specific information as inputs, but learns them directly from the data.Hence, the model is highly flexible and can be easily transferred to other grids.
A key feature of the load-frequency dynamics is the presence of three distinct time scales.The power balance fluctuates on time scales of seconds and below, which was represented by the term P noise (t) in the model.The data suggest that fluctuations are correlated on time scales of a few seconds, but these correlations had to be approximated to keep the model tractable.Load-frequency control acts on the time scale of minutes to restore the power balance and bring the frequency back to its reference value.This dynamics has been modeled by stochastic differential equations, where the high-frequency fluctuations enter as noise.
The power system further evolves on time scales of hours due to changes in the dispatch of power plants, the trading on electricity markets, and the demand.This evolution is included in the model via changing system parameters, including the droop constants of the control system and the systematic power balance P im .This evolution has been modeled by a neural network in terms of large-scale energy system features.
Notably, the load-frequency control system itself comprises different processes with different time constants.The inertia of the synchronous machines determines the response to a perturbation or a change in the dispatch.Though we cannot infer the numerical value of the inertia, the response is very well reproduced by the model.This "momentary reserve" is the fastest contribution to the load-frequency control system, and its role in future renewable power systems is an important topic in power engineering [12].Primary control restores the power balance, while secondary control brings the grid frequency back to its reference value.We assumed that primary control is faster than secondary control (τ < κ) to ensure an overdamped dynamics.The numerical results yield values around τ ≈ 35 s and κ ≈ 145 s, which consistently satisfy our assumptions and agree with reference values [14].
In the context of power system dynamics and control, physics-informed machine learning methods have become popular during the past years [56,57].Classical physics-informed neural networks (PINNs) are commonly applied to the differential equations directly [58], which we circumvented by providing a semianalytical solution for our system.However, a (semi)analytical solution is not possible anymore when including nonlinearities such as deadbands.In the future, our model can be modified to leverage classical PINNs to also treat nonlinearities and more generic power system dynamics.Previous applications of PINNs to power system dynamics have successfully addressed autonomous dynamics [57,59].We contribute to these developments by proposing a model that explicitly models nonautonomous dynamics, which may greatly advance the application of physics-informed machine learning in the energy sector.
In terms of the characteristic function, the FPE reads where we have defined u = (r, s).The characteristic function of the normal distribution (A16) reads We now show that the normal distribution (A20) with parameters (A17) satisfies the Fokker-Planck equation (A19).We first evaluate the right-hand side of the FPE, Now we proceed with the left-hand side, Inserting Eqs.(A17) then yields which coincides with the right-hand side (A21).

Moment equations
The ordinary differential equations for parameters (A17) can also be obtained in a more direct way, once we know that the PDF remains Gaussian at all times.In fact, we can exploit the fact that the parameters of a Gaussian PDF equal the mean and the (co)variances.The dynamics of the mean and the (co)variances are determined by the moment equations, which we extracted using Itô's lemma.For any twice differentiable scalar function g(X) of the random variable X in Eq. (A14), Itô's lemma reads [36] where ∇g is the gradient and H g is the Hessian matrix of function g(X).This yielded in our case To apply this to moment functions, we further assumed that d g = dg .For the first moments (averages) g = θ and g = ω , we obtained The second moments g = θ 2 , g = ω 2 and the mixed moment g = ωθ yielded In this derivation, we used ω dW = 0 and θdW = 0. Identifying and σ θ ,ω = ωθ − ω θ then reproduces Eqs.(A17).

Solution of the moment equations
We now provide a semianalytic solution for the ordinary differential equations (A17) describing the evolution of the parameters μ θ , μ ω , σ 2 ω , σ 2 θ , and σ θ ,ω .We first note that the equations for the deterministic part (the means) and the stochastic part [the (co)variances] decouple; hence, they the sum of the homogeneous solution and an inhomogeneous contribution y in (t): We provide a semianalytical solution, which leaves the integration of inhomogeneity b d (t) to a numerical routine.This enables us to flexibly insert a different power function P(t).The deterministic part μ ω,in (t) contains an integral over the deterministic power imbalance P(t), which we can compute numerically for any power function.However, the factors e −λt in Eq. (A51) can become very large as λ < 0 for stable systems, thus causing numerical problems.
To use this semianalytical solution during neural network training, one has to mitigate these numerical problems.We did this by restricting the parameter space to ensure that λ does not become too large (Appendix B).Note that we only require the marginal probability density P(ω; t) = P(θ, ω; t) dθ to model the grid frequency dynamics; hence, we only needed a closed-form solution for μ ω and σ ω .

APPENDIX B: PARAMETER SCALING AND CONSTRAINTS
The developed PIML model includes a layer that rescales the parameters and ensures some physical constrains [Fig.1(d)].The outputs u j of the FFNN do not necessarily fulfil the physical constraints of parameters ϑ j (cf.Table IV), as the linear activation of the output takes arbitrary real values, while τ , for example, only takes positive real values.Moreover, the physical parameters ϑ j vary strongly in scale (cf.Table III), but outputs u j of the initialized FFNN typically have the same scale due to uniform random initialization of the weights [65].This will yield large initial errors along certain parameter axes, thus leading to inhomogeneous loss landscapes that can make optimization inefficient and more difficult [41].
Therefore, we added a constraint and scaling layer that applies functions ν j to the FFNN outputs.The results then represent the parameter estimates ϑ j = ν j (u j ).First, functions ν j enforce the physical constraints.For example, a softplus function S + (u) = log(exp(u) + 1) ∈ (0, ∞) enforces positivity, and the sigmoid function Sig(u) = (1 + exp(−u)) −1 ∈ (0, 1) was used to ensure that τ ≤ κ/2 holds.Numerical imprecision can lead to a violation of these constraints, so we added a safety factor δ = 0.999 in some cases.Second, factors s j , which mirror the typical scale of parameters ϑ j , are applied to make the optimization more efficient.Third, minimum values are added in certain cases to ensure numerical stability during optimization.For example, a very small standard deviation σ ω,0 can lead to probability densities beyond float precision.All in all, we defined the following functions using minimum values, scaling factors, and constraints from Table IV: To test the impact of the scaling with s j , we varied the scaling parameters according to Table IV.In particular, we trained the PIML model for each combination of the individual scaling choices listed in the table.For each scaling tuple, we additionally simulated ten different random initializations of the FFNN weights.Finally, we trained ten initializations using the standard scaling defined in Table IV and no scaling with s j = 1 (cf.Sec.III B).
TABLE IV.Properties of dynamical system parameters that are predicted by the FFNN [cf.Fig. 1(d)].The parameters are subject to several physical constraints, which are summarized in the third row.The output of the FFNN is rescaled by constant factors listed in the fourth row to improve training efficiency (referred to as the standard scaling).To test the impact of the scaling, we varied scaling s j according to the parameter choices in the fifth row.Furthermore, we ensure that the dynamical system parameter exceeds a minimum value listed in the sixth row.

FIG. 1 .
FIG.1.A physics-informed machine learning model for the power grid frequency.(a) In each interval [t i , t i+1 ], we modeled the grid frequency with a stochastic process that provides a normal distribution with time-dependent mean μ ω (t) and standard deviation σ ω (t).The panel depicts our model prediction of the continental European grid frequency at t i = 22 : 00 on October 11, 2018 in comparison to the recorded data.(b) Deviations from the reference frequency are partly driven by deterministic power imbalances P im (t) = MP(t).These result from a different time evolution of generation G(t) and load L(t) due to the market-based dispatch of power generation, which is illustrated in this panel using self-engineered synthetic data.(c) We approximate the deterministic mismatch with a sawtooth function P(t).(d) The full probabilistic model P(ω, t) incorporates deterministic imbalances P im (t), additional stochastic imbalance fluctuations P noise (t), and the load-frequency control P control (t).The parameter model F NN predicts the model parameters, i.e., the imbalance and control parameters (color highlight), as well as the initial covariances, from N techno-economic features by using a feed-forward neural network (FFNN).The feature sets differ in the day-ahead and the ex post models, which are two different model types defined in Sec.II C.

FIG. 2 .
FIG. 2. Probabilistic machine learning model outperforms elementary benchmarks on a prediction horizon of 15 min.(a) We quantified the performance of our probabilistic frequency prediction with the negative loglikelihood loss, which implicates better performance if values are lower.For a prediction length of t max = 15 min, the day-ahead and ex post models showed better performance than our benchmarks, namely, the constant model and the daily profile predictor.(b) The performance increase of our models relative to the daily profile shrank with growing prediction length t max .The dots indicate the median and the bars represent the 25% and 75% quantiles.(c)-(f) Prediction examples with best and worst performance at 01:00 and 18:00 illustrate the strengths and limitations of our model.

FIG. 3 .
FIG.3.Physics-informed machine learning model infers dynamical system parameters.(a)-(e) The inferred parameters showed strong daily patterns, which underlines the importance of time-dependent dynamical properties in our model.The panels depict the daily means (solid lines) and the range between the 25% and 75% quantile (area).(f) The ratio between the average inferred parameters and the reference value ϑ ref j obtained from time-independent stochastic inference (cf.TableIII).In addition to the standard scaling used in the main model and a model with no scaling, we tested small variations of the scaling parameters (defined in TableIV) with ten random weight initializations for each combination of scaling parameters.The ensemble means (markers) and the data range (error bar) indicate the relation between the inferred parameters and the reference, which showed particularly large deviations if no scaling is applied.

FS
FIG. 4. SHAP values reveal dependencies between techno-economic features and dynamical system parameters.(a) The mean absolute SHAP values quantify the overall importance of a feature, which varies strongly among the different dynamical system parameters.We show the eight most important features for each model parameter.(b)-(d) The relation between feature values and SHAP values reveals the dependencies that were extracted by our model.The inferred dependencies generally represent physically meaningful effects that agree with domain knowledge (see the main text).

FIG. 5 .FIG. 6 .
FIG. 5. Physics-informed machine learning model generates synthetic grid frequency time series with characteristic properties.(a)We sampled a synthetic frequency trajectory from our probabilistic model for the test period April 06 to April 13 in our test set, which was the longest period without missing or corrupted data points.The panel depicts the start of this period, showing good agreement between real and synthetic time series.(b),(c) The autocorrelation function (ACF) and the probability density function (PDF) of the frequency deviation ω(t) are well reproduced by our synthetic time series.