Machine learning for excitation energy transfer dynamics

A well-known approach to describing the dynamics of an open quantum system is to solve the master equation evolving the reduced density matrix of the system. This approach plays an important role in describing excitation transfer through photosynthetic light harvesting complexes (LHCs). The hierarchical equations of motion (HEOM) were adapted by Ishizaki and Fleming (J. Chem. Phys., 2009) to simulate open quantum dynamics in the biological regime. By solving the HEOM, we generate a set of time dependent observables that depict the coherent propagation of electronic excitations through the LHCs. Since directly inverting these equations is computationally intractable, we address the inverse problem with classical machine learning (ML) models. The objective here is to determine whether a trained ML model can perform Hamiltonian tomography by using the time dependence of the observables as inputs. We demonstrate the capability of convolutional neural networks to tackle this research problem. The models developed here can predict Hamiltonian parameters, such as excited state energies and inter-site couplings of a system, with up to 99.28% accuracy and a mean-squared error as low as 0.65.


I. INTRODUCTION
During the first step of photosynthesis in light harvesting complexes, photons are absorbed by the antenna. While part of the energy of the photons is converted into heat in the form of molecular vibrations, most of the energy is captured as excitons which are subsequently transferred via chromophores to the reaction centre through a process labeled as excitation energy transfer (EET). It is at the reaction centre where photochemical reactions are triggered [1].
The evidence for quantum coherence, which has no classical analogue, in the exciton transport process became undeniable in 2007 when Engel et al. [2] used two-dimensional spectroscopic signatures [3] to demonstrate quantum 'beating' within a photosynthetic complex at 77 K, a result that was later confirmed at room temperature. Quantum beating in spectroscopic measurements provides a direct measure of quantum coherence on the appropriate energy and time scales [4]. While electronic coherence was first proposed as the source for the observed long-lived quantum coherence [2], experimental and theoretical evidence has also supported proposals that the phenomenon resulted from a mixture of electronic and vibrational states, which is referred to as 'vibronic' coherence [5].
The idea of quantum coherence playing a role in photosynthesis arose from observations that some energy or electron transfer processes in bacterial and plant pigment-protein complexes are efficient to an extent that exceeds explanation using only classical theory. Engel et al. [2] investigated photosynthetic EET in the Fenna-Matthews-Olson (FMO) protein of green sulfur bacteria [1,6]. The FMO complex was the first chlorophyll-containing protein that was crystallized [7]. Prior to mass spectrometry measurements of the protein that confirmed the existence of an eighth pigment, it was accepted that the protein consisted of seven pigments [8]. It is now known that FMO is made up of three subunits, each consisting of eight bacteriochlorophyll molecules. The FMO complex serves as a bridging energy wire, as it is tasked with transporting energy in the form of light harvested in the antenna chlorosome to the reaction centre pigments [1]. It represents an important model in EET and has been extensively studied experimentally and theoretically. Engel and collaborators succeeded in observing long-lasting quantum effects, providing direct evidence for long-lived electronic coherence [4]. The observed coherence lasts for time scales similar to the EET timescales, implying that electronic excitations move coherently through the FMO protein rather than by previously proposed incoherent hopping motion [9,10]. Panitchayangkoon et al. [11] presented evidence that quantum coherence survives in FMO at physiological temperature for at least 300 fs, which is long enough to impact biological energy transport. Collini et al. [12] made observations that provide evidence for quantum coherent sharing of electronic excitation across proteins under biologically relevant conditions. They suggest that distant molecules within the photosynthetic proteins are 'wired' together by quantum coherence for more efficient light-harvesting in cryptophyte marine algae. Lee et al. 
[13] present experimental results in which they suggest that correlated protein environments preserve electronic coherence in photosynthetic complexes and allow the excitation to move coherently in space which enables highly efficient energy harvesting and trapping in photosynthesis. However, explanations for observed long-lived coherence have evolved during the past decade [14][15][16]. More recently, Fuller et al. [5] and Thyrhaug et al. [17] suggest that the coherences are of a mixed electronic-vibrational nature and may enhance the rate of charge separation in oxygenic photosynthesis.
Understanding the relationship between the structure of light harvesting complexes and their excitation energy transfer dynamics is of importance in many applications. Insight into long-lived quantum coherence in EET processes can be gained through the reduced equation of motion and the numerically exact formalism of quantum dynamics adopted to study EET processes is the Hierarchical Equations of Motion (HEOM) derived by Tanimura and Kubo [18].
Machine learning is a well established tool that has been actively applied in various ways to address physical problems [19]. One common strategy is to use supervised learning, in which an algorithm is trained with datasets that are labeled beforehand; the goal of the algorithm is then to establish a general rule for assigning labels to data outside the training set. This approach can be used to identify distinct phases of matter and the transitions between them, one of the central problems in condensed-matter physics, as tackled by Carrasquilla and Melko [20]. Machine learning techniques have been used to represent and solve quantum systems, such as in the work by Carleo and Troyer [21] where the authors introduce an ansatz capable of both finding the ground state and describing the unitary time evolution of complex interacting quantum systems. More specifically, and in a bid to investigate open quantum systems further, the applications of supervised machine learning that we are interested in are related to forms of approximating solutions to open quantum system dynamics: multilayer perceptrons have been used to obtain the exciton dynamics of large photosynthetic complexes [22] and to better understand the relationship between the structure of light-harvesting systems and their excitation energy transfer properties [23]; recurrent neural networks were used to model quantum systems interacting with an unknown environment [24]; and convolutional neural networks were used to predict long-time dynamics of an open quantum system [25].
The primary focus of this work is on using classical machine learning models to study the quantum dynamics of EET. We can generate a time dependent set of observables that depict the coherent movement of electronic excitations through a photosynthetic pigment-protein complex by solving the HEOM. Here we develop a scalable and efficient tool for the description of the dynamical properties of open quantum systems by use of a trained convolutional neural network (CNN) to solve the inverse problem. This means that the objective is to determine whether a trained CNN can accurately describe the system under study, by predicting the parameters of the system Hamiltonian such as excited state energies and inter-site couplings, when given this time dependent data of varying length (see Figure 1).
In Section II, we describe the HEOM framework, and Section III introduces and explains the use of machine learning in this context. In Section IV, we present the numerical results, which show that our models can successfully reproduce the Hamiltonian of a system and that our method can be easily adapted to study more complex systems. Finally, Section V is devoted to concluding remarks.

[Figure 1: We apply the HEOM to calculate the evolution of the reduced density matrix $\hat{\rho}(t)$ and produce the time dependent observables. The 'inverse problem' addressed in this study is the process of using the observables as an input to the CNN to reproduce the system Hamiltonian.]

II. HIERARCHICAL EQUATIONS OF MOTION
One of the viable approaches to explore long-lived quantum coherence and its interplay with the protein environment in EET processes is through the reduced equation of motion. In this approach, the key quantity of interest is the reduced density matrix, i.e., the partial trace of the total density matrix over the environmental degrees of freedom [26].
Typical situations in photosynthetic EET are such that the electronic coupling strengths between chromophores span a similar range as the reorganization energies, which characterize the coupling between the chromophores and their local environmental phonons and the time scale on which the coupled phonons relax to their respective equilibrium states [24,27]. However, these site-dependent reorganization processes cannot be described by theories that rely on the Markov approximation, as it requires the phonons to relax to their equilibrium states instantaneously; that is, the phonons are always in equilibrium even under the electron-phonon interaction.
In order to go beyond the Markov approximation, Tanimura and Kubo [18] developed a new theoretical framework, the HEOM, which can describe the site-dependent reorganization dynamics of environmental phonons. Ishizaki and Fleming [28] adapted this formalism to suit the quantum biological regime which is the form we employ. HEOM is a numerically exact method which accurately accounts for the reorganization process in which the vibrational coordinates rearrange to their new equilibrium positions upon electronic transition from the ground to the excited potential energy surface. It can describe quantum coherent wavelike motion and incoherent hopping in the same framework and reduces to the conventional Redfield [29][30][31] and Förster [32,33] theories in their respective limits of validity. In the following Subsection II A, we highlight the theory required to describe EET dynamics in a photosynthetic complex [34,35].

A. Theory
The total Hamiltonian is composed of the Hamiltonian of the system, bath and system-bath interaction,

$$\hat{H} = \hat{H}_S + \hat{H}_B + \hat{H}_{SB}. \qquad (1)$$

The Hamiltonian of the system refers to the electronic states of a complex containing N pigments,

$$\hat{H}_S = \sum_{j=1}^{N} \epsilon_j |j\rangle\langle j| + \sum_{j \neq k} J_{jk} |j\rangle\langle k|, \qquad (2)$$

where $\epsilon_j$ is the excited state energy of the jth site and $J_{jk}$ denotes the electronic coupling between the jth and kth sites.
Here we consider that each pigment is coupled to a separate bath. The bath Hamiltonian represents the environmental phonons,

$$\hat{H}_B = \sum_{j=1}^{N}\sum_{k} \frac{\hbar\omega_{j,k}}{2}\left(\hat{p}_{j,k}^2 + \hat{q}_{j,k}^2\right), \qquad (3)$$

where $\hat{p}_{j,k}$ is the conjugate momentum, $\hat{q}_{j,k}$ is the dimensionless coordinate and $\omega_{j,k}$ is the frequency of the kth phonon mode coupled to the jth site. The last term of Eq. (1) represents the fluctuations in the site energies caused by the phonon dynamics,

$$\hat{H}_{SB} = \sum_{j=1}^{N} |j\rangle\langle j| \sum_{k} \hbar\omega_{j,k} g_{j,k} \hat{q}_{j,k}, \qquad (4)$$

where $g_{j,k}$ is the coupling constant between the jth site and kth phonon mode. The spectral density $J_j(\omega)$ specifies the coupling of an electronic transition of the jth pigment to the environmental phonons through the reorganization energy $\lambda_j$ and the timescale $\gamma_j^{-1}$ of the phonon relaxation. Here it is expressed as the Ohmic spectral density with Lorentz-Drude cut-off, $J_j(\omega) = 2\lambda_j\gamma_j\omega/(\omega^2 + \gamma_j^2)$. We focus on the application of this theory to EET at physiological temperatures of around 300 K. Hence, when the high-temperature condition characterized by $\hbar\gamma_j/k_B T \ll 1$ is imposed, the following hierarchically coupled equations of motion are obtained [28],

$$\frac{\partial}{\partial t}\hat{\sigma}(\mathbf{n},t) = -\left(i\hat{\mathcal{L}}_e + \sum_{j=1}^{N} n_j\gamma_j\right)\hat{\sigma}(\mathbf{n},t) + \sum_{j=1}^{N}\left[\hat{\Phi}_j\hat{\sigma}(\mathbf{n}_{j+},t) + n_j\hat{\Theta}_j\hat{\sigma}(\mathbf{n}_{j-},t)\right]. \qquad (5)$$

In Eq. (5), $\mathbf{n} \equiv (n_1, n_2, \ldots, n_N)$ denotes a set of non-negative integers and $\mathbf{n}_{j\pm}$ differs from $\mathbf{n}$ only by changing the corresponding $n_j$ to $n_j \pm 1$. Furthermore in Eq. (5), the element $\hat{\sigma}(\mathbf{0},t)$ is identical to the reduced density operator $\hat{\rho}(t)$, while the rest are auxiliary density operators. Moreover, the Liouvillian corresponding to the Hamiltonian $\hat{H}_S$ is denoted by $\hat{\mathcal{L}}_e$ and the relaxation operators $\hat{\Phi}_j$ and $\hat{\Theta}_j$ are given by Eqs. (6) and (7),

$$\hat{\Phi}_j = i\hat{V}_j^\times, \qquad (6)$$

$$\hat{\Theta}_j = i\left(\frac{2\lambda_j}{\beta\hbar^2}\hat{V}_j^\times - i\frac{\lambda_j\gamma_j}{\hbar}\hat{V}_j^\circ\right), \qquad (7)$$

where $\hat{V}_j^\times\hat{A} \equiv [\hat{V}_j,\hat{A}]$, $\hat{V}_j^\circ\hat{A} \equiv \{\hat{V}_j,\hat{A}\}$ and $\hat{V}_j \equiv |j\rangle\langle j|$. Formally, the hierarchy in Eq. (5) is infinite and cannot be numerically integrated. In order to make this problem tractable, the hierarchy can be terminated at a certain depth. There are several methods of doing so, and in this work we have chosen the termination condition of Ishizaki and Fleming [34]: for the integers $\mathbf{n} = (n_1, n_2, \ldots, n_N)$ and for a characteristic frequency $\omega_e$ of $\hat{\mathcal{L}}_e$ satisfying $\sum_{j=1}^{N} n_j\gamma_j \gg \omega_e$, Eq. (5) is replaced by

$$\frac{\partial}{\partial t}\hat{\sigma}(\mathbf{n},t) = -i\hat{\mathcal{L}}_e\,\hat{\sigma}(\mathbf{n},t). \qquad (8)$$
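For small systems the truncated hierarchy can be integrated directly. The following is a minimal NumPy sketch of Eqs. (5)-(7) for a two-level system, with $\hbar = 1$, a simple 4th-order Runge-Kutta propagator (not the exponential-expansion integrator used later in this work), a depth-3 truncation that simply discards deeper tiers, and purely illustrative Hamiltonian parameters:

```python
import numpy as np

CM = 0.188365          # 1 cm^-1 as an angular frequency in rad/ps (2*pi*c)
N, DEPTH = 2, 3        # two-level system, hierarchy depth 3
H = CM * np.array([[100.0, 50.0],   # illustrative site energies (cm^-1,
                   [50.0,   0.0]])  # relative) and coupling J = 50 cm^-1
lam = 35.0 * CM        # reorganization energy lambda_j (same for both sites)
gam = 106.2 * CM       # phonon relaxation rate gamma_j
kT = 208.5 * CM        # k_B T at 300 K

V = []                 # projectors V_j = |j><j|
for j in range(N):
    P = np.zeros((N, N))
    P[j, j] = 1.0
    V.append(P)

def indices(n_sites, depth):
    """All hierarchy index tuples n with n_1 + ... + n_N <= depth."""
    if n_sites == 0:
        return [()]
    return [(k,) + rest
            for k in range(depth + 1)
            for rest in indices(n_sites - 1, depth - k)]

IDX = indices(N, DEPTH)
POS = {n: i for i, n in enumerate(IDX)}

def heom_deriv(sig):
    """Right-hand side of Eq. (5); sig has shape (len(IDX), N, N)."""
    out = np.zeros_like(sig)
    for i, n in enumerate(IDX):
        s = sig[i]
        d = -1j * (H @ s - s @ H) - sum(n) * gam * s
        for j in range(N):
            up = n[:j] + (n[j] + 1,) + n[j + 1:]
            if up in POS:            # Phi_j acting on sigma(n_{j+})
                su = sig[POS[up]]
                d += 1j * (V[j] @ su - su @ V[j])
            if n[j] > 0:             # n_j Theta_j acting on sigma(n_{j-})
                sd = sig[POS[n[:j] + (n[j] - 1,) + n[j + 1:]]]
                d += n[j] * (1j * 2 * lam * kT * (V[j] @ sd - sd @ V[j])
                             + lam * gam * (V[j] @ sd + sd @ V[j]))
        out[i] = d
    return out

def propagate(rho0, t_final, dt=1e-3):
    """RK4 propagation of the full hierarchy; returns rho(t_final)."""
    sig = np.zeros((len(IDX), N, N), dtype=complex)
    sig[POS[(0,) * N]] = rho0
    for _ in range(int(round(t_final / dt))):
        k1 = heom_deriv(sig)
        k2 = heom_deriv(sig + 0.5 * dt * k1)
        k3 = heom_deriv(sig + 0.5 * dt * k2)
        k4 = heom_deriv(sig + dt * k3)
        sig += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return sig[POS[(0,) * N]]

rho = propagate(np.diag([1.0, 0.0]).astype(complex), 0.1)  # 100 fs
```

The trace of $\hat{\sigma}(\mathbf{0},t)$ is conserved exactly by construction, and the site-1 population exhibits the damped coherent oscillation that later serves as an input feature for the machine learning models.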

III. MACHINE LEARNING APPROACH
There exist numerous studies [22-25, 36, 37] of machine learning techniques applied to accelerate computations by many orders of magnitude at a reasonable level of accuracy. A machine learning model can be leveraged to predict the reduced density matrix of a system given the Hamiltonian of the system. In this approach, the model is trained and tested on a large and diverse enough dataset of Hamiltonians and corresponding reduced density matrices such that it may learn patterns in the data and be able to present highly accurate predictions without having any knowledge of the theory or in this case, HEOM.
However, in an experimental setting one may gather certain time-dependent observational data and subsequently need to use these findings to attain the Hamiltonian of the excitonic system under study. The machine learning models in this work act as a black box into which one can feed excitation energy transfer observations and from which one obtains Hamiltonian parameters.
In Subsection III A, we describe the machine learning basics required to follow the study. Thereafter, in Subsection III B we generate multiple datasets comprising excited state population dynamics and corresponding Hamiltonian parameters, and in Subsection III C we design the supervised machine learning model architecture to be used for making predictions based on the generated datasets.

A. Supervised machine learning
Machine learning algorithms are used to learn underlying patterns embedded in the data. In the realm of classical machine learning, there exist three broad types classified by the amount and type of supervision models get during training: supervised, unsupervised and reinforcement learning. In supervised learning, the training set fed to the model includes the true solutions called labels [38,39].
A neural network is a machine learning model whose structure is inspired by the networks of biological neurons found in our brains [40]. Neural networks are made up of layers of neurons, which are the core processing units of the network. Usually these layers comprise an input layer to receive input features, an output layer to make final predictions, and hidden layers which perform most of the computations done by the network. Convolutional neural networks (CNNs) [41] are a class of neural networks which emerged from the study of the brain's visual cortex [42,43]. CNNs specialize in processing data that has a grid-like topology. When we see an image, the brain processes the information with each neuron working in its own restricted region of the visual field, called the receptive field, and connected to other neurons in a way that covers the entire visual field. Just as each neuron responds to stimuli only in its receptive field in the biological vision system, each neuron in a CNN processes data only in its receptive field. The layers are arranged such that they detect simpler patterns first and more complex patterns deeper into the network. A rich description of how the convolution operation works and the advantages of using CNNs has been given by Goodfellow et al. [44].
In this work, machine learning has been leveraged to solve the HEOM inversely without explicitly solving the equations at all so that predictions of the parameters of the Hamiltonian of a system can be made when given time dependent observations.

B. Generating the database
To demonstrate the capabilities of our machine learning approaches for the regression task at hand, we investigate three datasets of increasing complexity, each comprising randomly generated excitonic systems. The parameters of the Hamiltonians in these datasets are motivated by, and sampled around the same order of magnitude as, those typical of the light-harvesting pigment-protein complex FMO.
We impose a linear chain such that only neighbour-neighbour couplings are permitted and, for simplicity, we consider that the transition rates are strictly real in value. When sampling the Hamiltonian parameters we consider the excited state energy $\epsilon_j$ and inter-site coupling $J_{jk}$ with respect to $\epsilon_1$. Hence, the Hamiltonian of an N-level system requires 2(N − 1) real parameters. In constructing the first dataset, we consider two-level excitonic systems which allow for excitation energy transfer between two excited states. In this case, there are two Hamiltonian parameters needed to describe the two-level system, as seen in Eq. (2) with N = 2. By solving the HEOM, we obtain the time evolution of an N-dimensional reduced density matrix for each sample. In a similar way, the second and third datasets consider three-level and four-level systems which require four and six Hamiltonian parameters, respectively. The following data is captured and stored for each sample in each dataset: the Hamiltonian parameters, which are used as labels for the machine learning model, and the time evolution of each element of the reduced density matrix, used as input features for the machine learning model. An example of an input feature for a two-level system can be seen in Figure 2, which depicts the time evolution of the population of site 1, corresponding to the first element of the reduced density matrix. For each dataset, 25000 Hamiltonians were generated by sampling a uniform distribution for excited state energies and inter-site couplings within a fixed range of values around those typical of the FMO complex, shown in Table I. Only neighbour-neighbour couplings were considered to be non-zero; hence, $J_{jk}$ was only sampled for each site's nearest neighbours. Furthermore, the values of the excited state energies $\epsilon_j$ have been sampled with respect to 12400 cm$^{-1}$ for all sites [23,45].
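The sampling step described above can be sketched as follows; the ranges in Table I are not reproduced in this excerpt, so the bounds below are illustrative placeholders of the right order of magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hamiltonian(n_sites, de_range=(-200.0, 200.0), j_range=(-100.0, 100.0)):
    """Sample a linear-chain excitonic Hamiltonian (in cm^-1).

    Site energies are stored relative to site 1 (epsilon_1 = 0), and only
    nearest-neighbour couplings J_{j,j+1} are non-zero, so the matrix is
    fully specified by 2*(n_sites - 1) real parameters.
    """
    H = np.zeros((n_sites, n_sites))
    for j in range(1, n_sites):
        H[j, j] = rng.uniform(*de_range)       # epsilon_j relative to epsilon_1
    for j in range(n_sites - 1):
        J = rng.uniform(*j_range)              # real nearest-neighbour coupling
        H[j, j + 1] = H[j + 1, j] = J
    return H

H4 = sample_hamiltonian(4)   # a four-level system: 6 free parameters
```

Each such matrix would then be handed to the HEOM solver to produce the corresponding reduced-density-matrix trajectory, with the 2(N − 1) free parameters stored as the sample's labels.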
By explicitly solving the HEOM, we compute the time evolution of the reduced density matrix for each Hamiltonian in each of our datasets. Rather than the traditional 4th-order Runge-Kutta method, the numerical propagation was performed with an exponential expansion method described by Dattani et al. [14], for 1 ps represented by 5000 time steps. Each computation took approximately 1 s, 1.5 s and 2 s for a two-level, three-level and four-level system sample, respectively. For all samples in all datasets, the HEOM were truncated at a depth of 3 (refer to Eq. (8)). For all Hamiltonians we assumed identical Drude-Lorentz spectral densities describing the influence of the bath on each excited state. For all datasets, simulations were done at temperature T = 300 K, reorganization energy λ = 35 cm$^{-1}$ and phonon relaxation rate γ = 106.1767 cm$^{-1}$ [28]. The main scientific Python libraries used are NumPy, pandas and Matplotlib; the machine learning frameworks used are Scikit-Learn, TensorFlow and Keras.

C. Model architecture
The architecture of our CNNs is designed for supervised learning of excitation energy transfer dynamics. As mentioned in Subsection III B, the elements of the reduced density matrices obtained by solving the HEOM are the inputs, known as features, for the CNN. To reduce the computational cost of model training and to improve the performance of the model, a feature selection process was carried out to reduce the number of input variables. During the optimization of our CNNs it was found that, rather than the entire reduced density matrix, only the diagonal elements and their nearest neighbours were required as input features for the models to perform best. Data scaling was performed to transform all datasets as a pre-processing step by fitting a scaling object only to the training data and then using it to transform the training/validation and test sets. All features of all datasets were normalized by rescaling the data into the range [0, 1]. Similarly, the labels were standardized such that the mean of the values is 0 and the standard deviation is 1. For each dataset, all features were reshaped into 4-dimensional tensors (number of samples, number of time steps, number of features, 1) and provided as inputs to the CNNs, which were used to predict the excited state energies and inter-site couplings known as labels. Since the input features of neural networks need to be of fixed size, we construct separate CNNs for each dataset in order to treat the different dimensionalities of the Hamiltonians. These CNNs only differ by their input and output shapes. The hidden layers of the CNNs consist of two 2D convolutional layers followed by a max pooling layer; these three layers are repeated and then followed by a flattening layer and three dense layers. Goodfellow et al. [44] describe general guidelines for choosing which architectures to use in which circumstances. 
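The pre-processing described above can be sketched in pure NumPy (mirroring what Scikit-Learn's MinMaxScaler and StandardScaler would do; the array sizes and random data here are placeholders). The key point is that the scaling statistics are computed on the training split only:

```python
import numpy as np

rng = np.random.default_rng(1)
# placeholder data: 800 training / 200 test samples, 20 feature columns,
# with 2 Hamiltonian parameters (labels) per sample
X_train = rng.uniform(0.0, 2.0, (800, 20))
X_test = rng.uniform(0.0, 2.0, (200, 20))
y_train = rng.uniform(-100.0, 100.0, (800, 2))
y_test = rng.uniform(-100.0, 100.0, (200, 2))

# Min-max scale features to [0, 1]; fit on the training data only.
x_min, x_max = X_train.min(axis=0), X_train.max(axis=0)
scale = lambda X: (X - x_min) / (x_max - x_min)
X_train_s, X_test_s = scale(X_train), scale(X_test)

# Standardize labels to zero mean and unit variance with training statistics.
y_mean, y_std = y_train.mean(axis=0), y_train.std(axis=0)
y_train_s, y_test_s = (y_train - y_mean) / y_std, (y_test - y_mean) / y_std

# Reshape features into (samples, time steps, features, 1) tensors for the CNN.
X_train_cnn = X_train_s.reshape(len(X_train_s), 20, 1, 1)
```

Note that, because the test statistics differ slightly from the training statistics, scaled test features may fall marginally outside [0, 1]; this is the expected behaviour when the scaler is fit on the training split alone.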
The full architecture of the models with hyperparameters can be found in Section VI.
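Since the appendix with the full hyperparameters is not reproduced in this excerpt, the following Keras sketch only illustrates the stated layer ordering (two convolutional layers and a max pooling layer, repeated, then a flattening layer and three dense layers); the filter counts, kernel sizes, input length and label dimension are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(n_time_steps, n_features, n_labels):
    """Conv-Conv-Pool blocks (x2), then Flatten and three Dense layers."""
    model = models.Sequential([
        layers.Input(shape=(n_time_steps, n_features, 1)),
        layers.Conv2D(16, (8, 1), activation="relu", padding="same"),
        layers.Conv2D(16, (8, 1), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(4, 1)),  # pool along the time axis
        layers.Conv2D(32, (8, 1), activation="relu", padding="same"),
        layers.Conv2D(32, (8, 1), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(4, 1)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_labels),                 # linear output for regression
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="mse")
    return model

# e.g. a two-level system: 2 labels, 3 selected density-matrix elements
model = build_cnn(n_time_steps=2000, n_features=3, n_labels=2)
```

Training would then combine `model.fit` with an early-stopping callback such as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)`.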
The 25000 Hamiltonians of each dataset were split into three sets: a training set of 85% of all Hamiltonians, of which 80% are used for training CNN model instances with particular hyperparameters and the other 20% form a validation set used to evaluate the CNN architecture during optimization of the hyperparameters, and a test set of 15% of all Hamiltonians to probe out-of-sample prediction accuracies. Notably, after splitting the data into training, validation and test sets, the distributions of the Hamiltonian parameters (labels) of each of the subsets remained uniform. All constructed CNN models were trained on a computational cluster with 100 data points per batch and the ADAM optimizer with a learning rate of 0.001, until the mean squared error (MSE) on the validation set increased over three full consecutive training epochs. The MSE provides a direct quantitative check of how close estimates or forecasts are to actual values by taking the difference between them and squaring it. This measure works well in ensuring that our trained model has no outlier predictions with large errors, since the squaring allocates larger weight to these errors. Although the MSE was used as the cost function throughout the training and testing processes, the coefficient of determination, $R^2 = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $y_i$ are the true values, $f_i$ the predictions and $\bar{y}$ the mean of the true values, was calculated as an alternate accuracy measure. $R^2$ is the fraction of the total variation in the results that is described by the model; it can be interpreted as the percentage of variation in the dependent output attribute that the model is capable of explaining. Both of these measures were used in analyzing the results in order to determine the predictive power of the model. All CNN models were generated and trained using the TensorFlow package.
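The two accuracy measures can be computed directly; a small sketch with illustrative values:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between true and predicted labels."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

y = [1.0, 2.0, 3.0, 4.0]   # illustrative true labels
f = [1.1, 1.9, 3.2, 3.9]   # illustrative predictions
```

For these values the MSE is 0.0175 and $R^2$ is 0.986; a perfect prediction gives an $R^2$ of exactly 1.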

IV. RESULTS
In this section, we demonstrate the capabilities of our trained CNN models by analyzing the MSE between predicted Hamiltonian parameters and those used in the numerically exact HEOM calculations, as well as the coefficient of determination ($R^2$ score). Our trained models predict Hamiltonian parameters for test (out-of-sample) data at almost the same accuracy as for the training and validation data on which CNN model parameters and hyperparameters were optimized. This demonstrates the ability of our models to generalize well to previously unseen data and to provide out-of-sample predictions with high accuracy. To support this conclusion, we present the results of the 5-fold cross-validation of each of our models in Table II, which was carried out prior to training. Cross-validation is a re-sampling procedure used to evaluate the performance of a machine learning ansatz on limited data: the complete set of samples is randomly shuffled and split into K folds; the model is then fit using K − 1 folds and validated on the remaining fold. This process is repeated until every fold has served as a validation set, and the average of the recorded scores is reported. Table III summarizes the results for the predicted Hamiltonian parameters for our three generated datasets, where the full length of 1 ps, equating to 5000 time steps, was considered for all selected input features. Furthermore, the prediction accuracy varies depending on the dataset under study. Overall we find a high accuracy of our predictions and small mean squared errors on the datasets, which are in the range between 0.70 for a 2-level system and 7.25 for the largest considered 4-level system. The 4-level system dataset exhibits the most diverse transfer properties, which explains the larger mean squared errors in the predictions when compared to the other datasets. 
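The K-fold procedure described above can be sketched as follows (the `score_fn` argument stands in for fitting and scoring an actual CNN, which is replaced here by a trivial placeholder):

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k validation folds."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

def cross_validate(score_fn, n_samples, k=5):
    """Fit/score on each of the k train/validation splits and average."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        # every fold except fold i forms the training set
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(score_fn(train_idx, val_idx))
    return float(np.mean(scores))

folds = k_fold_indices(100, k=5)
# placeholder score: fraction of samples held out, always 1/k here
avg = cross_validate(lambda train_idx, val_idx: len(val_idx) / 100, 100, k=5)
```

Each sample serves in exactly one validation fold, so the averaged score uses every sample once for validation and K − 1 times for training.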
Figure 3 provides a visual comparison of Hamiltonian parameters as computed with the HEOM approach and predictions with trained CNN models. In Figure 3(a), one can observe that there exists a group of points of excited state energies which are not well predicted by the model. As random Hamiltonians were generated in the data preparation step, the Hamiltonians that correspond to these points cover the full range of allowed excited state energies, as seen in Figure 3(a); however, all of these points also correspond to the range of inter-site couplings [−30, 30] cm$^{-1}$. This is the overdamped regime, where the time dependent observables do not display coherent but rather purely dissipative behaviour; hence, the CNN does not perform well in differentiating between the systems represented there.
As highlighted in Section III B, feature selection is critical to the performance of trained models. The time evolutions of the selected elements of the reduced density matrices used as input features each contain data spanning 1 ps. The next objective of this work was to determine by how much this time series data could be shortened while still maintaining the accuracy measures achieved with the full dataset. The results obtained are shown in Table IV. From this we can deduce that the full time length is not required to maintain high accuracy; rather, only 400 fs are sufficient for 2-, 3- and 4-level systems. The observed prediction errors are also consistent with the complexity of the system dynamics for each of the three datasets, which indicates that CNN models generally benefit from a wider sampling of the input parameter space.

V. CONCLUSION
During photosynthesis in light harvesting complexes, energy is transferred from antenna pigments to the reaction centre to trigger photochemical reactions. The formalism adopted to study excitation energy transfer (EET) processes is the Hierarchical Equations of Motion (HEOM). This work focuses on leveraging classical machine learning models to study the dynamics of EET within open quantum systems. We propose the use of a trained convolutional neural network (CNN) to perform Hamiltonian tomography when given input data that represents the dynamics of EET through the open quantum system over time. We have discussed the investigation of EET for 2-, 3- and 4-level systems where linear chain configurations were imposed. The performance of the models was gauged by mean-squared error and coefficient of determination measures. We have demonstrated the capabilities of CNNs and supervised machine learning as an efficient tool for solving the inverse problem of the HEOM by employing a model to predict the parameters of Hamiltonians when given the underlying time dependent observations as features. 
In particular, we have shown that using a trained CNN one can predict Hamiltonian parameters such as excited state energies and electronic couplings with up to 99.28% accuracy and a mean-squared error as low as 0.65. We propose the use of the trained CNNs as an efficient way to study the excitation energy transfer dynamics of biological complexes. An improvement that can be investigated in future is a more sophisticated algorithm able to distinguish between systems in the overdamped regime, which are described by sets of observables that are purely dissipative. This work can be developed further in a few ways, either by investigating fully connected neural networks or by developing a model that is independent of the dimension of the input data. The latter modification would allow the user to input data from which the model determines the dimension of the system under study, i.e. the number of pigments in the complex.

VI. ACKNOWLEDGMENTS
This work is based upon research supported by the South African Research Chair Initiative, Grant No. 64812 of the Department of Science and Innovation and the National Research Foundation of the Republic of South Africa. Support from the NICIS (National Integrated Cyber Infrastructure System) e-research grant QICSA is kindly acknowledged.