Attaining entropy production and dissipation maps from Brownian movies via neural networks

Quantifying entropy production (EP) is essential to understand stochastic systems at mesoscopic scales, such as living organisms or biological assemblies. However, without tracking the relevant variables, it is challenging to figure out where and to what extent EP occurs from recorded time-series image data from experiments. Here, applying a convolutional neural network (CNN), a powerful tool for image processing, we develop an estimation method for EP through an unsupervised learning algorithm that requires only the movies. Together with an attention map of the CNN's last layer, our method can not only quantify stochastic EP but also produce the spatiotemporal pattern of the EP (dissipation map). We show that our method accurately measures the EP and creates a dissipation map in two nonequilibrium systems, the bead-spring model and a network of elastic filaments. We further confirm high performance even with noisy data, low spatial resolution, and partially observed situations. Our method will provide a practical way to obtain dissipation maps and ultimately contribute to uncovering the nonequilibrium nature of complex systems.


I. INTRODUCTION
From large-scale active matter to living organisms on the nanoscale, most systems in nature are not in equilibrium but rather in the nonequilibrium state [1,2]. As the development of experimental techniques has led us to more closely observe mesoscopic systems, in which thermal fluctuations dominate their dynamics, how we can describe and measure thermodynamic quantities in such stochastic systems has been actively studied. These systems, such as beating flagella or cilia [3][4][5], molecular motors like kinesin walking along microtubules [6][7][8], motile bacteria [9,10], and cytoskeletal networks [11,12], maintain their nonequilibrium activity by exchanging energy and information with their surrounding environment or consuming chemical fuels, e.g., ATP hydrolysis, and then converting it into mechanical work. One of the principal ways to characterize or identify the nonequilibrium nature of systems is by measuring the energy dissipated into the environment through these processes [8,[13][14][15]. This dissipated energy, or dissipation, can be represented by entropy production (EP), a measure of the system's irreversibility. As the connection between irreversibility and dissipation on the trajectory level has been established, EP has become an essential quantity in comprehending the stochastic dynamics of systems [16][17][18][19][20].
With the growing interest in EP, numerous studies have been conducted to infer the EP rate from recorded data of the system. Obtaining the empirical probability current in the coarse-grained space of tracked variables is a conventional method for quantifying nonequilibrium activity from time-series images or movies [4,19,21]. Some methods based on the thermodynamic uncertainty relation [22-28] as well as the neural estimator for entropy production (NEEP) [27,29] have been recently proposed for continuous trackable variables and have successfully inferred the EP rate from the trajectories. However, these previous attempts have a limitation in that they are not applicable to systems in which the configurational state variables cannot be tracked, due to a poor spatiotemporal resolution of the movies or the fragility of systems that cannot bear heavy tracer particles [30]. Moreover, even though the advent of fluorescence microscopy techniques has enabled us to observe the dynamic processes of various living systems directly [12,30-33], finding which variables are relevant to the dynamics, or which regions reveal nonequilibrium activity, remains an interesting open question in real complex biological systems. A stochastic force inference method has been applied to time-series images using a dimensionality reduction approach [34,35] to address these issues, but this is hard to apply in high-dimensional systems and is unable to capture how much EP occurred in which area of the image.
In this work, we propose the convolutional neural estimator for entropy production (CNEEP) for estimating stochastic EP from time-series images, called Brownian movies, based on the unsupervised learning algorithm recently developed by our group [29]. Specifically, without detailed information about the system, such as drift or diffusion fields and which regions contain relevant variables, the CNEEP can provide a dissipation map, which embodies where and how much EP occurs at each transition in the images. The dissipation map can bring out the contribution that each region has to the system's irreversibility, helping to grasp the heterogeneous spatial pattern of nonequilibrium phenomena [36,37]. Our approach is the first study to our knowledge that directly produces a dissipation map from movies without any extraction procedure of relevant components. To validate our method, we apply the CNEEP to the bead-spring model as well as biopolymer networks for a high-dimensional complex case. Finally, we show that our method is also applicable to incomplete scenarios, namely when some measurement noise or a low spatial resolution of a microscope hinders us from detecting the exact information of the system, or when we can only observe a partial region of a whole system.

II. DESCRIPTION OF OUR APPROACH
We describe our method to estimate the stochastic EP and create a dissipation map from time-series image data. Let us denote the recorded time-series images, taken at time intervals ∆τ from a steady-state system, as {I_0, I_1, . . . , I_{L-1}}, where L is the number of steps. The image at time-step t, denoted by I_t, is a transformed representation of the configurational state vector z(τ) at time τ ≡ t∆τ into a (W, H) array. Each element of the array (a pixel intensity) has an integer value ranging from 0 to 255, like an actual image. If we assume that the underlying system is Markovian and that I_t contains the full information of z(τ), in other words that the transformation from z(τ) to I_t is an invertible mapping, then the stochastic EP ∆S between time-steps t and t+1 is determined by I_t and I_{t+1}. From this fact, we build the output of our estimator as

∆S_θ(I_t, I_{t+1}) ≡ h_θ(I_t, I_{t+1}) − h_θ(I_{t+1}, I_t).  (1)

Here, we employ a convolutional neural network (CNN) as h_θ, a deep neural network based on convolution filters that has proven powerful in processing image datasets [38,39], where θ denotes the trainable parameters of the CNN. As shown in Fig. 1, h_θ(I_t, I_{t+1}) is the output of the CNN for the input (I_t, I_{t+1}). Note that ∆S_θ(I_t, I_{t+1}) = −∆S_θ(I_{t+1}, I_t) by construction. To make ∆S_θ(I_t, I_{t+1}) approach ∆S(I_t, I_{t+1}), we use the objective function J(θ) given by

J(θ) = ⟨∆S_θ(I_t, I_{t+1})⟩ − ⟨exp[−∆S_θ(I_t, I_{t+1})]⟩,  (2)

where the angle bracket ⟨·⟩ denotes the ensemble average. This objective function was proposed in our previous work [29], and it can also be derived from the f-divergence representation of the Kullback-Leibler divergence [27,28,40,41]. J(θ) is maximized when ∆S_θ is given by

∆S_θ(I_t, I_{t+1}) = ln [P(I_t, I_{t+1}) / P(I_{t+1}, I_t)],  (3)

where P(I_t, I_{t+1}) is the probability of observing the transition I_t → I_{t+1}. The right-hand side is the definition of ∆S(I_t, I_{t+1}), and thus the output of our estimator converges to ∆S(I_t, I_{t+1}) as J(θ) is maximized. Please refer to Supplementary Notes 2, 3 for the training details and configurations of the CNN.
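As a sanity check on the objective, the following minimal NumPy sketch maximizes J(θ) = ⟨∆S_θ⟩ − ⟨exp(−∆S_θ)⟩ for a toy two-state transition ensemble, where a grid search over a single scalar θ stands in for gradient ascent on the CNN parameters. The transition probabilities here are hypothetical numbers chosen for illustration, not from the paper; the point is that the maximizer recovers the true log-ratio, i.e., the EP per transition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state transition ensemble (illustrative numbers only):
#   P(0,1) = 0.4, P(1,0) = 0.1, P(0,0) = P(1,1) = 0.25.
# The true EP of a 0 -> 1 transition is ln(P(0,1)/P(1,0)) = ln 4.
pairs = [(0, 1), (1, 0), (0, 0), (1, 1)]
probs = [0.4, 0.1, 0.25, 0.25]
idx = rng.choice(4, size=100_000, p=probs)
delta = np.array([pairs[i][1] - pairs[i][0] for i in idx])  # j - i per sample

def J(theta):
    """Empirical objective <dS_theta> - <exp(-dS_theta)> with the
    antisymmetric one-parameter ansatz dS_theta(i, j) = theta * (j - i)."""
    ds = theta * delta
    return np.mean(ds - np.exp(-ds))

# Grid search over theta, a stand-in for optimizing the CNN parameters.
thetas = np.linspace(0.0, 3.0, 3001)
theta_star = thetas[np.argmax([J(t) for t in thetas])]
print(theta_star)  # close to ln 4 ~ 1.386
```

Setting dJ/dθ = 0 for this ensemble gives exp(θ*) = 4 analytically, so the empirical maximizer should sit near ln 4 up to sampling noise.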
We need an additional step to create a dissipation map from Brownian movies. Let A^f_c(x, y) (A^r_c(x, y)) denote an element of the c-th channel at spatial position (x, y) in an attention map of the CNN's last convolutional layer for the forward (reversed) transition. Defining h_θ(I_t, I_{t+1}) as a sum of the element-wise products between A^f_c(x, y) and the weight of each element of the c-th channel, denoted by w_c(x, y), h_θ(I_t, I_{t+1}) can be written as

h_θ(I_t, I_{t+1}) = Σ_{x,y} Σ_c w_c(x, y) A^f_c(x, y) ≡ Σ_{x,y} H^f(x, y),  (4)

where H^f(x, y) ≡ Σ_c w_c(x, y) A^f_c(x, y), and H^r(x, y) is defined analogously from A^r_c(x, y). According to Eq. (1), ∆S_θ(I_t, I_{t+1}) can thus be obtained as

∆S_θ(I_t, I_{t+1}) = Σ_{x,y} H(x, y),  (5)

where H(x, y) ≡ H^f(x, y) − H^r(x, y), and H(x, y) indicates the spatial importance of ∆S_θ(I_t, I_{t+1}) at (x, y). Therefore, H(x, y) can provide both where and how much EP occurs at (x, y) during the transition I_t → I_{t+1} once training brings ∆S_θ(I_t, I_{t+1}) to ∆S(I_t, I_{t+1}). We identify the spatial pattern of EP via the CNEEP and show that H(x, y) indeed reflects the spatial pattern of EP in the following examples. Before we go any further, one may point out that many systems in real experiments are not Markovian due to measurement noise, inaccessible relevant variables, or a low spatiotemporal resolution of a microscope, even though the underlying dynamics are Markovian [42-46]. In such cases, although estimation of the true EP for the whole system is generally unavailable, inferring the EP from partial information is still important, as it can give a lower bound of the true EP as well as clues to the underlying system [42,46-49]; for instance, identifying the nonequilibrium mechanism behind the switching behavior of the E. coli flagellar motor [50,51]. We demonstrate how the CNEEP can relieve such issues preventing estimation in Sec. V.
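The attention-map decomposition above can be sketched in a few lines of NumPy. The shapes and random values below are placeholders for the CNN's actual last-layer activations; the sketch only verifies the bookkeeping: summing the dissipation map H(x, y) over all pixels recovers the scalar output ∆S_θ = h_θ(forward) − h_θ(reversed).

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder shapes: C channels on a (W, H) spatial grid.
C, W, H = 4, 8, 8
A_f = rng.normal(size=(C, W, H))   # attention map, forward transition
A_r = rng.normal(size=(C, W, H))   # attention map, reversed transition
w   = rng.normal(size=(C, W, H))   # per-element weights w_c(x, y)

# h_theta as the weighted sum of attention elements.
h_f = np.sum(w * A_f)
h_r = np.sum(w * A_r)

# Dissipation map: H(x, y) = H^f(x, y) - H^r(x, y), summed over channels.
H_map = np.sum(w * (A_f - A_r), axis=0)

# Summing the map over space recovers dS_theta = h_f - h_r.
dS_theta = h_f - h_r
assert np.isclose(H_map.sum(), dS_theta)
```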

III. BEAD-SPRING MODEL
As a minimal model to test our approach, we consider the bead-spring model, which consists of N beads in one dimension connected to their nearest neighbors or to the boundaries by springs with stiffness k. Each bead contacts its own thermal heat bath with friction coefficient γ; the i-th bead contacts a bath with temperature T_i, which varies linearly from T_c to T_h [4,23,35,52]. The system is governed by an overdamped Langevin equation given by

γ ż(τ) = F(z(τ)) + ξ(τ),  (6)

where z ≡ (z_1, z_2, . . . , z_N)^T is the vector of bead displacements, F(z) ≡ Az is the drift force with A_ij = (δ_{i,j+1} + δ_{i+1,j} − 2δ_{i,j})k, and ξ is a Gaussian white noise vector satisfying ⟨ξ_i(τ)⟩ = 0 and ⟨ξ_i(τ) ξ_j(τ')⟩ = 2γT_i δ_{i,j} δ(τ − τ').

[Figure 2. a. Sample images of the bead-spring model; each image is an integer-valued array with (20, 20N) pixels. b. Accumulated EP S(τ) over time τ along a single movie for N = 2, where the orange line is the estimated EP and the blue line is the true EP. The inset represents the distributions of ∆S and ∆S_θ over 100 movies from the training set. c. EP rate as a function of T_c/T_h for the N = 2 and N = 5 models. The solid lines (symbols) depict the true EP rate σ̇ (estimated EP rate σ̇_θ). The inset shows that σ̇_θ/σ̇ converges to 1 with increasing T. Error bars represent the standard deviation of σ̇_θ from five independent estimators.]

To make Brownian movies of the bead-spring models, we numerically generated 100 (25) positional trajectories for the training (validation) set with ∆τ = 10^{-2} for N = 2 and N = 5 in the steady state. The length L of each trajectory is 10^4 (i.e., the recorded time T ≡ L∆τ = 100). The bead displacements z(τ) are normalized to a standard deviation std = 1, and each bead is then rendered as a Gaussian function whose center is located at the corresponding bead's position in the image I_t, as shown in Fig. 2a; see the details in Methods and the generated movies in Supplementary Movies (SMs) 1, 3.
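The dynamics in Eq. (6) can be integrated with a standard Euler-Maruyama scheme. The sketch below is a minimal implementation under the paper's parameter choices (k = γ = 1, ∆t = 10^-2); the function name and defaults are ours, not from the released code. With equal temperatures the chain equilibrates, and the stationary covariance C = T (−A)^{-1} gives Var(z_1) = 2T/3 for N = 2, which serves as a check.

```python
import numpy as np

def simulate_bead_spring(N=2, T=(1.0, 1.0), k=1.0, gamma=1.0,
                         dt=1e-2, steps=200_000, seed=0):
    """Euler-Maruyama integration of the overdamped bead-spring chain:
    gamma * dz = A z dt + sqrt(2 gamma T_i dt) * N(0, 1) per bead."""
    rng = np.random.default_rng(seed)
    # Nearest-neighbor coupling matrix with fixed boundaries.
    A = (np.diag(np.full(N - 1, k), 1) + np.diag(np.full(N - 1, k), -1)
         - 2.0 * k * np.eye(N))
    noise_amp = np.sqrt(2.0 * gamma * np.asarray(T) * dt) / gamma
    z = np.zeros(N)
    traj = np.empty((steps, N))
    for t in range(steps):
        z = z + (A @ z) / gamma * dt + noise_amp * rng.standard_normal(N)
        traj[t] = z
    return traj

traj = simulate_bead_spring()
# Equilibrium check (equal temperatures T): C = T * inv(-A), so for
# N = 2, k = 1 the stationary variance of z_1 is 2*T/3 ~ 0.667.
print(traj[:, 0].var())
```

For the nonequilibrium movies in the text, one would pass T = (T_c, ..., T_h) varying linearly along the chain.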
First, we present the training results for the Brownian movies with N = 2 at T_c/T_h = 0.1 in Fig. 2b, which shows the accumulated ∆S and ∆S_θ for a single trajectory, denoted by S(τ) ≡ Σ_{t=0}^{τ/∆τ−1} ∆S(I_t, I_{t+1}). As can be seen, S_θ(τ) is in line with S(τ) over time τ, and we also verify that ∆S_θ fits ∆S precisely, with R² = 0.9902 (R² = 0.9896 on average over all the movies, Supplementary Note 2). The inset of Fig. 2b reveals that the distributions of ∆S and ∆S_θ almost completely overlap, supporting that our estimator can accurately provide ∆S for every transition.
By averaging the slope of S_θ(τ) over the Brownian movies, the averaged EP rate σ̇_θ ≡ ⟨S_θ(T)⟩/T is obtained and plotted while varying T_c/T_h from 0.1 to 1 with T_h = 10 for both N = 2 and N = 5 (Fig. 2c). We apply five independently trained estimators with different initial parameters to the given range of T_c/T_h and confirm that all the estimators successfully measure σ̇ with small errors. These small errors suggest that our estimator is robust against altered initial parameters. The inset of Fig. 2c shows the convergence of σ̇_θ to σ̇ with increasing T.
To show that our method can create a detailed dissipation map from Brownian movies, we make additional movies that consist of two independent bead-spring models with N = 2, as shown in Fig. 3a; in other words, there is no connection between beads 2 and 3, while beads 1 and 2 (or 3 and 4) are connected by an invisible spring. We call this case a bead-spring model with N = 4. Since the resulting time-averaged dissipation map Ḣ(x, y) (Fig. 3) is concentrated only between connected beads, it implies that Ḣ(x, y) reflects the actual dissipation map. Please see SMs 2, 3 for H(x, y) along the N = 4 and N = 5 movies.
To examine H(x, y) quantitatively, we analytically calculate the exchanged EP rate σ̇_{i,i+1} occurring between beads i and i+1. Using formulations based on Eq. (6) and a framework for stochastic energetics [18,53], σ̇_{i,i+1} can be obtained by

σ̇_{i,i+1} = (1/T_{i+1} − 1/T_i) Q̇_{i,i+1},  (7)

where Q̇_{i,i+1} is the exchanged heat current between beads i and i+1 [54,55]. In the linear system, Q̇_{i,i+1} can be obtained by (Supplementary Note 1)

Q̇_{i,i+1} = (k/2γ) [(AC)_{i+1,i} − (AC)_{i,i+1}],  (8)

where the covariance matrix C ≡ ⟨zz^T⟩. Here, Q̇_{i,i+1} is proportional to the anti-symmetric part of AC, which is related to the area enclosing rate or angular momentum [56-58]. We measure the accumulated exchanged EP for both N = 4 and N = 5, denoted by S_{θ,ij}(τ), by spatially summing H(x, y) between beads i and j along a single movie. Because it is difficult to divide S_{θ,ij}(τ) into system entropy and exchanged EP, we compare S_{θ,ij}(τ) with a line of slope S_{ij}(T)/T to cancel out the contribution of the system entropy. As shown in Fig. 3c, d, all the estimations of S_{θ,ij}(τ) follow the matched lines along the movie for both models; especially for the N = 5 model, the CNEEP well measures the exchanged EPs that occur at different rates in different areas. The insets depict the distributions of the exchanged EP rates Ṡ_{θ,ij} ≡ S_{θ,ij}(T)/T over the movies as well as σ̇_{θ,ij} ≡ ⟨Ṡ_{θ,ij}⟩. As can be seen, all σ̇_{θ,ij} are close to the corresponding σ̇_{ij} for both models, which corroborates the ability of our approach to visualize the dissipation of the system. From these results, therefore, we verify that the CNEEP can provide stochastic EP from movies and create a dissipation map without any knowledge of the interactions between the system components.
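The covariance matrix C of a linear Langevin system solves a Lyapunov equation, so the exchanged EP rates can be computed in closed form. The sketch below (our own helper, with k = γ = 1 and the sign conventions used here) does this for the bead-spring chain; for N = 2 the total EP rate has the closed form (T_h − T_c)²/(4 T_c T_h), which the test checks.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def exchanged_ep_rates(T, k=1.0, gamma=1.0):
    """Exchanged EP rates sigma_{i,i+1} of the bead-spring chain from the
    steady-state covariance C, which solves the Lyapunov equation
    (A/gamma) C + C (A/gamma) = -2 D / gamma^2 with D = diag(gamma * T_i)."""
    T = np.asarray(T, dtype=float)
    N = len(T)
    A = (np.diag(np.full(N - 1, k), 1) + np.diag(np.full(N - 1, k), -1)
         - 2.0 * k * np.eye(N))
    C = solve_continuous_lyapunov(A / gamma,
                                  -2.0 * np.diag(gamma * T) / gamma**2)
    AC = A @ C
    # Exchanged heat current, proportional to the antisymmetric part of AC.
    Q = (k / (2.0 * gamma)) * (AC[np.arange(1, N), np.arange(N - 1)]
                               - AC[np.arange(N - 1), np.arange(1, N)])
    # Pairwise EP rate: (1/T_{i+1} - 1/T_i) * Qdot_{i,i+1}.
    return (1.0 / T[1:] - 1.0 / T[:-1]) * Q

# Two beads at T_c = 1, T_h = 10: the total EP rate should match the
# closed form (T_h - T_c)^2 / (4 * T_c * T_h) = 2.025 for k = gamma = 1.
rates = exchanged_ep_rates([1.0, 10.0])
print(rates.sum())
```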

IV. NETWORK OF ELASTIC FILAMENTS
To illustrate the validity of our method for high-dimensional systems, we apply it to a network of elastic filaments, which has been proposed to study soft biological elastic assemblies driven into a nonequilibrium state by spatially heterogeneous stochasticities [35,57,59]. An N² filamentous network comprises N × N nodes and a number of elastic filaments of stiffness k connecting adjacent nodes or fixed boundaries. Because all the nodes and filaments lie in a two-dimensional plane, the number of degrees of freedom of the underlying system is 2N². It is assumed that the whole system is immersed in a viscous fluid of friction γ and temperature T_c, while due to enzymatic activity, each node can have an additional fluctuation that induces a higher temperature T_h with probability p. These spatially heterogeneous stochasticities create a heat exchange between connected nodes of different temperatures. We suppose that the drift forces are linear in the node displacements z so that the EP remains analytically calculable.
Similar to the previous example, the trajectories of z for the training and validation sets are sampled from Langevin simulations using the Euler method and then transformed into (15N, 15N) images. Note that increasing the dimensionality causes an exponential growth of the volume of the available state space, called the curse of dimensionality; hence, the estimation of EP in a high-dimensional system is a challenging task, and even more so for time-series images. Nevertheless, as can be seen in Fig. 4c, we observe that although σ̇_θ/σ̇ becomes less accurate with increasing N², σ̇_θ remains sufficiently close to σ̇; for all N², σ̇_θ/σ̇ is greater than 0.8. A linear regression between ∆S_θ and ∆S confirms that they are well fitted, with R² greater than 0.7 (inset). We also check the convergence of our method with increasing T; the inset shows that the estimations for larger N become more accurate with T. High dimensionality typically leads to another arduous task: the identification of the dissipation map of a complex system. Most systems for which we hope to know the spatiotemporal pattern of EP are in such a situation, and so we examine whether the CNEEP can extract the dissipation map of the filamentous network. Figure 4b shows Ḣ(x, y) for the N = 6 model. Here, Ḣ(x, y) is intensively captured on the filaments between two connected nodes of different temperatures, whereas Ḣ(x, y) is almost zero between nodes of the same temperature. This heterogeneous dissipation map is consistent with the fact that an averaged heat current exists only across a non-zero temperature difference [Eq. (7)], and thus it is evident that Ḣ(x, y) indeed reflects the dissipation map of a network of elastic filaments. Based on this correct dissipation map, we expect that our approach can be used to reveal the inherent sources of nonequilibrium activity, for instance, different stochasticities in other complex biological assemblies.
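To make the setup concrete, the following sketch builds a linearized coupling matrix and per-node temperatures for such a network. It is a simplified stand-in for the paper's model: we assume each filament couples only the displacement component along its own axis, and hot nodes are drawn independently with probability p. The function and its defaults are ours, chosen to mirror the stated parameters (p = 0.3, T_c = 1, T_h = 5).

```python
import numpy as np

def build_network(N=6, k=1.0, p=0.3, T_c=1.0, T_h=5.0, seed=0):
    """Simplified linearized N x N filament network: returns the
    2N^2 x 2N^2 coupling matrix A (springs to neighbors or fixed
    boundaries) and the temperature of each degree of freedom."""
    rng = np.random.default_rng(seed)
    n_dof = 2 * N * N
    A = np.zeros((n_dof, n_dof))

    def dof(i, j, axis):  # axis 0: x-displacement, axis 1: y-displacement
        return 2 * (i * N + j) + axis

    for i in range(N):
        for j in range(N):
            # Horizontal bonds couple x-components, vertical bonds y-components.
            for axis, (di, dj) in ((0, (0, 1)), (1, (1, 0))):
                a = dof(i, j, axis)
                A[a, a] -= 2.0 * k          # two springs (or boundary) per axis
                ni, nj = i + di, j + dj
                if ni < N and nj < N:
                    b = dof(ni, nj, axis)
                    A[a, b] += k
                    A[b, a] += k

    hot = rng.random(N * N) < p             # enzymatically active nodes
    T_nodes = np.where(hot, T_h, T_c)
    return A, np.repeat(T_nodes, 2)         # both dof share the node's bath

A, T = build_network()
```

With A symmetric and negative definite (all filaments anchored via the boundaries), the same Lyapunov-equation machinery as in the bead-spring case applies.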
Dissipation maps for the other N models are plotted in Supplementary Note 4, with the H(x, y) along the corresponding movies in SMs 4-8.

V. INCOMPLETE SCENARIOS
So far, we have considered cases in which the images fully contain the information of the underlying system, but in real experiments this is often not the case: only incomplete information, e.g., noisy, coarse-grained, or partial observables, is available from measurements. To broaden our method's applicability to such incomplete scenarios, we alter the form of the input images to I^n_t, which represents consecutive transitions of length n defined by I^n_t ≡ (I_t, I_{t+1}, . . . , I_{t+n−1}). That is, I^n_t is built by concatenating the images from I_t to I_{t+n−1}.
Following the definition of I^n_t, the output of the CNEEP is represented by

∆S_θ(I^n_t) = h_θ(I^n_t) − h_θ(Ĩ^n_t),

where Ĩ^n_t is the time-reversed trajectory of I^n_t and h_θ(I^n_t) is the output of the CNN for the input I^n_t. By optimizing J(θ) in Eq. (2), ∆S_θ(I^n_t) converges to the coarse-grained EP ∆S(I^n_t) along I^n_t given by [29,48]

∆S(I^n_t) = ln [P(I^n_t) / P(Ĩ^n_t)].
The right-hand side is the definition of ∆S(I^n_t). When I_t is Markovian, ∆S(I^n_t) = Σ_{t'=t}^{t+n−2} ∆S(I_{t'}, I_{t'+1}). For non-Markovian systems such as a hidden Markov process, ∆S(I_t, I_{t+1}) may vanish, and ∆S(I^n_t) has been shown to be useful for obtaining a lower bound of the true EP [29,48,49]. According to this formalism, we obtain the estimations in this section by feeding I^n_t to the neural network.
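Building the inputs I^n_t and their time reversals from a movie is a simple stacking operation. The sketch below (our own helper functions; toy image sizes) shows one way to do it, with the n frames stored along the channel dimension and time reversal implemented as a flip of that dimension.

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy movie: L = 100 frames of (W, H) = (20, 40) uint8 images.
movie = rng.integers(0, 256, size=(100, 20, 40), dtype=np.uint8)

def make_inputs(movie, n):
    """Stack consecutive transitions I^n_t = (I_t, ..., I_{t+n-1}),
    one input per starting step t, with frames as channels."""
    L = movie.shape[0]
    return np.stack([movie[t:t + n] for t in range(L - n + 1)])

def time_reverse(inputs):
    """The time-reversed input reverses the frame (channel) order."""
    return inputs[:, ::-1]

X = make_inputs(movie, n=14)       # shape (L - n + 1, n, W, H)
X_rev = time_reverse(X)
```

Feeding both X and X_rev to h_θ and taking the difference gives the antisymmetric output ∆S_θ(I^n_t) above.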
To see the effects of noise on images, we consider two kinds of noise: random noise and shot noise. Random noise, which represents uncorrelated background noise, is uniformly sampled from the range (0, 255α), where α denotes the noise amplitude. By contrast, shot noise arises from the finite number of photons detected by optical devices, and thus it can be modeled by a Poisson process for the photons detected in each pixel. Each shot noise is made to have the same variance as the corresponding random noise with amplitude α, so the number of photons ranges from 4800 (α = 0.05) to 75 (α = 0.4); refer to SMs 9, 10 for the noisy environments.
As illustrated in Fig. 5a, the added noise blurs the intensity of the pixels, preventing us from accessing the exact information of the underlying system, such as the bead displacements. Owing to this blurring, it becomes harder for the CNEEP with n = 2 to estimate the EP rate as α increases, particularly for the random noise (Fig. 5b). Alternatively, we employ the CNEEP with n = 14, and the results consistently show much higher performance; σ̇_θ/σ̇ for the random and shot noise remains greater than 0.91 and 0.99, respectively, even at α = 0.4. The scatter plots in Fig. 5c also show that ∆S_θ with n = 14 is better fitted to ∆S than ∆S_θ with n = 2.
Analogous to the role of noise, some relevant degrees of freedom can be hidden or coarse-grained by an insufficient spatial resolution of an experimental microscope. To cover this scenario, we generate movies of the bead-spring model with N = 2 using normalized trajectories with varying standard deviation std (SM 11 at std = 0.1). Since the movements of the beads become less distinguishable with decreasing std, we can control the spatial resolution of the movies by varying std over the range (0.07, 1.5), and we examine the results with estimators of n = 2, 4, and 6. Remarkably, except for the CNEEP with n = 2 at extremely small std ≤ 0.1, all of the estimators robustly infer the EP rate across this range, even at small std (Fig. 5d).
The partially observed scenario, in which only a partial area is accessible, is next implemented in a filamentous network with N = 6 by extracting two fixed areas from the original images, 'A1' and 'A2', indicated as orange and blue dashed squares in Fig. 5e. The partial area A1 (A2) is chosen to have a relatively larger (smaller) EP than the average. Under the assumption that one can only access a partial area of the whole system, we do not know how many other filaments exist or what the structures of the other parts look like. Under this condition, we employ the CNEEP with various n to infer the EP produced in the fixed areas, and markedly, the CNEEP successfully measures the partial EP for both areas (Fig. 5f). In more detail, we test the CNEEP for decreasing T_h with varying n and verify that σ̇_θ/σ̇ decreases with decreasing T_h; namely, σ̇_θ/σ̇ = 0.59 and 0.62 at T_h = 2 for A1 and A2, respectively. It may seem natural that the estimation is more difficult for smaller σ̇, but we relieve this issue by using consecutive images with n > 2, which raises σ̇_θ/σ̇ to 0.74 and 0.84 at T_h = 2 for A1 and A2, respectively. Figure 5g reveals that Ḣ(x, y) at T_h = 5 for both A1 (left) and A2 (right) is distinctly concentrated at the filaments between nodes of different temperatures and is almost zero at the other filaments, which connect nodes of the same temperature. These Ḣ(x, y) correspond to the results of the filamentous network in the previous section, thus demonstrating that the CNEEP is applicable to partially observed situations. H(x, y) along the corresponding movies for A1 and A2 is shown in SM 12.

VI. DISCUSSION
Quantifying nonequilibrium activity and the extent to which energy is dissipated in diverse systems is essential to unveil the intrinsic dynamics and energetics of those systems. While there have been many approaches to infer dissipation, most studies have conventionally assumed that the relevant variables are known and trackable, so they cannot be applied to systems in which tracking is not possible or the relevant variables are unknown. In this paper, we have introduced a new method called the CNEEP to obtain a dissipation map as well as estimate the stochastic EP from time-series image data, without detailed information about the system or any procedure for extracting the trajectories of relevant variables.
We have built the architecture of the CNEEP, which allows us to directly read off the spatial importance of EP, inspired by visual interpretation techniques for CNNs [60,61]. The CNEEP was first applied to the bead-spring model to demonstrate its performance, and the results showed that the CNEEP precisely captures how much EP occurs in different areas of the images, both along a single movie and on the ensemble average. After that, a filamentous network was studied, where it was revealed that our method accurately captures the EP rate and dissipation map. This implies that the CNEEP can be practically employed in high-dimensional systems such as cytoskeletal networks and other multi-particle systems.
In addition, we have tested our method's applicability in cases where the measurement of the system's movements is hampered by noise, low spatial resolution, or partially observed areas, issues that frequently appear in real experiments. For more accurate estimation, consecutive images of length n > 2 have been used as inputs, and as a result, we have confirmed that the CNEEP with n > 2 is more robust to noise and low spatial resolution than the estimator with n = 2, and also has higher accuracy at low EP in partially observed areas. Despite the improved performance from using consecutive transitions of length n > 2, this approach has a limitation in that the amount of data required to cover the state space grows exponentially with increasing n. Although the deep neural network lets us effectively circumvent this issue, the method should be utilized carefully while monitoring the objective function J(θ) on the validation set (Supplementary Note 2).
Most notably, our work is, to the best of our knowledge, the first study to directly visualize dissipation from movies without preprocessing procedures. The method can pinpoint the particular areas that mainly produce the EP at each transition, and thus our study offers a valuable tool for analyzing nonequilibrium systems; for instance, distinguishing between active and thermal motion in recorded movies, finding the most important region for studying the energetics of a system, or identifying which interactions between particles cause dissipation. We expect that our approach will be widely used as a tracking-free tool for inferring system dissipation only from movies in diverse fields, from complex living organisms and soft biological assemblies to the design of efficient nano-devices.

VII. METHODS
We explain how we generate Brownian movies from the models. In all cases, we collect M trajectories of length L of the relevant model variables from overdamped Langevin simulations with a time interval of ∆τ = 10^{-2}. For the bead-spring and filamentous-network models, the collected trajectories are divided into two parts, a training set and a validation set. The training set is used directly to train the neural network, and the validation set is used indirectly to check how well the model has been trained and to avoid overfitting (Supplementary Note 2). The ratio between the training and validation sets is not strictly determined; it depends on the user and the size of the dataset, and in this work we set the ratio to 4 : 1. After generating the trajectories, we transform them into movies consisting of (W, H) arrays (i.e., images). Like general image data, each element of an array stands for the pixel intensity of the image, and all the elements have integer values from 0 to 255. Details of the transformation process are given below for each model.
For the bead-spring model, the numerical simulation for sampling trajectories of the bead displacements z is governed by Eq. (6) with fixed parameters k = γ = 1 and T_h = 10. The temperature of the i-th bead, T_i, varies linearly from T_c to T_h. These positional trajectories are normalized with a standard deviation std and then transformed into (20, 20N) arrays, where N is the number of beads in the model. We set std = 1 by default, but in Sec. V, std varies from 0.07 to 1.5 to control the spatial resolution of the movies. Notice that a displacement of 1 in a trajectory corresponds to 1 pixel in an image. Each bead displacement z_i is represented as a Gaussian function, which is commonly used to model the point spread function of an imaging system [30,62,63]. Each Gaussian function has a variance of 9 and a maximum intensity of 255, and is centered at (10, 10 + 20(i − 1)), shifted by the bead's displacement.

For a network of elastic filaments consisting of N × N nodes, each node moves in a two-dimensional plane and is connected to its nearest neighbors by elastic filaments. The trajectories of the node displacements are sampled from simulations with fixed parameters k = γ = 1 and T_c = 1 in Eq. (6). Contrary to the bead-spring model, most nodes have temperature T_c, while the others, with temperature T_h, are randomly chosen with probability p = 0.3. Figure 4 and the corresponding results are described with T_h = 5, but in Sec. V, T_h varies from 1 to 5 to show the performance in a partially observed case. In the transformation process, we draw the filaments between connected nodes, forming a grid in a (15N, 15N) array. The position of each node is determined by the sum of its grid position and its displacement z. The pixel intensities of the filaments decay with increasing distance, as a Gaussian function with variance 1 and maximum intensity 255.
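The rendering step for the bead-spring movies can be sketched as follows. The helper below is our own minimal version of the stated procedure: each bead becomes a Gaussian spot of variance 9 and peak 255 in a (20, 20N) array, centered at row 10 and column 10 + 20(i − 1) plus the bead's displacement in pixels.

```python
import numpy as np

def render_beads(displacements, N, pixels_per_bead=20, var=9.0, peak=255.0):
    """Render one frame of a bead-spring movie: bead i (displacement z_i,
    in pixels) becomes a Gaussian spot in a (20, 20N) uint8 array."""
    W, H = pixels_per_bead, pixels_per_bead * N
    rows, cols = np.mgrid[0:W, 0:H]
    img = np.zeros((W, H))
    for i, z in enumerate(displacements):
        cx = W / 2                                    # row center: 10
        cy = pixels_per_bead * i + pixels_per_bead / 2 + z
        img += peak * np.exp(-((rows - cx) ** 2 + (cols - cy) ** 2)
                             / (2.0 * var))
    return np.clip(img, 0, 255).astype(np.uint8)

# Two beads at zero displacement: peaks at (10, 10) and (10, 30).
frame = render_beads([0.0, 0.0], N=2)
```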
The noisy images in Fig. 5a are constructed by adding random or shot noise to the original images of the bead-spring model with N = 2 and T_c/T_h = 0.1. The random noise is uniformly sampled from 0 to 255α regardless of the pixel location and intensity, where α is the noise amplitude in the range [0, 1]. For the shot noise, which depends on the number of detected photons in each pixel, the noisy pixel intensities are sampled from Poisson distributions with mean λ(x, y) = (12/α²) I(x, y)/255, where I(x, y) is the pixel intensity at spatial position (x, y) of an image. Here, 12/α² is the maximum number of detected photons, determined so that the shot-noise variance at full intensity matches that of the random noise with amplitude α. Lastly, we rescale the photon counts by 255α²/12 so that the intensities return to the range [0, 255], and we saturate all pixel intensities at 255. The partial images of the filamentous network with N = 6 in Fig. 5e are extracted from the original ones at the fixed areas A1 and A2. All the images are in the shape of (40, 40) arrays.
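Both noise models can be sketched in a few lines; the helpers below are our own implementations of the stated recipes. The variance of the uniform noise is (255α)²/12 independent of intensity, while the rescaled Poisson counts have variance 255 I α²/12 at intensity I, so the two match at I = 255.

```python
import numpy as np

def add_random_noise(img, alpha, rng):
    """Uniform background noise in (0, 255*alpha), intensity-independent."""
    noisy = img + rng.uniform(0.0, 255.0 * alpha, size=img.shape)
    return np.clip(noisy, 0, 255)

def add_shot_noise(img, alpha, rng):
    """Poisson photon-counting noise: n_max = 12/alpha^2 photons at full
    intensity, chosen so that the variance at intensity 255 matches the
    uniform noise of the same alpha; counts are rescaled back to [0, 255]."""
    n_max = 12.0 / alpha**2
    photons = rng.poisson(n_max * img / 255.0)
    return np.clip(photons * 255.0 / n_max, 0, 255)

rng = np.random.default_rng(3)
flat = np.full((300, 300), 128.0)   # flat mid-gray test image (no clipping)
u = add_random_noise(flat, 0.2, rng) - flat
s = add_shot_noise(flat, 0.2, rng) - flat
```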

VIII. DATA AVAILABILITY
The datasets that support the results are available from the corresponding author upon request.

IX. CODE AVAILABILITY
The Python codes generating the dataset and for the CNEEP are available at https://github.com/qodudrud/CNEEP.

SUPPLEMENTARY NOTE 1: EXCHANGED EP RATE IN THE BEAD-SPRING MODEL

We derive the exchanged entropy production (EP) rate for the bead-spring model. The Langevin equation of the N-bead-spring model is given by

γ ż(τ) = Az(τ) + ξ(τ),  (S1)

where z(τ) = (z_1, z_2, . . . , z_N)^T represents the displacements of the N beads at time τ, and A and D are defined by A_ij = (δ_{i,j+1} + δ_{i,j−1} − 2δ_{i,j})k with spring constant k and D = diag{γT_1, γT_2, . . . , γT_N} with the i-th bead's temperature T_i, respectively. ξ is an independent Gaussian white noise vector satisfying ⟨ξ_i⟩ = 0 and ⟨ξ_i(τ)ξ_j(τ')⟩ = 2D_{i,j}δ(τ − τ'). The angle bracket ⟨·⟩ represents the ensemble average. Because the system undergoes an Ornstein-Uhlenbeck process, the steady-state probability of z is Gaussian, p(z) ∝ exp[−(1/2) z^T C^{−1} z], where C ≡ ⟨zz^T⟩ is the covariance matrix of z, which can be obtained by solving the Lyapunov equation AC + CA = −2D/γ. From a framework of stochastic energetics [1], the heat absorbed from the surrounding environment by the i-th bead over a time interval dτ is given by

d̄Q_i = (−γż_i + ξ_i) ∘ dz_i,  (S2)

where dz_i(τ) is the change of z_i(τ) over dτ and ∘ denotes the Stratonovich product. Applying the Langevin equation [Eq. (S1)] and the definition of A, d̄Q_i can be written as

d̄Q_i = −(Az)_i ∘ dz_i = [2k z_i − Θ(i−2) k z_{i−1} − Θ(N−1−i) k z_{i+1}] ∘ dz_i,  (S3)

where Θ(n) equals 0 (1) if n < 0 (n ≥ 0). The first term on the right-hand side (RHS) is the contribution of the potential energy of the system, and hence it cannot affect the steady-state average. But the second (third) term, which is induced by the coupling between z_i and z_{i−1} (z_{i+1}), represents the heat exchanged between z_i and z_{i−1} (z_{i+1}), denoted by d̄Q_{i,i−1} (d̄Q_{i,i+1}), and actually determines the ensemble-averaged value [2-4]. Therefore, the heat current along z_i in the steady state is given by

⟨dQ_i/dτ⟩ = Q̇_{i,i−1} + Q̇_{i,i+1}.  (S4)

Here, we use the relation Q̇_{i,i+1} + Q̇_{i+1,i} = 0.
The EP is defined as the sum of the medium EP and the system EP, but in the steady state the ensemble average of the system EP vanishes, so the EP rate is given by

σ̇ = −Σ_{i=1}^{N} ⟨dQ_i/dτ⟩/T_i = Σ_{i=1}^{N−1} (1/T_{i+1} − 1/T_i) Q̇_{i,i+1}.  (S5)

Because Q̇_{i,i+1} is the exchanged heat current between z_i and z_{i+1}, each term in the summation on the RHS is the exchanged EP rate σ̇_{i,i+1}, which indicates the EP between z_i and z_{i+1}. By the definition of d̄Q_{i,i+1}, Q̇_{i,i+1} is obtained as

Q̇_{i,i+1} = −k⟨z_{i+1} ∘ ż_i⟩ = (k/2γ) [(AC)_{i+1,i} − (AC)_{i,i+1}].  (S6)

SUPPLEMENTARY NOTE 2: TRAINING ALGORITHM OF THE CNEEP

At each training step, we compute J_train(θ) from the training set D_train,

J_train(θ) = ⟨∆S_θ(I_{m,t}, I_{m,t+1}) − exp[−∆S_θ(I_{m,t}, I_{m,t+1})]⟩_{D_train},  (S11)

and then update the parameters θ with the optimizer.

SUPPLEMENTARY NOTE 3: TRAINING DETAILS OF THE CNEEP
The CNEEP consists of a convolutional neural network (CNN), the most popular class of neural networks for analyzing image data [9], trained to maximize the objective function J(θ) from Brownian movies. We use ReLU as the activation function and optimize J(θ) with the Adam optimizer [10]. Table I shows the details of the CNEEP architecture, where I^n_t ≡ (I_t, I_{t+1}, . . . , I_{t+n−1}) is the consecutive transition of length n, W and H are the width and height of the images, respectively, and C is the number of convolutional-layer channels. Conv and Pool represent the convolutional layer and the max-pooling layer. (K, P, S) are the parameters of the convolutional layer, where K, P, and S are the filter size, padding, and stride, respectively. In the input layer, the pixel values of the input images I^n_t are normalized to zero mean and unit variance for training. In the Conv layers, C is set to 32 and 128 for the bead-spring model and the filamentous network, respectively.