PHYSICAL REVIEW RESEARCH 2, 023219 (2020) General anesthesia reduces complexity and temporal asymmetry of the informational structures derived from neural recordings in Drosophila

Roberto N. Muñoz ,1,* Angus Leung ,2,† Aidan Zecevik,1,‡ Felix A. Pollock,1,§ Dror Cohen,3,4,‖ Bruno van Swinderen,5,¶ Naotsugu Tsuchiya,4,6,3,** and Kavan Modi 1,†† 1School of Physics & Astronomy, Monash University, Clayton, Victoria 3800, Australia 2School of Psychological Sciences, Monash University, Clayton, Victoria 3800, Australia 3Center for Information and Neural Networks (CiNet), National Institute of Information and Communications Technology (NICT), Suita, Osaka 565-0871, Japan 4School of Psychological Sciences and Turner Institute for Brain and Mental Health, Monash University, Melbourne, Victoria 3800, Australia 5Queensland Brain Institute, The University of Queensland, St Lucia, Queensland 4072, Australia 6Advanced Telecommunications Research Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan

(Received 31 July 2019; revised manuscript received 13 January 2020; accepted 3 April 2020; published 22 May 2020)

We apply techniques from the field of computational mechanics to evaluate the statistical complexity of neural recording data from fruit flies. First, we connect statistical complexity to the flies' level of conscious arousal, which is manipulated by general anesthesia (isoflurane). We show that the complexity of even single-channel time series data decreases under anesthesia. The observed difference in complexity between the two states of conscious arousal increases as higher orders of temporal correlations are taken into account. We then go on to show that, in addition to reducing complexity, anesthesia also modulates the informational structure between the forward- and reverse-time neural signals. Specifically, using three distinct notions of temporal asymmetry, we show that anesthesia reduces temporal asymmetry on information-theoretic and information-geometric grounds. In contrast to prior work, our results show that (1) complexity differences can emerge at very short timescales and across broad regions of the fly brain, thus heralding the macroscopic state of anesthesia in a previously unforeseen manner, and (2) general anesthesia also modulates the temporal asymmetry of neural signals. Together, our results demonstrate that anesthetized brains become both less structured and more reversible.

I. INTRODUCTION
Complex phenomena are everywhere in the physical world. Typically, these emerge from simple interactions among elements in a network, such as atoms making up molecules or organisms in a society. Despite their diversity, it is possible to approach these subjects with a common set of tools, using numerical and statistical techniques to relate microscopic details to emergent macroscopic properties [1]. There has long been a trend of applying these tools to the brain, the archetypal complex system, and much of neuroscience is concerned with relating electrical activity in networks of neurons to psychological and cognitive phenomena [2]. In particular, there is a growing body of experimental evidence [3] that neural firing patterns can be strongly related to the level of conscious arousal in animals.
In humans, the level of consciousness varies from very low in coma and under deep general anesthesia to very high in fully wakeful states of conscious arousal [4]. With current technology, precise discrimination between unconscious vegetative states and minimally conscious states remains particularly challenging in the clinic [5]. Therefore, substantial improvement in the accuracy of determining such conscious states from neural recording data would have significant societal impact. Toward such a goal, neural data have been analyzed using various techniques and notions of complexity in search of the most reliable measure of consciousness [6,7].
One of the most successful techniques to date in distinguishing levels of conscious arousal is the perturbational complexity index [8-10], which measures the neural activity patterns that follow a perturbation of the brain through magnetic stimulation. The evoked patterns are processed through a pipeline and finally summarized using Lempel-Ziv complexity [9]. This method is inspired by a theory of consciousness, called integrated information theory (IIT) [11,12], which proposes that a high level of conscious arousal should be correlated with the amount of so-called integrated information, or the degree of differentiated integration in a neural system (see Ref. [13] for details). While there are various ways to capture this essential concept [14,15], one way to interpret integrated information is as the amount of information a system loses about its own future or past states, based on its current state, when the system is minimally disconnected [16-18].
These complexity measures, inspired by IIT, are motivated by fundamental properties of conscious phenomenology, such as the informativeness and integratedness of any experience [11]. While there are ongoing efforts to accurately translate these phenomenological properties into mathematical postulates [13], such translation often involves assumptions about the underlying process which are not necessarily borne out in reality. For example, the derived mathematical postulates in IIT assume Markovian dynamics, i.e., that the future evolution of a neural system is determined statistically by its present state [15]. Moreover, IIT requires computing the correlations across all possible partitions between subsystems, which is computationally demanding [16] compared with methods that do not require such partitioning. Given that hierarchical causal influences in the brain manifest as oscillations across a range of frequencies and spatial regions [19], non-Markovian temporal correlations likely play a significant role in explaining any experimentally measurable behaviours, including the level of conscious arousal. There is, therefore, scope for applying more general notions of complexity to meaningfully distinguish macroscopic brain states that support consciousness.
A conceptually simple approach to quantifying the complexity of time series data, such as the fluctuating potential in a neuron, is to construct the minimal model which statistically reproduces it. Remarkably, this minimal model, known as an epsilon machine (ε-machine), can be found via a systematic procedure which has been developed within the field of computational mechanics [20-22]. Crucially, ε-machines account for multiple temporal correlations contained in the data and can be used to quantify the statistical complexity of a process: the minimal amount of information required to specify its state. As such, they have been applied in fields ranging from neuroscience [23,24] and psychology [25] to crystallography [26], ecology [27], and the stock market [28]. Last, unlike IIT, the ε-machine analysis can be performed on data coming from a single channel.
In this paper, we use the statistical complexity derived from an ε-machine analysis of neural activity to distinguish states of conscious arousal in fruit flies (Drosophila melanogaster). We analyze neural data collected from flies under different concentrations of isoflurane [29,30]. By analyzing signals from individual electrodes and disregarding spatial correlations, we find that statistical complexity distinguishes between the two states of conscious arousal through temporal correlations alone. In particular, as the degree of temporal correlations increases, the difference in complexity between the wakeful and anesthetized states becomes larger. In addition to measuring complexity, the ε-machine framework also allows us to assess the temporal irreversibility of a process: the difference in the statistical structure of the process when read forward versus backward in time. This may be particularly important for wakeful brains, which are thought to be sensitive to the statistical structure of an environment that runs forward in time [30-32]. Using the nuanced characterisation of temporal information flow offered by the ε-machine framework [33], we then analyze the time irreversibility and crypticity of the neural signals to further distinguish the conscious states. We find that the asymmetry in information structure between forward- and reverse-time neural signals is reduced under anesthesia.
The present approach singularly differentiates between highly random and highly complex information structure; accounts for temporal correlations beyond the Markov assumption; and quantifies the temporal asymmetry of the process. None of the standard methods possesses all of these features within a single unified framework. Before presenting these results in detail in Sec. III and discussing their implications in Sec. IV, we begin with a brief overview of the ε-machine framework we will use for our analysis.

II. THEORY: ε-MACHINES AND STATISTICAL COMPLEXITY
To uncover the underlying statistical structure of neural activity that characterises a given conscious state, we treat the measured neural data, given by voltage fluctuations in time, as discrete time series. To analyze these time series, we use the mathematical tools of computational mechanics, which we outline in this section. We start with a general discussion of ways to use time series data to infer a model of a system, placing ε-machines in this context. Next, we explain how we construct ε-machines in practice. Finally, we show how this can be used to extract a meaningful notion of the statistical complexity of a process.

A. From time series to ε-machines
In abstract terms, a discrete-time series is a sequence of symbols r = (r_0, ..., r_k, ...) that appear over time, one after the other [34]. Each element of r corresponds to a symbol from a finite alphabet A observed at the discrete time step labeled by the subscript k. The occurrence of a symbol, at a given time step, is random in general and thus the process, which produces the time series, is stochastic [35]. However, the symbols may not appear in a completely independent manner, i.e., the probability of seeing a particular symbol may strongly depend on symbols observed in the past. These temporal correlations are often referred to as memory, and they play an important role in constructing models that are able to predict the future behavior of a given stochastic process [36].
Relative to an arbitrary time k, let us denote the past and future partitions of the complete sequence as r = (←r, →r), where the past and the future are ←r = (..., r_{k−2}, r_{k−1}) and →r = (r_k, r_{k+1}, ...), respectively. In general, for the prediction of the immediate future symbol r_k, knowledge of the ℓ past symbols ←r^ℓ := (r_{k−ℓ}, ..., r_{k−2}, r_{k−1}) may be necessary. The smallest number ℓ of past symbols we need to account for to optimally predict the future sequence is called the Markov order [37].
In general, the difficulty of modeling a time series increases exponentially with its Markov order. However, not all distinct pasts lead to unique future probability distributions, leaving room for compression in the model. In a seminal work, Crutchfield and Young showed the existence of a class of models, which they called ε-machines, that are provably the optimal predictive models for a non-Markovian process under the assumption of statistical stationarity [20,21]. Constructing the ε-machine is achieved by partitioning sets of partial past observations ←r into causal states. That is, two distinct sequences of partial past observations ←r and ←r′ belong to the same causal state S_i ∈ S if the probability of observing a specific future →r given either history is the same; that is,

P(→r | ←r) = P(→r | ←r′) ⟺ ←r ∼ ←r′,   (1)

where ∼ indicates that the two histories correspond to the same causal state. The conditional probability distributions in Eq. (1) may always be estimated from a finite set of statistically stationary data via the naive maximum likelihood estimate, given by P(r_k | ←r) = ν(r_k, ←r)/ν(←r), where ν(X) is the frequency of occurrence of the subsequence X in the data. For the case of nonstationary data, the probabilities obtained by this method will produce a nonminimal model that corresponds to a time-averaged representation of the time series. We now discuss how to practically construct an ε-machine for a given time series.
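The causal-state grouping of Eq. (1) can be sketched in a few lines of Python. This is a didactic illustration with hypothetical function names, not the paper's pipeline: it estimates each history's next-symbol distribution by the naive maximum likelihood estimate above and merges histories whose distributions agree within a crude tolerance (rather than via the statistical test used by CSSR).

```python
from collections import defaultdict

def causal_state_partition(series, history_len=2, tol=0.05):
    """Group length-`history_len` histories whose estimated next-symbol
    distributions agree (within `tol`) into candidate causal states.
    A didactic sketch of Eq. (1), not the CSSR algorithm itself."""
    # Count next-symbol occurrences following each history (naive MLE).
    counts = defaultdict(lambda: defaultdict(int))
    for k in range(history_len, len(series)):
        past = tuple(series[k - history_len:k])
        counts[past][series[k]] += 1

    def next_dist(past):
        total = sum(counts[past].values())
        return {s: n / total for s, n in counts[past].items()}

    # Merge histories with (approximately) equal conditional distributions.
    states = []  # each entry: (representative distribution, member histories)
    for past in counts:
        dist = next_dist(past)
        for rep, members in states:
            if all(abs(dist.get(s, 0.0) - rep.get(s, 0.0)) < tol
                   for s in set(dist) | set(rep)):
                members.append(past)
                break
        else:
            states.append((dist, [past]))
    return states
```

For a strictly alternating binary series, for instance, the two length-2 histories (0, 1) and (1, 0) predict different next symbols and therefore land in distinct causal states.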

B. Constructing machines with the CSSR algorithm
Several algorithms have been developed to construct ε-machines from time series data [20,38,39]. Here, we briefly explain the causal state splitting reconstruction (CSSR) algorithm [25], which we use in this work to infer ε-machines that predict the statistics of the neural data we provide as input.
The CSSR algorithm iteratively constructs sets of causal states accounting for longer and longer subsequences of symbols. In each iteration, the algorithm first estimates the probabilities P(r_k | ←r) of observing a symbol conditional on each prior sequence ←r of the current length, and compares them with the distributions P(r_k | S = S_i) it would expect from the causal states it has so far reconstructed. If P(r_k | ←r) = P(r_k | S = S_i) for some causal state, then ←r is identified with that state. If the probability is found to be different for all existing S_i, then a new causal state is created to accommodate the subsequence. By constructing new causal states only as necessary, the algorithm guarantees a minimal model that describes the non-Markovian behavior of the data (up to a given memory length), and hence the corresponding ε-machine of the process.
The CSSR algorithm compares probability distributions via the Kolmogorov-Smirnov (KS) test [40,41]. The hypothesis that P(r_k | ←r) and P(r_k | S = S_i) are identical up to statistical fluctuations is rejected by the KS test at the significance level σ when the distance D_KS [42] is greater than the tabulated critical value for σ [43]. In other words, σ sets a limit on the accuracy of the history grouping by parametrizing the probability that an observed history ←r belonging to a causal state S_i is mistakenly split off and placed in a new causal state S_j. Our analysis, in agreement with Ref. [25], found that the choice of this value does not affect the outcome of CSSR within the tested range of 0.001 < σ < 0.01. As a result, we set σ = 0.005.
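The KS statistic underlying this comparison is simply the maximum gap between two empirical cumulative distribution functions. A minimal sketch follows (the lookup of the critical value at level σ is omitted, and the function name is ours):

```python
def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical cumulative distributions."""
    support = sorted(set(sample_a) | set(sample_b))
    cdf_a = cdf_b = 0.0
    d = 0.0
    for x in support:
        cdf_a += sample_a.count(x) / len(sample_a)
        cdf_b += sample_b.count(x) / len(sample_b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Two identical samples give a distance of 0, while two samples with disjoint supports give the maximum distance of 1.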
As it progresses, the CSSR algorithm compares future probabilities for longer and longer subsequences, up to a maximum past history length λ, which, along with σ, is the only important parameter that must be selected prior to running CSSR. If the considered time series is generated by a stochastic process of Markov order ℓ, choosing λ < ℓ results in poor prediction because the inferred ε-machine cannot capture the long-memory structures present in the data. Despite this, the CSSR algorithm will still produce an ε-machine that is consistent with the approximate future statistics of the process up to order-λ correlations [25]. Given sufficient data, choosing λ ≥ ℓ guarantees convergence on the true ε-machine. One important caveat to note is that the time complexity of the algorithm scales asymptotically as O(|A|^(2λ+1)), putting an upper limit on the longest history length that is computationally feasible to use. Furthermore, the finite length of the time series data implies an upper limit on an "acceptable" value of λ. Estimating P(r_k | ←r^λ) requires sampling strings of length λ from the finite data sequence. Since the number of such strings grows exponentially with λ, a value of λ that is too long relative to the size N of the data will result in a severely under-sampled estimate of the distribution. A distribution P(r_k | ←r^λ) that has been estimated from an under-sampled space is almost never equal to P(r_k | S = S_i), resulting in the algorithm creating a new causal state for every string of length λ it encounters. A bound for the largest permissible history length is λ ≲ L(N) = log_2 N / log_2 |A|, where L(N) denotes the maximum length for a given data size N [44,45]. Once these considerations have been taken into account, the ε-machine produced by the algorithm provides us with a meaningful quantifier of the complexity of the process generating the time series, as we now discuss.
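The history-length bound can be made concrete. Assuming the form L(N) = log_2 N / log_2 |A| given above, a hypothetical helper:

```python
import math

def max_history_length(n_samples, alphabet_size):
    """Sampling-limited history length L(N) = log2(N) / log2(|A|)."""
    return math.log2(n_samples) / math.log2(alphabet_size)

# For N = 18000 binary samples (the recording length analyzed later),
# L(N) is roughly 14, so longer histories cannot be reliably sampled.
```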

C. Measuring the complexity and asymmetry of a process
The output of the CSSR algorithm is the set of causal states and rules for transitioning from one state to another. That is, CSSR gives a Markov chain represented by a digraph [20,37] G(V, E) consisting of a set of vertices v_i ∈ V and directed edges {i, j} ∈ E, e.g., Figs. 1(c) and 1(d). Using these rules, one can find P(S_i), the probability that the machine is in the causal state S_i at any given time. The Shannon entropy of this distribution quantifies the minimal number of bits of information required to optimally predict the future process; this measure, first introduced in Ref. [20], is called the statistical complexity:

C_μ = −Σ_i P(S_i) log_2 P(S_i).   (2)

Formally, the causal states of a time series depend upon the direction in which the data is read [33]. The main consequence of this result is that the set of causal states obtained by reading the time series in the forward direction, S^+, is not necessarily the same as that obtained by reading the time series in the reverse direction, S^−. Naturally, this corresponds to potential differences between the forward- and reverse-time processes and their associated complexities; the difference

Ξ = C_μ^+ − C_μ^−   (3)

is known as the causal irreversibility, capturing the time asymmetry of the process. Another (stronger) measure of time asymmetry is the crypticity:

d = 2C_μ^± − C_μ^+ − C_μ^−.   (4)

This quantity measures the amount of information hidden in the forward and reverse ε-machines that is not revealed in the future or past time series, respectively. Specifically, it combines the information that must be supplemented to determine the forward ε-machine given the reverse one with the information needed to determine the reverse ε-machine given the forward one. In each case, this is the difference between the complexity of a bidirectional machine, denoted C_μ^± [33], and that of the corresponding unidirectional machine. Throughout this manuscript, we implicitly refer to the usual forward-time statistical complexity C_μ^+ when writing C_μ, unless otherwise stated.
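Given a causal-state transition matrix, Eq. (2) amounts to the entropy of the chain's stationary distribution. A minimal sketch, assuming a row-stochastic matrix with a unique stationary distribution (function names are ours):

```python
import math

def stationary_distribution(T, n_iter=1000):
    """Stationary probabilities P(S_i) of a row-stochastic causal-state
    transition matrix T, via power iteration from a uniform start."""
    n = len(T)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p

def statistical_complexity(T):
    """C_mu = -sum_i P(S_i) log2 P(S_i): the Shannon entropy (in bits)
    of the stationary distribution over causal states, Eq. (2)."""
    return -sum(q * math.log2(q) for q in stationary_distribution(T) if q > 0)
```

For a two-state machine with transition matrix [[0.5, 0.5], [1, 0]], the stationary distribution is (2/3, 1/3) and C_μ ≈ 0.918 bits; a single-state machine has C_μ = 0.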
Finally, an operational measure of time asymmetry is given by the microscopic irreversibility, which quantifies how statistically distinguishable the forward and reverse ε-machines are in terms of the sequences of symbols they produce. If the forward-time ε-machine M^+ produces the same sequences with similar probabilities to the reverse-time ε-machine M^−, then the process is reversible. Should a sequence available to M^+ be impossible for M^− to produce, then the process is strictly irreversible. Here, we assess the distinguishability between two ε-machines by estimating the asymptotic rate of the (symmetric) Kullback-Leibler (KL) divergence D_KLS between long output sequences; this measure is commonly applied to stochastic models [46]. Specifically,

D_KLS(M^+ ∥ M^−) = [D_KL(M^+ ∥ M^−) + D_KL(M^− ∥ M^+)]/2,   (5)

where D_KL is the regular, nonsymmetric estimated KL divergence rate [47]. The KL divergence can be proved to be the unique measure that satisfies all of the theoretical requirements of information geometry [17,48,49]. A few remarks are in order: in general, the vanishing of any one of the above measures does not imply that the others must also vanish. For instance, consider the case where the structures of the forward (M^+) and reverse-time (M^−) ε-machines are different but they happen to have the same complexities, i.e., C_μ^+ = C_μ^−. Then, clearly, we have Ξ = 0 but d ≠ 0 and D_KLS ≠ 0. However, consider the case when M^+ and M^− are the same; here, we have Ξ = D_KLS = 0, yet we may not have d = 0. This means that vanishing D_KLS implies Ξ = 0 (but not the converse, and not d = 0). This turns out to be an interesting extremal case because, while the forward and reverse processes are identical, the nonvanishing crypticity accounts for the information required to synchronise the corresponding ε-machines, i.e., to produce the joint statistics of the paired machines.
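The divergence rate in Eq. (5) can be approximated empirically from sample sequences. The sketch below is a crude plug-in estimator, our own simplification rather than the estimator of Ref. [47]: it compares length-n word distributions and discards words unseen in either sample, so it is only indicative for short words and long sequences.

```python
import math
from collections import Counter

def d_kls_rate(seq_p, seq_q, word_len=4):
    """Crude plug-in estimate of the symmetric KL divergence rate:
    average the two directed divergences between length-`word_len`
    word distributions, per symbol. Words missing from either sample
    are skipped, so the estimate is only indicative."""
    def word_dist(seq):
        words = [tuple(seq[i:i + word_len])
                 for i in range(len(seq) - word_len + 1)]
        return {w: c / len(words) for w, c in Counter(words).items()}

    P, Q = word_dist(seq_p), word_dist(seq_q)
    common = set(P) & set(Q)
    dkl_pq = sum(P[w] * math.log2(P[w] / Q[w]) for w in common)
    dkl_qp = sum(Q[w] * math.log2(Q[w] / P[w]) for w in common)
    return 0.5 * (dkl_pq + dkl_qp) / word_len
```

Two statistically identical sequences give a rate of zero, while sequences with overlapping but differently weighted word distributions give a positive rate.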
Moreover, we can conclude that microscopic irreversibility is a stronger measure than causal irreversibility; this comes at the expense of computational cost, i.e., the former is harder to compute than the latter. In essence, each measure above represents a different notion of temporal asymmetry, with its own operational significance. Causal irreversibility and crypticity are information-theoretic constructs, while microscopic irreversibility is an information-geometric construct.
In the next section, we describe the experimental and analytical methods, as well as the results: that the statistical complexity and temporal asymmetry of the neural time series, taken from fruit flies, significantly differ between states of conscious arousal.

A. Methods
We analyzed local field potential (LFP) data from the brains of awake and isoflurane-anesthetized D. melanogaster (Canton S wild type) flies. Here, we briefly outline the essential aspects of the experiment necessary to understand this paper; the full details are presented in Refs. [29,30]. LFPs were recorded by inserting a linear silicon probe (Neuronexus 3mm-25-177) with 16 electrodes separated by 25 μm. The probe covered approximately half of the fly brain and recorded neural activity as illustrated in Fig. 1(a). A tungsten wire inserted into the thorax acted as the reference. The LFPs at each electrode were recorded for 18 s while the fly was awake, and for a further 18 s after the fly was anesthetized (isoflurane, 0.6% by volume, through an evaporator). The flies' unresponsiveness during anesthesia was confirmed by the absence of behavioural responses to a series of air puffs, and recovery was confirmed after the isoflurane gas was turned off [29].
We used data sampled at 1 kHz for the analysis [29]. To obtain an estimate of local neural activity, the 16 electrode signals were re-referenced by subtracting adjacent signals, giving 15 channels which we parametrize as c ∈ [1, 15]. Line noise was removed from the recordings, followed by linear detrending and removal of the mean. The resulting data is a fluctuating voltage signal, which is time-binned (1 ms bins) and binarized by splitting over the median, leading to a binary time series (see Fig. 1). For each of the 13 flies in our data set, we considered 30 time series of length N = 18 000. These correspond to the 15 channels, labeled numerically from the central to the peripheral region as depicted in Fig. 1(a), and the two states of conscious arousal. Using the CSSR algorithm [25], we constructed ε-machines for each of these time series as a function of maximum memory length within the range λ ∈ [2, 11], measured in milliseconds. This is below the memory length L(N) ≈ 14 beyond which we would be unable to reliably determine transition probabilities for a sequence of length N (see Sec. II B) [51]. For a given time direction ξ ∈ {+ : forward, − : reverse}, we recorded the resulting 3900 ε-machine structures and their corresponding statistical complexities C_μ^(ξ,ψ), and grouped them according to their respective level of conscious arousal, ψ ∈ {w, a} for awake and anesthesia, channel location c, and maximum memory length λ. Thus, the statistical complexity we computed in a given time direction is a function of the set of parameters {ψ, c, λ} for each fly f. We also determined the causal irreversibility Ξ, crypticity d, and symmetric KL divergence rate D_KLS for each fly and again grouped them over the same set of parameters {ψ, c, λ}. While we found that not all the data are strictly stationary, in that the moving means of the LFP signals were not normally distributed, the conclusions we draw from them are still broadly valid. As mentioned in Sec. II A, ε-machines reconstructed from nonstationary data are time-averaged models, and are likely to underestimate the true statistical complexity of the corresponding neural processes.
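The preprocessing described above, bipolar re-referencing of the 16 electrodes followed by mean removal and median-split binarization, can be sketched as follows (linear detrending and line-noise removal are omitted, and function names are ours):

```python
import statistics

def rereference(electrodes):
    """Bipolar re-referencing: subtract adjacent electrode signals,
    turning 16 recordings into 15 local channels."""
    return [[a - b for a, b in zip(electrodes[i], electrodes[i + 1])]
            for i in range(len(electrodes) - 1)]

def binarize(lfp):
    """Remove the mean, then median-split: samples above the median
    map to 1 and the rest to 0."""
    mean = sum(lfp) / len(lfp)
    centred = [v - mean for v in lfp]
    med = statistics.median(centred)
    return [1 if v > med else 0 for v in centred]
```

The median split guarantees a roughly balanced binary alphabet A = {0, 1} for each channel, regardless of the signal's overall amplitude.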
We are principally interested in the differences the informational quantities Q ∈ {C_μ, Ξ, d, D_KLS} exhibit over states of conscious arousal, and thus consider

ΔQ = Q^w − Q^a   (6)

for fixed values of {f, c, λ}. Positive values of ΔQ indicate higher complexities observed in the wakeful state relative to the anesthetized one. Finally, we use the notation ⟨Q^ψ⟩_x to denote taking an average of any information quantity Q^ψ over a specific parameter x ∈ {f, c, λ}. For example, ⟨ΔC_μ^+⟩_f means taking the fly-averaged difference in forward-time statistical complexity.
To assess the significance that each of the parameters ψ, c, λ, and ξ, or some combination of them, has on the response of the elements in the set Q across flies, we conducted a statistical analysis using linear mixed effects (LME) modeling [52]. The LME analysis describes the response of a given quantity Q by modeling it as a multidimensional linear regression of the form

Q = Fβ + Rb + E.   (7)

The resulting model in Eq. (7) consists of a family of equations, where Q is the vector allowing for different responses of a quantity Q for each specific fly, channel location, level of conscious arousal, and time direction where applicable. Memory length λ, channel location c, state of conscious arousal ψ, and time direction ξ (again, where applicable) are the parameters that Q responds to. To account for variations in the response caused by interactions between parameters (e.g., between memory length and channel location), we included these interactions in the model. Letting X = {λ, c, ψ, ξ} be the set of parameters which may induce responses, we can write all the nonempty k-combinations of its elements as F = {λ, c, ψ, ξ, λc, λψ, ..., λcψξ}. The elements of F are known as the fixed effects of Eq. (7), and are contained as elements within the matrix F. The vector β contains the regression coefficients describing the strength of each of the fixed effects in F.
In addition to the fixed effects affecting the response of an element of Q in our experiment, we also took into account variation in the response caused by known random effects. In particular, we expected stronger response variations to be caused by correlations between the channels within a single fly than between channels across flies. These random effects are contained as elements of the matrix R, and the vector b encodes the regression coefficients describing their strengths. Finally, the vector E describes the normally distributed unknown random effects in the model. The regression coefficients contained in the vectors β and b were obtained via maximum likelihood estimation such that the residuals E are minimized. The explicit form of Eq. (7) used in this analysis is detailed in the Appendix.
With the full linear mixed effects model given by Eq. (7), we tested the statistical significance of each fixed effect in F. This was accomplished by comparing the log-likelihood of the full model, with all fixed effects, to the log-likelihood of a reduced model with the effect of interest removed [53] (i.e., with the regression coefficients associated with that effect removed). This comparison between the models is given by the likelihood ratio

Λ = 2(h_full − h_reduced),

where h_full is the log-likelihood of the full model and h_reduced is the log-likelihood of the model with the effect of interest removed.
Under the null hypothesis, when a fixed effect does not have any influence on an informational quantity Q, i.e., the regression coefficients for the effect vanish, the likelihood ratio is χ² distributed with degrees of freedom equal to the difference in the number of coefficients between the two models. Therefore, we considered any fixed effect in the set F to have a statistically significant effect on a quantity if the probability of obtaining the likelihood ratio under the relevant χ² distribution was less than 5% (p < 0.05). Thus, for each significant effect we report the fixed effect being tested, i.e., an element of F, the obtained likelihood ratio χ²(n) with n associated degrees of freedom, and the associated probability p of obtaining the statistic under the null hypothesis.
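For a single removed coefficient (one degree of freedom), the likelihood-ratio test reduces to a short computation, since the χ²(1) survival function has the closed form erfc(√(x/2)). A sketch with hypothetical names:

```python
import math

def lr_test_df1(loglik_full, loglik_reduced):
    """Likelihood-ratio test for one removed coefficient: the statistic
    2(h_full - h_reduced) is chi-squared with 1 degree of freedom under
    the null, whose survival function is erfc(sqrt(x / 2))."""
    lam = 2.0 * (loglik_full - loglik_reduced)
    p = math.erfc(math.sqrt(lam / 2.0))
    return lam, p
```

A likelihood ratio of about 3.84 recovers the familiar p = 0.05 threshold for one degree of freedom.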
In addition to all the quantities in the set Q^ψ, the LME and likelihood ratio tests were also performed for ΔQ, to find the significant interaction effects of the parameters. Here, we also modeled ΔQ as dependent on the fixed effects in F as in Eq. (7), but excluding the parameter ψ, as it is already implicitly accounted for. Once the significant effects of memory length, level of conscious arousal, and channel location were characterised with our statistical analysis, we followed with post hoc, paired t tests for elements in Q^ψ given by

t = ⟨ΔQ⟩_f / (s_f / √|f|),   (8)

where s_f is the standard deviation of ΔQ over flies and |f| = 13 is the sample size. The paired t tests examine the nature of interactions between the parameters on a given quantity over the two states of conscious arousal. Positive t scores indicate that a quantity is larger for the wakeful state. We present the results of these analyses in the following sections, sorted categorically by whether time direction is considered.
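The paired t statistic of Eq. (8) is a one-liner over the per-fly differences ΔQ; a sketch assuming the sample standard deviation for s_f (function name is ours):

```python
import math
import statistics

def paired_t(delta_q):
    """Paired t statistic of Eq. (8): t = mean(dQ) / (s_f / sqrt(|f|)),
    computed over the per-fly differences dQ = Q^w - Q^a."""
    n = len(delta_q)
    return statistics.mean(delta_q) * math.sqrt(n) / statistics.stdev(delta_q)
```

The sign convention matches the text: uniformly positive differences (wakeful larger) give a positive t score.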

B. Forward-time complexity results
To observe the effects of isoflurane on neural complexity, we began by visually inspecting the structure of the reconstructed ε-machines for the two levels of conscious arousal in the forward-time direction. We took special interest in the differences in the characteristics of the two groups of ε-machines heralding the two levels of conscious arousal. Here, memory length λ plays an important role. At a given λ, the maximum number of causal states that may be generated scales as |A|^λ [25]. In our case, the alphabet is binary, A = {0, 1}. This greatly restricts the space of ε-machine configurations available for short history lengths [54]. For λ = 2, we can observe up to four distinct configurations, which is unlikely to reveal differences based on conscious state. Given the previous findings [30], we generally expected that the data from the wakeful state would present more complexity than those from the anesthetized state.
Visual inspection of the directed graphs indeed suggested higher ε-machine complexity during the wakeful state compared to the anesthetized state, at a given set of parameters {f, c, λ}. In particular, the data from the anesthetized state tended to result in fewer causal states and overall reduced graph connectivity. Figures 1(c) and 1(d) are examples of ε-machines (channel 1 data recorded from fly 1, at maximum memory length λ = 3), where a simpler machine is derived from the data under the anesthetized condition. Differentiating between the two conscious arousal states by visual inspection quickly becomes impractical because of the large number of ε-machines. Moreover, for large values of λ the number of causal states is exponentially large, and it becomes difficult to see the difference between two graphs. To overcome these challenges, we looked at a simpler index, the statistical complexity C_μ, to differentiate between conscious arousal states. To systematically determine the relationships between C_μ and the set of variables {c, f, ψ}, we employed the LME analysis outlined in Sec. III A. We first tested whether λ significantly affects C_μ, and found that it indeed does (λ, χ²(1) = 443.64, p < 10^−16). Figure 2 shows that, independent of the conscious arousal condition or channel location, C_μ increases with larger λ. This indicates that the Markov order of the neural data is much larger than the largest memory length (λ = 11) we consider. Nevertheless, we have enough information to work with.
We then sought to confirm whether the complexity of ε-machines during anesthesia is reduced, as suggested by visual inspection. Our statistical analysis indicates that C_μ is not invariably reduced during anesthesia (ψ, χ²(1) = 0.212, p = 0.645) at all levels of λ and all channel locations. This means that C_μ cannot simply indicate the conscious arousal state without some additional information about time (λ) or space (c). In addition, we found that neither c alone nor cψ strongly affects C_μ. However, we found significant reductions in complexity when either the level of conscious arousal or the channel location interacted with memory length (λψ, χ²(1) = 14.63, p = 1.31 × 10^−4) and (cλ, χ²(14) = 42.876, p = 8.97 × 10^−5), respectively. Moreover, the three-way interaction also had a strong effect (λψc, χ²(14) = 24.00, p = 0.0458).
As the three-way interaction between λ, ψ, and c complicates interpretation of their effects, we performed a second LME analysis where we modelled the within-fly difference ΔC μ instead of C μ , thus accounting for ψ implicitly. In doing so, we investigated whether the change in statistical complexity due to anesthesia is affected by memory length λ or channel location c. Using this model, we found a nonsignificant effect of c on ΔC μ , while a significant effect of λ on ΔC μ was seen (λ, χ 2 (1) = 20.97, p = 4.65 × 10 −6 ), indicating that ΔC μ overall changes with λ. Specifically, ΔC μ tended to increase with larger λ when ignoring channel location, as is evident in Fig. 3 (top). Further, explaining our previous interaction between λ and ψ, ΔC μ was not clearly larger than 0 for small memory length (λ = 2; the top panel of Fig. 3). This suggests that the information needed to differentiate between states of conscious arousal is contained in higher-order correlations. We also found that the interaction between λ and channel location has a significant effect on ΔC μ (λc, χ 2 (14) = 37.19, p = 6.90 × 10 −4 ), indicating that the effect of λ is not constant across channels. Given that ΔC μ overall increases with λ, we expected the largest ΔC μ to occur at the largest λ. Figure 3 (bottom) examines ΔC μ across channels at λ = 11.
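The χ 2 statistics quoted throughout come from likelihood-ratio comparisons between a full LME model and a null model with the effect of interest removed. For a single removed effect (1 degree of freedom), the computation is simple enough to sketch directly; the function name is ours and the log-likelihood values below are placeholders, not values from the study.

```python
from math import erfc, sqrt

def lr_test_1df(ll_full, ll_null):
    """Likelihood-ratio test for one removed effect: the statistic
    2*(ll_full - ll_null) is asymptotically chi-square distributed with
    1 degree of freedom, whose survival function is erfc(sqrt(x / 2))."""
    stat = 2.0 * (ll_full - ll_null)
    return stat, erfc(sqrt(stat / 2.0))
```

For example, `lr_test_1df(-100.0, -102.948)` gives a statistic of 5.896 and p ≈ 0.015, reproducing the (χ 2 , p) pairing pattern of the 1-degree-of-freedom results reported in this section.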
To further break down the interaction between λ and c, we performed a one-sample t test at each value of memory length and channel location to find regions in the parameter space (λ, c) where ΔC μ reliably differentiates wakefulness from anesthesia across flies. We plot the t statistic at each parameter combination in Fig. 4(a), outlining regions in the parameter space where ΔC μ is significantly greater than 0 (with p < 0.05, uncorrected, two-tailed), and find that the majority of the significance map is directed toward positive values of the t statistic. However, only a subset of (λ, c) cells contain values which are significantly different from 0. Interestingly, we observed that for λ = 2, ΔC μ is actually significantly negative, corresponding to greater complexity during anesthesia, not during wakefulness. This marks λ = 2 as anomalous relative to other levels of λ, and this reversal of the direction of the effect of anesthesia likely contributed to the interaction between λ and ψ.
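The t map in Fig. 4(a) is built from one-sample t statistics of the per-fly complexity differences at each (λ, c) cell. A minimal pure-Python sketch (variable and function names ours):

```python
from math import sqrt

def one_sample_t(values):
    """One-sample t statistic against a population mean of zero:
    t = mean / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean / sqrt(var / n)

def t_map(delta_c):
    """delta_c maps a (lam, chan) cell to its list of per-fly values;
    returns a dict of t scores keyed the same way, one per cell."""
    return {cell: one_sample_t(vals) for cell, vals in delta_c.items()}
```

Each cell of the resulting map can then be thresholded at the critical t value for n − 1 degrees of freedom to recover the significance outlines drawn in the figure.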
Disregarding λ = 2, we find ΔC μ to be significantly greater than 0 for channels 1, 3, 5-7, 9, 10, and 13, at varying levels of λ. As expected from the reported interaction between λ and c, we observe ΔC μ to already be significantly greater than 0 at small λ for channels 5-7, while ΔC μ only became significantly greater at larger λ for channels 1, 3, 10, and 13. Further, other channels, such as the most peripheral channel (c = 15), did not have ΔC μ significantly greater than 0 at any λ. All significant LME results are reported in Table I.
The above results suggest that the measured difference in complexity is present across various brain regions [ Fig. 4(a)], and that it grows as longer temporal correlations are taken into account (up to the largest value λ = 11 tested). While Fig. 3 shows a continued increase in the difference in statistical complexity, ΔC μ , as a function of λ, we did not analyze longer history lengths, due to limitations in the amount of data and the stability of the estimation of C μ . In addition to this general observation of increasing ΔC μ with λ, we observe that, remarkably, some brain regions discriminate the conscious arousal states with a history length of only 3. One trivial explanation for this effect would be that under anesthesia the required memory length is indeed λ = 2, while the optimal λ for the awake state is much larger. However, inspection of Fig. 2 rules out this simple possibility; under both wakeful and anesthetized states, C μ continues to increase with λ.
It is likely, however, that the tested range for λ remains below the Markov order of the neural data; this is clearly indicated by the lack of a plateau in statistical complexity in Fig. 3. This suggests that we are far from saturating the Markov order of the process, and with more data we would be able to further distinguish between the two states. Future analyses with longer time series would also contribute to our understanding of the differences in Markov order (maximum memory length) between the two states of conscious arousal. Nevertheless, our results, in Figs. 3 and 4(a), demonstrate that saturation of the Markov order is not required for discrimination between conscious arousal states. This finding has a practical implication for the empirical utility of machines; even if the history length is too low, the inferred machine and its statistical complexity can be useful. We now discuss the temporal asymmetry of neural processes.

FIG. 4. Color map of two-tailed paired t scores over channel location and memory length λ for (a) statistical complexity differences. The dotted lines indicate the memory lengths and channel locations that exceed p < 0.05 (uncorrected). The color scale is consistent across all subplots.

Temporal asymmetry
Unlike other complexity measures, the machine formalism yields a distinct machine for each direction in which a given time series is read, i.e., forward or backward in time. Based on the notion that wakeful brains should be better at predicting the next sensory input [30], we expect that anesthesia should alter the informational structures depending on the time direction. Our expectation translates to the following three hypotheses: (1) Causal irreversibility (Ξ := C + μ − C − μ ), which is purely based on the summary measure of statistical complexity, should be higher for wakeful than for anesthetized brains; (2) Crypticity (d := 2C ± μ − C + μ − C − μ ) should likewise be higher for wakeful than anesthetized brains; (3) The symmetric KL divergence rate (D KLS ) should behave similarly.
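Under the definitions above, both quantities reduce to simple arithmetic on the three machine complexities (forward, reverse, and bidirectional). A minimal sketch, with function names of our own choosing:

```python
def causal_irreversibility(c_fwd, c_rev):
    """Xi := C+_mu - C-_mu, the difference in statistical complexity
    between the forward- and reverse-time machines."""
    return c_fwd - c_rev

def crypticity(c_bi, c_fwd, c_rev):
    """d := 2*C(+/-)_mu - C+_mu - C-_mu, the extra information carried
    by the bidirectional machine beyond the two unidirectional ones."""
    return 2.0 * c_bi - c_fwd - c_rev
```

Note that Ξ can vanish while d stays positive, e.g., `causal_irreversibility(2.0, 2.0)` is 0 while `crypticity(3.0, 2.0, 2.0)` is 2; this is exactly the dissociation the results below report.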
On visual inspection, the variation in Ξ for both the wakeful and anesthetized conditions appeared close to zero, suggesting that Ξ would not depend significantly on the condition. This impression was confirmed statistically with two-tailed t tests against zero with corrections for multiple comparisons, as shown in Fig. 4(b). Thus, Hypothesis 1 above, that causal irreversibility should be higher for wakeful than anesthetized brains, is not supported by the data. However, as mentioned earlier, vanishing Ξ does not imply that either d = 0 or D KLS = 0. To test the possibility that the informational structures of the machines differ when read forward, as opposed to backward, depending on the condition, we turned to the latter two hypotheses.

TABLE I. Significant χ 2 and p values of effects of channel c, memory length λ, and condition ψ, for informational quantities Q obtained via LME analysis. First-order effects correspond to significant channel, memory, or condition responses on an informational quantity, while second- and third-order effects correspond to interactions between these effects. χ 2 values are reported with n − 1 degrees of freedom in the parentheses, corresponding to the number of effects removed under the null model, described in Sec. III A.
With respect to crypticity, visual inspection of the two-tailed t-score map comparing crypticity for the wakeful (d w ) and anesthetized (d a ) conditions [ Fig. 4(c)] strongly suggests that crypticity is larger in the former than in the latter. This difference is largest over channels 5-7 and 9. To systematically evaluate this impression, we used the LME statistical analysis (described in Sec. III A) to determine the relationships between crypticity, d, and the set of variables {c, λ, ψ}. As expected, we found that both memory length (λ) and level of conscious arousal (ψ) significantly affected crypticity (λ, χ 2 (1) = 470.5, p < 10 −16 ) and (ψ, χ 2 (1) = 5.896, p = 0.0152), respectively. Crypticity also depended on a significant interaction between memory length and condition (λψ, χ 2 (1) = 6.119, p = 0.0134). Specific increases in crypticity around the middle brain region [ Fig. 4(c)] were also evident, with a strong interaction between channel location and memory length (λc, χ 2 (14) = 35.86, p = 1.09 × 10 −3 ), similar to the result obtained for C μ . This LME analysis, together with the direction of effects in Fig. 4(c), strongly supports our Hypothesis 2.
Furthermore, as a more direct measure of microscopic structure, we also analyzed the symmetric KL divergence rate, D KLS . Again, the two-tailed t-score map [ Fig. 4(d)] supported our hypothesis. Our formal statistical analysis with LME confirmed a critical interaction between memory length and condition (λψ, χ 2 (1) = 15.37, p = 8.8 × 10 −5 ), meaning that time-asymmetric informational structure is lost due to anesthesia, especially when a long memory length is taken into account. We also note other significant effects: mainly the effect of memory length (λ, χ 2 (1) = 127.4, p < 10 −16 ) and the interaction between memory length and channel location (λc, χ 2 (14) = 85.81, p < 10 −10 ). Again, all significant LME results are reported in Table I. Taken together, these results show that the relative complexity of the forward versus reverse direction, as measured by causal irreversibility, does not distinguish between the wakeful and anesthetized states. However, our crypticity results demonstrate that, under anesthesia, the structures of the forward and reverse processes are relatively similar, whereas during wakefulness their structures differ. Figure 5 demonstrates this effect with exemplar machines reconstructed from a representative channel, from which we derived six distinct machines: three for wakeful [(a), (c), (d)] and three for anesthetized [(e)-(g)] flies. Figure 5(b) shows how the time series and the transitions in the causal states of the forward, reverse, and bidirectional machines are related.
Our finding that causal irreversibility was not above zero for wakeful brains corresponds to the fact that the complexities of the forward and reverse machines were not significantly different. However, the bidirectional machines for the wakeful condition were substantially more complex than those for the anesthetized condition. The statistical complexity of the bidirectional machine should equal that of the forward or reverse machine if the process is completely time symmetric and deterministic [33], resulting in zero crypticity. However, for nondeterministic processes, additional information may be needed to synchronize the forward and reverse processes, which would mean d > 0. For instance, in Figs. 5(e) and 5(f), if we are told that the forward machine is in causal state A, then we need extra information to determine whether the reverse machine is in causal state X or Y. Yet, the detailed structures of the forward and reverse machines are the same in this example. Our analysis is supplemented with a study of the symmetric KL divergence rate between the forward and backward processes, which measures the distance between the reconstructed machines. In other words, crypticity and the symmetric KL divergence rate quantify two different notions of temporal asymmetry; the former is information-theoretic, and the latter is information-geometric. Indeed, in general we find that the processes in the two directions differ in both senses and, further, that their difference varies significantly between conditions, as shown in Figs. 4(c) and 4(d).
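The symmetric KL divergence rate compares the word statistics generated by the forward and reverse processes. A simplified plug-in sketch over finite-length words is shown below; the study computes the rate from the reconstructed machines themselves, and zero-probability handling and smoothing are glossed over here, so this is an illustration of the quantity, not the paper's estimator.

```python
from math import log2

def symmetric_kl(p, q):
    """D(p||q) + D(q||p) in bits, for two distributions over the same
    finite support (every word must have nonzero probability in both)."""
    fwd = sum(p[w] * log2(p[w] / q[w]) for w in p)
    rev = sum(q[w] * log2(q[w] / p[w]) for w in q)
    return fwd + rev

def symmetric_kl_rate(p_words, q_words, word_len):
    """Per-symbol plug-in estimate over length-word_len words."""
    return symmetric_kl(p_words, q_words) / word_len
```

The rate is zero exactly when the two word distributions coincide, which is the sense in which anesthetized recordings, whose forward and reverse statistics are closer, are "more reversible."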

IV. DISCUSSION
Discovering a reliable measure of conscious arousal in animals and humans remains one of the major outstanding challenges of neuroscience. The present study addresses this challenge by connecting a complexity measure to the degree of conscious arousal, taking a step toward strengthening the link between physics, complexity science, and neuroscience. Here, we have taken tools from the former and applied them to a problem in the latter. Namely, we have studied the statistical complexity and temporal asymmetry of neural recordings in the brains of flies over two states of conscious arousal: awake and anesthetized. We have demonstrated that differences between these macroscopic states can be revealed by both the statistical complexity of local electrical fluctuations in various brain regions and various measures of temporal asymmetry of the hidden models that explain their behavior. Specifically, we have analyzed single-channel signals from electrodes embedded in the brain using the machine formalism and quantified the statistical complexity C μ , the causal irreversibility Ξ and symmetric KL divergence rate D KLS , and the crypticity d of the recorded data for 15 channels in 13 flies over two states of conscious arousal. We find the statistical complexity is larger on average when a fly is awake than when the same fly is anesthetized [ ΔC μ > 0; Figs. 3 and 4(a)], and that the structural complexity of information and its time reversibility, captured by crypticity and the KL rate, are also reduced under anesthesia [ Δd > 0 and ΔD KLS > 0; Figs. 4(c) and 4(d)].
As we have demonstrated in this study, the local information contained within a single channel can carry information about global conscious states, which are believed to arise from interactions among many neurons. Theoretically, a single channel can reflect the complexity of multiple channels, following the concept of Sugihara causality [55]. This arises because any one region of the brain causally interacts with the rest of the brain, so that the temporal correlations in a single-channel time series contain information about the spatial correlations, i.e., information that would be contained in multiple channels. With this logic, Ref. [56] infers the complexity of multichannel interactions from the temporal structure of a single-channel time series. This is often known as the backflow of information in non-Markovian dynamics [57]. The periodic structure of statistical complexity observed across channels in Fig. 2 demonstrates an unexpected example of spatial effects present in our study, one that was not observed with conventional LFP analyses. This observation may provide a motivation for multichannel analyses. While we already find differences between conscious states in the single-channel machine analysis, it would be beneficial to extend the present analysis to the multichannel scenario, in which the machine approach can be contrasted with the methods of IIT [9][10][11][12][13][14][15][16][17][18]. Formal comparison of the power of the proposed methods (such as those in Refs. [6,7]) to distinguish conscious states will contribute to refining models and theories of consciousness.
Our results can be informally compared with a previous study that analyzed the power spectra of the same data in the frequency domain [30]. There, a principal observation was the power in low-frequency signals in the central and peripheral regions, which was more pronounced in the central region (corresponding to channels 1-6 in this study). Our machine analysis here reveals that the region between the periphery and the center (channels 5-7) shows the most consistent difference in ΔC μ across history lengths λ > 2. Ultimately, the reason for this difference lies in our distinct approach: machines are provably optimal predictive models of a large class of time series, taking into account higher-order correlations (memory structure) [20,21]. Thus, our application of machines contrasts with the power spectral analysis by considering these higher-order correlations in very high-frequency signals, instead of only two-point correlations in both high- and low-frequency signals. Finally, Fig. 4(a) shows that in the regions corresponding to channels 1 and 13, the differences between the conditions are only seen at high values of λ.
Our multitime analysis further reveals an interesting effect when we look more closely at, e.g., the anesthetized machine example shown in Fig. 1(c). When we examine the binary strings belonging to each causal state, we find a clear split between active (consecutive strings of ones) and inactive (consecutive strings of zeros) neural behavior, corresponding to the left- and right-hand sides of Fig. 6, respectively. Previous studies have demonstrated an increase in low-frequency LFP and EEG power for mammals and birds during sleep and anesthesia, mediated by similar neural states of activity and inactivity known as "up" and "down" states [58,59]. A similar phenomenon has recently been observed in sleep-deprived flies [60]. Our study, using general anesthesia, does not directly observe these slow oscillations. Future studies with more formal comparisons between up and down states and machines, in both theory and computer simulations, may be a fruitful avenue for further research in this regard.
An analysis in terms of machines has also allowed us to discriminate between levels of conscious arousal by examining causal structures found in both the forward and reverse time directions. Based on our previous finding [30], as well as related concepts in temporal prediction [31,61-63] and causal matching [32], we hypothesized that the wakeful brain may be tuned to the causal structures of the world, which run forward in time, and thus that machines would be more complex for forward than for reverse readings. Further, we hypothesized that such temporally tuned structural matching would be lost under anesthesia. Our results (Sec. III B 2) are intriguing in three ways. First, near-zero causal irreversibility indicates that reducing the structural comparison to a simple summary index is not enough to capture effects on the informational structure that are sensitive to the direction of time. This is the case regardless of the level of consciousness (at least at the timescales of this study). Second, nonzero crypticity indicates that the underlying informational structure is not symmetric in time. More precisely, the signals themselves encode different amounts of information when run forward as opposed to backward. Third, the KL divergence rate analysis demonstrated greater temporal irreversibility in the wakeful than in the anesthetized state. Having said this, we are limited in drawing strong conclusions due, in part, to the relatively small observed effect sizes, likely a consequence of our relatively small data set. Despite this, even at millisecond timescales, our study successfully identifies significant differences related to the time direction of the neural recordings.
Identifying the decrease in temporal irreversibility due to anesthesia, in tandem with the decrease in complexity, is of broad interest in neuroscience. While some physicists and neuroscientists have conjectured links among physics, the brain, and even consciousness through the lens of the direction of time, their accounts have remained rather speculative and not built on solid theoretical foundations (for related and alternative theoretical foundations, see the work by Cofré and colleagues [64,65]). For example, using reversely played movies, the sensitivity to the direction of time has been shown to differ across brain regions in humans [66]. In animal studies, some populations of neurons (in the hippocampus) are known to become activated in a particular sequential order while the animal experiences a particular event. For example, in anticipation of the event, the neurons activate in the forward direction, but in retrospection, the neurons activate in reverse order [67]. While direct links between these empirical findings and the machine framework remain elusive, we foresee that our unified theoretical and analytical framework can potentially bridge this gap in the future.
Our study is not the first to apply complexity measures in consciousness research. Indeed, many definitions and measures of complexity have been proposed in the literature (see Ref. [68] for a list). Moreover, there is a flow of ideas going the other way as well [69][70][71]. However, many, if not most, of these measures cannot account for temporal correlations (memory) or temporal asymmetry, nor differentiate between random and structured processes. Our interdisciplinary study, based on machines, opens up new possibilities; physics can improve its theoretical constructs through the application of its tools to empirical data, while neuroscience can benefit from rigorous quantitative tools that have proven their physical basis across different spatiotemporal scales. Among complexity measures, C μ can be easily interpreted in terms of temporal structure [72], as it has a direct relation to process predictability and memory requirements. We emphasise that statistical complexity C μ , derived from machines, differs drastically from other scalar complexity indices such as Lempel-Ziv complexity [73]. For one, Lempel-Ziv complexity is maximal for a random noise process, whereas the statistical complexity of the same process is zero [see Eq. (2)]. In addition, the notion of temporal reversibility available in the machine framework has no counterpart in Lempel-Ziv complexity. This is a critical difference, since it is known that a low-complexity forward-time machine consisting of only two causal states can have a very high-complexity reverse-time machine with countably infinite states [74]. Thus, explicitly considering the influence of time is critical for addressing questions about complexity. Coupled with our results, we can conclude that anesthetized brains become less structured, more random, more reversible, and approach a stochastic process with a smaller memory capacity compared to wakeful brains.
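The contrast with Lempel-Ziv complexity can be made concrete. The sketch below is a simple phrase-counting variant of the LZ76 parsing (our own implementation, not the estimator behind Ref. [73]): a periodic string parses into very few phrases while an irregular string of the same alphabet parses into many, whereas C μ behaves in the opposite way, being 1 bit for the period-2 process and 0 for an ideal coin flip.

```python
def lz76_phrases(s):
    """Count phrases in an LZ76-style parsing of a string: each phrase is
    the shortest prefix of the remainder not seen earlier in the string."""
    phrases, i, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate phrase while it already appears in the past.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases
```

For example, `lz76_phrases("0" * 20)` is 2 and `lz76_phrases("01" * 10)` is 3, while the irregular string "0001101001000101" parses into 6 phrases (0, 001, 10, 100, 1000, 101): phrase count tracks incompressibility, not predictive structure.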
Overall, our results suggest that measures of complexity extracted from machines might be able to identify further structures that are affected by anesthesia at different spatial and temporal scales. It is also likely that applying a similar analysis to other data sets, in particular human EEG data, will lead to new discoveries regarding the relationship between consciousness and complexity that can be retrieved at the single-channel level.

APPENDIX: LINEAR MIXED-EFFECTS MODEL
In this Appendix, we demonstrate an example of an LME analysis for the case of statistical complexity C μ in the forward time direction. For the case when time direction ξ is included as an effect, the only change to the process is an increase in the dimensions of the effects matrix F. Performing an LME analysis on the other quantities used in this study, such as crypticity or the KL rate, follows the same procedure outlined here.
The main goal of the LME analysis we perform in this study is to determine the degree to which memory length (λ), channel location (c), level of conscious arousal (ψ), and their combinations contribute to statistical complexity C μ . LME accomplishes this by modeling statistical complexity as a general linear regression equation [Eq. (7)], whose response is predicted by the aforementioned parameters λ, c, and ψ. In this Appendix, we show the exact form of the linear regression equation used in this analysis, referring to the terminology introduced in the methods (Sec. III A).
We begin by restating Eq. (7) for the case of statistical complexity, C = Fβ + Rb + E, which has the form of a general multidimensional linear equation. We will set aside the right-hand side of the equality for now. On the left-hand side, statistical complexity takes the form of a column vector C. Each row corresponds to the unique response of C μ at a specific selection of parameters. There is a general freedom of choice associated with the number of parameters one would like to assign to the elements of C. We index the rows with fly number f , channel location c, and conscious arousal state ψ. That is, the (i, j, k)th element is

C (i, j,k) = C μ ( f i , c j , ψ k ), (A1)

i.e., the response of the ith fly's jth channel in the kth condition. Thus, C has length | f | × |c| × |ψ| = 13 × 15 × 2 = 390. Each C μ in this vector is a function of λ.
The matrix F introducing the set of fixed effects F = {λ, c, ψ, λc, λψ, cψ, λcψ} into the model (known in the context of general linear models as the design matrix) can then be represented as F = (F 1 , . . . , F 13 ) T , with each element F i corresponding to the design matrix of a specific fly, whose columns encode the fixed effects for that fly's observations. In a similar fashion, the expression for the matrix containing the random effects R can be determined. For the case of our study, we only consider random effects arising from correlations between channels within a specific fly. The result is an adjustment to the intercept of the linear model for each fly and channel combination. Therefore, the random effects matrix R is simply an identity matrix of dimension 390. The accompanying elements of the random effects vector b consist of regression coefficients b f c describing the strength of each intercept adjustment.
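As a concrete illustration of the design matrix structure described above, the sketch below builds one row of F for a single observation, dummy-coding the 15 channels against a reference channel and treating λ and ψ as numeric regressors. The coding scheme is our assumption for illustration; fitlme.m applies its own contrast coding internally.

```python
def design_row(lam, chan, psi, n_chan=15):
    """One row of the fixed-effects design matrix F: intercept, lambda,
    psi, 14 channel dummies, and the interaction columns for
    {lambda*c, lambda*psi, c*psi, lambda*c*psi}."""
    dummies = [1.0 if chan == j else 0.0 for j in range(1, n_chan)]  # channel 0 = reference
    row = [1.0, float(lam), float(psi)] + dummies
    row += [lam * d for d in dummies]        # lambda x channel
    row += [lam * psi]                       # lambda x condition
    row += [psi * d for d in dummies]        # channel x condition
    row += [lam * psi * d for d in dummies]  # lambda x channel x condition
    return row
```

Stacking one such row per observation for each of the 13 flies gives the blocks F 1 , . . . , F 13 ; each interaction group contributes the degrees of freedom (1 or 14) that appear in the parentheses of the χ 2 values reported in the main text.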
The execution of the LME analysis, which included coefficient fitting and log-likelihood estimation, was carried out by running fitlme.m in MATLAB R2018b.