Thermodynamic inference in partially accessible Markov networks: A unifying perspective from transition-based waiting time distributions

The inference of thermodynamic quantities from the description of an only partially accessible physical system is a central challenge in stochastic thermodynamics. A common approach is coarse-graining, which maps the dynamics of such a system to a reduced effective one. While coarse-graining states of the system into compound ones is a well-studied concept, recent evidence hints at a complementary description in terms of observable transitions and waiting times. In this work, we consider waiting time distributions between two consecutive transitions of a partially observable Markov network. We formulate an entropy estimator using their ratios to quantify irreversibility. Depending on the complexity of the underlying network, we formulate criteria to infer whether the entropy estimator recovers the full physical entropy production or whether it just provides a lower bound that improves on established results. This conceptual approach, which is based on the irreversibility of underlying cycles, additionally enables us to derive estimators for the topology of the network, i.e., the presence of a hidden cycle, its number of states and its driving affinity. Adopting an equivalent semi-Markov description, our results can be condensed into a fluctuation theorem for the corresponding semi-Markov process. This mathematical perspective provides a unifying framework for the entropy estimators considered here and established earlier ones. The crucial role of the correct version of time-reversal helps to clarify a recent debate on the meaning of formal versus physical irreversibility. Extensive numerical calculations based on a direct evaluation of waiting-time distributions illustrate our exact results and provide an estimate of the quality of the bounds for affinities of hidden cycles.

There is, however, a difference between identifying an effective description of a complex system and actually having full access to it in practice. On the arguably coarsest level of description, one is interested in estimation methods of crucial quantities like the entropy production. As a prominent result, the thermodynamic uncertainty relation (TUR) [23][24][25] provides thermodynamic bounds that can be used in estimation techniques for entropy [26][27][28][29][30][31] or topology [32,33] if it is possible to measure currents of the underlying system. These currents are a trace of the fundamental time-reversal asymmetry in dissipative systems [34,35] that can also be utilized directly as an entropy estimator [36][37][38]. Furthermore, entropy estimators that incorporate or are even based on waiting times between measurable events have been discussed more recently [39][40][41][42]. For a partially visible Markov network, entropy production can be estimated through the fraction that is visible in the subsystem through passive observation [43] or by controlling adjustable parameters [44,45].
These methods raise the general issue of how an underlying, only partially accessible system is related to a reduced effective model, a topic known as coarse-graining in stochastic thermodynamics. Earlier interest in the field mainly considered coarse-graining as a mapping in which unresolved Markov states are lumped into compound states, for example via schemes described in Refs. [46][47][48][49][50]. In general, the resulting system is no longer Markovian, so that a description of the dynamics or the entropy production is formulated in terms of phenomenological, apparent equations [27,[51][52][53][54][55]. While particular symmetric systems can be described as semi-Markov processes in this coarse-graining approach [56][57][58], a general framework to describe situations with incomplete information remains an open issue. To give a recent example [59,60], allowing states that are not contained in any compound state breaks with the well-studied paradigm of state lumping as a coarse-graining scheme. This novel scheme extends our ability to formulate thermodynamically consistent models while also exhibiting new effects such as kinetic hysteresis that require a refined understanding of the relationship between time-reversal and coarse-graining.
In this work, we discuss thermodynamic inference based on the observation of a few transitions and their waiting time distributions, rather than on the observation of a few states. This strategy has been proposed independently in the very recent Ref. [61], where the corresponding estimator for entropy production has been introduced and its properties derived mainly using concepts from information theory, in particular, the Kullback-Leibler divergence. In our complementary approach that is based on the analysis of cycles, we will show that the underlying trajectory-dependent quantity obeys a fluctuation theorem. Our analysis reveals that this estimator is the entropy production of a semi-Markov process. In particular, we will show that the description discussed in the present work and in Ref. [61] shows kinetic hysteresis [59]. Mathematically, this effect is the consequence of a time-reversal operation that differs from the one that is usually employed for semi-Markov processes. In this context, higher-order semi-Markov processes [39] fit into the picture naturally as semi-Markov processes with yet another time-reversal operation. Thus, our mathematical perspective establishes semi-Markov processes as an underlying common model while also highlighting the subtleties involved in identifying the correct time-reversal operation.
Thermodynamic inference is not limited to estimating entropy production. We show that the waiting time distributions allow us to infer topological properties and further thermodynamic quantities like the number of states in cycles and their driving affinity. Furthermore, we propose an inductive scheme to detect the presence of hidden cycles in a complex network.
The paper is structured as follows. In section II, we describe the setup and present our key results qualitatively. The fundamental concepts of our effective description are introduced in section III for the paradigmatic model of a single observed link in a unicyclic Markov network. By generalizing these concepts to multicyclic Markov networks in section IV, we propose and discuss an entropy estimator and inference methods theoretically and numerically. The general framework of multiple observed links in a multicyclic Markov network is discussed in section V. In section VI, we discuss our and related work from the perspective of semi-Markov processes. We conclude with a summary and an outlook on further work in section VII.

II. SETUP AND KEY QUALITATIVE RESULTS
We start with a general Markov network of N interconnected states, e.g., the one shown in Fig. 1(a). At time t, a state i(t) = k is assigned to the physical system, with k = 1, ..., N. The time evolution follows a stochastic description by allowing transitions between two states k and l that are connected by a link, equivalently, an edge, in the network. Quantitatively, these transitions from k to l and their reverse happen instantaneously with transition rates k_kl and k_lk, respectively. We assume that k_kl > 0 implies k_lk > 0 to ensure thermodynamic consistency. In the long-time limit t → ∞, the probability p_k(t) to observe the system in a particular state k at time t approaches a constant value p_k^s, which characterizes the stationary state of the network.
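The relaxation toward p_k^s can be made concrete with a few lines of code. The following is a minimal sketch for a four-state ring with hypothetical rates (not the network of Fig. 1(a)); it integrates the master equation with a simple Euler scheme until the distribution is stationary.

```python
# Minimal sketch: stationary state p^s of a 4-state ring via Euler iteration
# of the master equation. The rates k[(i, j)] are hypothetical example values,
# chosen only to satisfy thermodynamic consistency (k_ij > 0 iff k_ji > 0).
k = {(1, 2): 2.0, (2, 1): 1.0, (2, 3): 3.0, (3, 2): 1.5,
     (3, 4): 1.0, (4, 3): 2.0, (4, 1): 4.0, (1, 4): 0.5}
states = [1, 2, 3, 4]

p = {s: 1.0 / len(states) for s in states}  # uniform initial condition
dt = 0.01
for _ in range(200_000):  # Euler steps until p_k(t) -> p^s_k
    dp = {s: 0.0 for s in states}
    for (a, b), rate in k.items():
        flow = rate * p[a]      # probability flux of the transition a -> b
        dp[a] -= flow
        dp[b] += flow
    for s in states:
        p[s] += dt * dp[s]
```

In the resulting NESS, the net probability current is the same through every link of the ring, which is the circular flow discussed below.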
In general networks, it is possible to walk along closed loops. These are accessed systematically from the network by identifying its cycles C, which are defined as closed, directed loops without self-crossings. From a thermodynamic perspective, cycles are a crucial concept due to their possibility to break time-reversal symmetry by favoring the forward direction over the reverse or vice versa. This preference is quantified by the cycle affinity A_C, defined as the logarithm of the product of all forward rates along C divided by the corresponding backward rates,

A_C ≡ ln Π_{(kl)∈C} k_kl/k_lk. (1)

As shown in Fig. 1(b), the network from Fig. 1(a) has three different cycles with different affinities. The affinity A_C is also related to the entropy production associated with the cycle C [62,63]. For biochemical reactions or driving along a periodic track by a force, the affinity is given by the free energy change or dissipated work, respectively [3].
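Eq. (1) translates directly into code. The sketch below computes A_C for a hypothetical four-state cycle; the rates are illustrative choices, not those of Fig. 1.

```python
import math

# Minimal sketch of Eq. (1): the cycle affinity is the log-ratio of the
# products of forward and backward rates along C. Rates are hypothetical.
def cycle_affinity(cycle, k):
    """A_C = ln prod_{(a,b) in C} k_ab / k_ba for a cycle given as a state list."""
    A = 0.0
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        A += math.log(k[(a, b)] / k[(b, a)])
    return A

k = {(1, 2): 2.0, (2, 1): 1.0, (2, 3): 3.0, (3, 2): 1.5,
     (3, 4): 1.0, (4, 3): 2.0, (4, 1): 4.0, (1, 4): 0.5}
A = cycle_affinity([1, 2, 3, 4], k)  # cycle 1 -> 2 -> 3 -> 4 -> 1
```

By construction, traversing the cycle in the reverse direction yields the affinity with the opposite sign, reflecting the antisymmetry under time-reversal.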
Cycles C with non-vanishing affinities give rise to macroscopic, sustained flows along their constituent links, even in the limit of large observation times T. These circular flows are the cause of the mean entropy production rate

σ = Σ_C j_C A_C, (2)

where j_C is the expected net number of completed cycles C divided by the observation time T in the limit T → ∞ [63,64]. If σ > 0, there is a constant rate of dissipation in the stationary state, which is then referred to as a non-equilibrium stationary state (NESS). Calculating the entropy production via Eq. (2) requires the ideal case of knowing all cycles and all cycle currents, which is not practically feasible in general. In our setup, we assume that an external observer measures individual transitions along a limited number of edges connecting neighboring states in the Markov network. Conceptually, this approach coincides with the transition-based effective description proposed in Ref. [61]. Notationally, we discern transitions from states by utilizing capital letters I, J, ... and write I = (kl) to express that I is a transition from the Markov state k to the Markov state l. An example illustrating this effective description for the observable transitions (23) and (32) in the Markov network from Fig. 1(a) is shown in Fig. 1(c) and (d). According to Eq. (1), the affinity of C_0 is given by A_C0 = ln(k_12 k_23 k_34 k_41 /k_21 k_32 k_43 k_14); A_C1 and A_C2 are defined analogously. These affinities coincide with the logarithmic ratio of the probabilities to observe a completed cycle in forward and in backward direction, respectively (cf. Eq. (7)). The central quantities of our effective description are the waiting time distributions

ψ_I→J(t) ≡ p(J, T_I + t | I, T_I), (3)

which quantify the probability density that the transition J is measured at time T_J = T_I + t given that the previous transition I was registered at time T_I. With transitions I, J replacing states k, l, waiting time distributions ψ_I→J(t) are the time-resolved analogue of transition rates k_kl. Fig. 1(e), (f) and (g) illustrate the concept of waiting time distributions for the effective description in Fig. 1(c) and (d). In the following, we will derive several remarkable results centered around these waiting time distributions and their underlying semi-Markov description, which are summarized here on a qualitative level.
1. For a unicyclic network, it is sufficient to determine the ψ_I→J(t) from just one edge in order to infer the affinity of the cycle C and the exact mean entropy production rate σ from the ratio of these distributions. We recover this result of Ref. [61] independently, here based on a microscopic fluctuation theorem from the perspective of network cycles. Since the full entropy production is inferred by this estimator, it beats the TUR, which, in general, does not recover the full entropy production even in a unicyclic network.
2. For a multicyclic network, the same information from just one edge yields the affinity of the shortest cycle, its length and the length of the second-shortest cycle this edge is a part of. Moreover, it yields a lower bound on the largest cycle affinity contributing to the current through this edge. Finally, it provides a lower bound on the overall entropy production of the network that coincides with the bound proposed in Ref. [61]. This bound is shown to be tighter than the entropy estimator in Ref. [44] while also omitting any assumptions of physical control over system parameters at the observed edge.
3. If several edges can be observed, the estimator on total entropy production becomes successively tighter. Based on the ratios of the ψ_I→J(t), we establish operational criteria to infer the presence of hidden cycles and hidden entropy production not accounted for by the estimator.
4. From a mathematical perspective, observing transitions results in a semi-Markov process. The cycle-based approach of this work and the information-theoretical approach of Ref. [61] can be seen as equivalent strategies to establish the entropy production of the corresponding semi-Markov process. From this point of view, we relate the proposed entropy estimator to the semi-Markov entropy estimator proposed and discussed in Refs. [39,65,66] and highlight the crucial role of the different time-reversal operations.

III. UNICYCLIC NETWORK AS PARADIGM
For an introductory example, we consider a Markov network with only a single cycle C in its NESS. In this network, we observe a single edge between neighboring states k and l that is part of the cycle. We assume that forward and backward transitions along this edge can be distinguished and denote forward transitions (kl) as I + and backward transitions (lk) by I − , respectively.
On the microscopic level, a waiting time distribution of the form ψ_I→J(t) has contributions only from microscopic trajectories γ_t^{I→J} that start with a transition I and end with another one, J, after time t without any other observed transition in between. With a microscopic path weight P[γ] for microscopic trajectories γ, the waiting time distribution can be expressed as

ψ_I→J(t) = Σ_{γ_t^{I→J}} P[γ_t^{I→J} | I], (4)

which only sums trajectory snippets of the form γ = γ_t^{I→J} with a path weight that is conditioned on the first jump I at time T_I. For example, the waiting time distribution ψ_I+→I+(t) originates from a trajectory snippet γ_t^{I+→I+} of length t with the jump sequence γ_t^{I+→I+} = k → l → · · · → k → l. Likewise, ψ_I−→I−(t) arises from γ_t^{I−→I−} = l → k → · · · → l → k. Although the identification in Eq. (4) is reasonable from a practical point of view, its derivation contains some subtleties that are explained in the full proof of Eq. (4) in appendix A. Since γ_t^{I−→I−} is the reverse of γ_t^{I+→I+}, the logarithmic ratio of the corresponding waiting time distributions,

a(t) ≡ ln [ψ_I+→I+(t)/ψ_I−→I−(t)], (5)

is a natural, antisymmetric measure of irreversibility of the underlying trajectory. As a first main result, we will show that a(t) is independent of t, and, in particular, can be identified with the cycle affinity A_C,

a(t) = A_C. (6)

This relation can be seen as a fluctuation theorem applied to sections of the underlying trajectory on the Markov network that give rise to a waiting time distribution ψ_I+→I+(t). These sections are trajectory snippets γ_t^{I+→I+} of the form given above, where the time difference between both jumps k → l is exactly t. To observe the genuine time-reverse ψ_I−→I−(t), the underlying trajectory must complete the cycle in the reverse direction, which means

P[γ_t^{I−→I−} | I_−] = e^{−A_C} P[γ_t^{I+→I+} | I_+] (7)

for the path weights of every possible trajectory snippet γ_t^{I±→I±}. Since this argument holds true for all trajectories contributing to the waiting time distribution ψ_I+→I+(t), we can sum the left side of Eq. (7) over all γ_t^{I−→I−} and the right side of Eq. (7) over all γ_t^{I+→I+} to conclude

ψ_I−→I−(t) = e^{−A_C} ψ_I+→I+(t) (8)

using Eq. (4). Inserting Eq. (8) into Eq. (5) proves Eq. (6). Since a(t) = a = A_C is time-independent, we get from Eq. (5)

A_C = ln [∫_0^∞ dt ψ_I+→I+(t) / ∫_0^∞ dt ψ_I−→I−(t)] = ln [P(I_+|I_+)/P(I_−|I_−)] (9)

with an integration over the time t. The last equality follows from the definition of ψ_I→J(t) as a joint distribution in J and t in Eq. (3). Thus, the cycle affinity is encoded in conditional probabilities P(J|I) to observe transition J after transition I irrespective of the intermediate waiting time. The relationship between cycle affinities and a time-antisymmetric probability ratio, given by Eq. (6), or equivalently Eq. (9), indicates that a(t) can be used as an estimator for the mean entropy production rate σ in the steady state via

σ̂ ≡ j_C a = j_C A_C = σ, (10)

which is exact even for finite observation times T, because the average is taken in the NESS. This non-invasive estimator is directly accessible from an operational point of view, as by definition j_C can be calculated by counting transitions along the observed link, and a(t) = a can be calculated either directly from histogram data for the waiting time distributions using Eq. (5) or from conditional probabilities deduced from observed transitions using Eq. (9). This unicyclic result also recovers one of the main results in Ref. [61], here using a technique based on the microscopic cycle fluctuation theorem Eq. (7). Thus, the result additionally addresses the conceptual issue of relating entropy production, cycles and fluctuation theorems that was raised at the end of Ref. [61]. Conceptually, the identification A_C = a(t) relies crucially on the observation of transitions rather than states. Two subsequent transitions in the same direction imply a completed cycle with associated entropy production, whereas two visits of the same compound state emerging from state lumping in typical coarse-graining strategies do not.
As all transitions except for one are invisible in the present partially accessible system, previous state-based coarse-graining approaches would yield a trivial model containing only a single compound state. Note that alternating observed transitions, i.e., observing a forward transition after a backward transition or vice versa, can never imply the completion of an underlying cycle. Therefore, it is not surprising that the estimator of the entropy production of a unicyclic network contains only the statistics of two subsequent transitions in the same direction, as observed in Ref. [61].
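The operational character of Eq. (9) can be illustrated by simulating the embedded jump chain of a hypothetical driven four-state ring and recording only the transitions across a single edge; since Eq. (9) involves only time-integrated conditional probabilities, the waiting times themselves drop out. All rates and the random seed below are arbitrary choices for the sketch.

```python
import math
import random

# Minimal sketch of the unicyclic estimator, Eq. (9): observe only the edge
# (1,2) of a driven 4-state ring 1-2-3-4-1 and estimate the cycle affinity
# from the conditional probabilities P(+|+) and P(-|-). Rates are hypothetical.
k = {(1, 2): 2.0, (2, 1): 1.0, (2, 3): 3.0, (3, 2): 1.5,
     (3, 4): 1.0, (4, 3): 2.0, (4, 1): 4.0, (1, 4): 0.5}
A_true = math.log((2.0 * 3.0 * 1.0 * 4.0) / (1.0 * 1.5 * 2.0 * 0.5))  # Eq. (1)

nxt = {1: (2, 4), 2: (3, 1), 3: (4, 2), 4: (1, 3)}  # (forward, backward)
random.seed(42)
state, events = 1, []
for _ in range(400_000):  # embedded jump chain; waiting times drop out of Eq. (9)
    fwd, bwd = nxt[state]
    p_fwd = k[(state, fwd)] / (k[(state, fwd)] + k[(state, bwd)])
    target = fwd if random.random() < p_fwd else bwd
    if (state, target) == (1, 2):
        events.append('+')      # observed forward transition I+
    elif (state, target) == (2, 1):
        events.append('-')      # observed backward transition I-
    state = target

pairs = list(zip(events, events[1:]))
P_pp = sum(1 for a, b in pairs if a == b == '+') / sum(1 for a, _ in pairs if a == '+')
P_mm = sum(1 for a, b in pairs if a == b == '-') / sum(1 for a, _ in pairs if a == '-')
a_est = math.log(P_pp / P_mm)   # estimate of A_C via Eq. (9)
```

For long trajectories the estimate converges to A_C, so the statistical deviation from A_true is small even though only one edge of the ring is visible.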

IV. MULTICYCLIC NETWORKS WITH ONE OBSERVED TRANSITION
For a general network topology, we cannot reconstruct a unique underlying path contributing to the waiting time distributions ψ I+→I+ (t) and ψ I−→I− (t) as in the unicyclic case. Topologically distinct hidden pathways may result in the same pair of consecutive observed transitions. Nevertheless, bounds for the affinities of those cycles that include the observable link can be derived from the ratio a(t). In addition, the cycle lengths of specific cycles can be inferred from the short-time limit of the waiting time distributions. Furthermore, the entropy estimator for unicyclic networks can be generalized to the multicyclic case.

A. Bounds on cycle affinities
For each possible underlying cycle C with I_+ ∈ C, Eq. (7) is valid with corresponding cycle affinity A_C, if γ_t^{I+→I+} completes the cycle once in forward direction without taking detours and γ_t^{I−→I−} denotes the corresponding reverse path. Thus, the bound

min_{C: I_+∈C} A_C ≤ ln ( P[γ_t^{I+→I+} | I_+] / P[γ_t^{I−→I−} | I_−] ) ≤ max_{C: I_+∈C} A_C (11)

is an immediate consequence for these trajectories γ_t^{I+→I+} by comparing with the smallest and largest possible affinity, respectively. Remarkably, the inequality in (7) holds true for general γ_t^{I+→I+}, if the corresponding γ_t^{I−→I−} is defined appropriately by the following algorithm:
1. Consider the sequence of states in γ_t^{I+→I+}. For I_+ = (kl), this is (kl · · · kl).
2. Remove the first state k and the final state l, which belong to the enclosing observed transitions, so that the remaining sequence describes the hidden path from l to k.
3. Reading this sequence from left to right, delete all loops, i.e., whenever a state is visited for a second time, remove it together with all states in between.
4. The remaining trimmed path visits each state at most once. This trimmed path completed with I_+ gives rise to a contributing cycle.
This procedure identifies a trimmed path of γ t I+→I+ that visits each state at most once. By reversing only this trimmed path, one obtains the partial reverse of γ t I+→I+ , which is denoted by Rγ t I+→I+ . The associated cycle containing the transition I + that is reversed by R has to be one of the possible C in Eq. (11). For an example of this procedure, see Fig. 2(b). Thus, inverting only the trimmed part of γ t I+→I+ while maintaining the original direction of the remaining transitions restores the inequality in (7) and hence also the bound in (11) for every possible microscopic trajectory γ t I+→I+ with the corresponding partner γ t I−→I− = Rγ t I+→I+ defined in this way.
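The trimming step can be sketched as a loop-erasure routine. The following is a plausible implementation of the left-to-right deletion on a hypothetical state sequence; it is not meant to reproduce the paper's bookkeeping of the removed loops in every detail.

```python
# Minimal sketch of the trimming step: reading the state sequence left to
# right, every closed loop is deleted, leaving a path that visits each state
# at most once. The example sequence is hypothetical.
def trim(states):
    out = []
    for s in states:
        if s in out:
            out = out[:out.index(s) + 1]  # a loop closed at s: erase it
        else:
            out.append(s)
    return out

path = trim([1, 2, 5, 6, 5, 3, 2, 4])  # detour 5 -> 6 -> 5 and loop back to 2 removed
```

By construction, the returned path is self-avoiding, which is the property needed for it to close into a single contributing cycle together with the observed transition.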
By averaging over all possible trajectory snippets of length t, we can combine Eq. (4) with Eq. (11), which is now valid for all γ_t^{I+→I+} with corresponding partner γ_t^{I−→I−}, to conclude

min_{C: I_+∈C} A_C ≤ ln [ψ_I+→I+(t)/ψ_I−→I−(t)] ≤ max_{C: I_+∈C} A_C (12)

for arbitrary 0 < t < ∞. For this step, it is important to note that the algorithm provides a bijective mapping R between trajectories of the form γ_t^{I+→I+} and trajectories of the form γ_t^{I−→I−}. The inverse mapping is given by applying the same algorithm to γ_t^{I−→I−} except for reading right-to-left in step 3 to recover the correct sequence of states for γ_t^{I+→I+}. The quotient in Eq. (12) can be identified as a(t) via Eq. (5). Thus, the extremal values of a(t) can be identified as bounds on the actual cycle affinities in the form

A_C+ ≥ a*_+ ≡ max_t a(t), (13)
A_C− ≤ a*_− ≡ min_t a(t). (14)

Here, the maximal affinity A_C+ and the minimal affinity A_C− are taken over all cycles C contributing to the observed link. Strong driving along or against the observed link manifests itself in a high positive or negative affinity for a given cycle, respectively. The inequalities (13) and (14) allow us to infer such a source of strong driving from its impact on a(t) from the viewpoint of the observed link. The derived bounds for the cycle affinities are illustrated in Fig. 2. Fig. 2(c) and (d) show that the extremal affinities A_C+ and A_C− of the contributing cycles are indeed bounded by the maximum value a*_+ and the minimum value a*_− of a(t). Furthermore, the affinity A_C0 of the shortest contributing cycle is always equal to the initial value a*_0 ≡ a(t = 0), as we will prove in the following section.
To quantify the quality of the bounds in (13) and (14) for the network from Fig. 2(a), we distinguish two different cases of network realizations. A network with a particular configuration of transition rates belongs to class I if the initial value a*_0 of a(t) is a global maximum or minimum. An exemplary a(t) of a realization of the network belonging to this class is shown in Fig. 3(a), case (I). For this class of network realizations, Eqs. (13) and (14) only provide a single bound, either for the maximal or for the minimal affinity of the cycles contributing to the observed link. The other bound is satisfied by the shortest cycle with affinity a*_0 = A_C0. Class II contains the remaining realizations of the network in which a*_0 = a(t = 0) is not a global maximum or minimum. An example for an a(t) sorted into class II is given in Fig. 3(a), case (II); another one was already shown in Fig. 2(c) and (d). For this class of network realizations, Eqs. (13) and (14) provide bounds for both the maximal and the minimal affinity of the cycles contributing to the observed link, respectively.
For both classes of rate configurations, quality factors Q can be defined such that for Q = 1 equality in Eq. (13) or Eq. (14) holds and the value of the bound equals the actual affinity of the cycle. For Q < 1, the quality factor quantifies the ratio between the value of the bounds and the actual affinity of the corresponding cycle. Using the affinity A_C0 of the shortest cycle given by a*_0 as baseline, we introduce the relative distance

δ(t) ≡ a(t) − a*_0. (15)

The quality factors are defined by comparing the maximal value and the minimal value of Eq. (15) with the respective actual distance between the true cycle affinities given by |A_C± − A_C0|. For network realizations belonging to class I, either Eq. (13) or Eq. (14) is a bound for the affinity of a single cycle. If the initial value a*_0 is a global minimum, the maximal affinity A_C+ of the cycles contributing to the observed link is bounded by Eq. (13). Thus, the quality factor Q_I for this network realization is defined as

Q_I ≡ (a*_+ − a*_0)/(A_C+ − A_C0). (18)

If the initial value a*_0 is a global maximum, the minimal affinity A_C− of the cycles contributing to the observed link is bounded by Eq. (14) and the quality factor Q_I for this network realization is given by

Q_I ≡ (a*_− − a*_0)/(A_C− − A_C0). (19)

A graphical illustration of the quantities entering the definition of Q_I is shown in Fig. 3(a), case (I).
For network configurations belonging to class II, both Eq. (13) and Eq. (14) provide nontrivial bounds for the extremal affinities of the contributing cycles. To distinguish both bounds, two quality factors Q+_II and Q−_II defined similarly to Eq. (18) and Eq. (19) are needed. The quality factor

Q+_II ≡ (a*_+ − a*_0)/(A_C+ − A_C0) (20)

quantifies the quality of the bound Eq. (13) for the maximal affinity A_C+ of the contributing cycles. The quality of the bound Eq. (14) for the minimal affinity A_C− of the contributing cycles is quantified analogously by

Q−_II ≡ (a*_− − a*_0)/(A_C− − A_C0). (21)

The quantities entering the definition of Q+_II and Q−_II are illustrated in Fig. 3(a), case (II).
The quality factors for a total of 2063495 randomly drawn realizations of the multicyclic network from Fig. 2 are shown in Fig. 3(b), (c) and (d) as a function of the affinity A_C0 of the smallest contributing cycle. The different structure and mean value of quality factors Q_I for network realizations from class I, shown in Fig. 3(b), when contrasted with the structures and mean values of quality factors for network realizations from class II, shown in Fig. 3(c) and (d), indicate that the partition into two different classes of network realizations corresponds to distinct features of the network that are reflected in these affinity bounds. The mean value of the quality factors for network realizations belonging to class I is given by ⟨Q_I⟩ ≈ 0.4, which means that the maximal or minimal affinity of the contributing cycles can be estimated based on Eq. (13) or Eq. (14) with an average accuracy of 0.4. This result is remarkable because, on the one hand, the estimation is based on a non-invasive observation of a single link of the network only and, on the other hand, to our knowledge, no coarse-graining inference scheme exists that bounds affinities of a partially accessible network to this degree of precision. The mean values of the quality factors for network realizations belonging to class II are given by ⟨Q+_II⟩ ≈ 0.2 and ⟨Q−_II⟩ ≈ 0.1, respectively. Compared to the bounds for realizations belonging to class I, realizations belonging to class II tend to yield quantitatively weaker bounds. However, local maxima and minima of a(t) seem to provide further, loose bounds for the affinities of other, non-extremal cycles contributing to the observed link. This numerical finding, illustrated for a given network realization in Fig. 2(c), indicates that each successive maximal and minimal value of a(t) corresponds to a contributing cycle.
Therefore, the number of successive maximal and minimal values of a(t) can be interpreted as a lower bound for the total number of contributing cycles for networks from class II.

B. Short-time limit and inference of cycle lengths
Additional information about the network can be obtained from the time-dependence of the waiting time distributions ψ_I+→I+(t) and ψ_I−→I−(t). In the limit t → 0, only the shortest cycle(s) including the link with forward transition I_+ and backward transition I_− contribute(s) to the waiting time distribution, as longer paths lead to effects of higher order in t. Thus, we can extract the number of hidden transitions N_1 needed to complete the smallest cycle and, if unique, its corresponding affinity A_C0 from the waiting time distributions via

N_1 = lim_{t→0} ∂ ln ψ_I±→I±(t)/∂ ln t (22)

and

A_C0 = lim_{t→0} a(t) = a*_0, (23)

respectively, as proven in appendix C 1. Note that N_1 + 1 is equal to the length of the smallest cycle because after N_1 hidden transitions, an additional observed transition is needed to complete the full cycle. As an illustration for the identification of N_1, we consider the ratio of waiting time distributions for the observable link of the two-cycle network shown in Fig. 4(a). Fig. 4(b) illustrates that the evaluation of Eq. (22) for I_+ = (32) coincides with N_1 = 2, the minimal number of hidden transitions needed to observe (32) after (32) in the smallest cycle of the network. For the multicyclic network of Fig. 2, the identification of the affinity in Eq. (23) is illustrated in Fig. 2(c) together with the previously discussed affinity bounds, as the affinity A_C0 of the shortest cycle is reflected in the initial value a_(71)→(71)(0) = a*_0. Terms of higher order around t = 0 of the form t^N encode similar information about cycles of increasing size contributing to the observable link. Qualitatively, we can extract information about the number of hidden transitions N_2 needed to complete the second shortest cycle from a(t), since

a(t) − a*_0 = O(t^{N_2 − N_1}) for t → 0. (24)

More quantitatively and as proven in appendix C 2, the absolute value of the relative distance introduced in Eq. (15) can be seen as the lowest order perturbation to the shortest cycle.
Typically, e.g., if the affinities of the two shortest cycles do not coincide, this effect is due to the second shortest cycle. In this case, N_2 can be extracted from Eq. (15) via

N_2 − N_1 = lim_{t→0} ∂ ln |a(t) − a*_0| /∂ ln t, (25)

if the shortest cycle is unique. By combining the results from Eq. (22) and Eq. (25), we can infer N_2 from observable waiting time distributions. Similar to the length of the shortest network cycle, the length of the second shortest network cycle is given by N_2 + 1. Fig. 4(c) illustrates the evaluation of Eq. (25) for a(t) for I_+ = (32), leading to N_2 − N_1 = 1. This result is consistent with N_2 = 3, the number of hidden transitions needed to observe (32) as the next observable transition after (32) along the second smallest cycle of the network.
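The short-time limit underlying Eq. (22) can be checked numerically. As a toy model (an assumption for illustration, not the network of Fig. 4), take ψ to be an Erlang density of N_1 + 1 exponential steps, which reproduces the t^{N_1} scaling; the log-log slope at small t then recovers N_1.

```python
import math

# Illustration of the short-time scaling behind Eq. (22): if the shortest
# hidden pathway consists of N1 + 1 exponential waiting periods (a
# hypothetical Erlang-type model), then psi(t) ~ t^N1 for t -> 0 and the
# log-log slope recovers N1.
def erlang_pdf(t, shape, lam=1.0):
    return lam**shape * t**(shape - 1) * math.exp(-lam * t) / math.factorial(shape - 1)

def loglog_slope(f, t, h=1e-6):
    # numerical d ln f / d ln t at time t
    return (math.log(f(t * (1 + h))) - math.log(f(t))) / math.log(1 + h)

N1 = 2
slope = loglog_slope(lambda t: erlang_pdf(t, N1 + 1), 1e-4)  # -> N1 as t -> 0
```

The same slope extraction applied to |a(t) − a*_0| would, in this toy setting, play the role of Eq. (25) for the second shortest cycle.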
C. Entropy estimator

Definition
A time-dependent a(t) implies the presence of a second cycle, as longer waiting times between subsequent transitions hint at the completion of longer pathways. Exploiting this time-dependence leads to an entropy estimator that generalizes the estimator of the unicyclic case. To quantify this notion, we let T be the length of a long trajectory with N + 1 transitions I_k located at T_{k−1}. The observation starts with the transition I_1 at T_0 = 0 and ends with I_{N+1} at time T_N = T. Then, the number of subsequent forward or backward transitions with waiting time t in between is given by the time-resolved conditional jump counters defined as

ν_{+|+}(t) ≡ Σ_{m=1}^{N} δ_{I_m,I_+} δ_{I_{m+1},I_+} δ(t − (T_m − T_{m−1})), (26)

with ν_{−|−}(t) defined accordingly. These time-resolved conditional jump counters are used together with the ratio of waiting time distributions a(t) defined in Eq. (5) to define a trajectory-dependent entropy estimator

σ̂[Γ] ≡ (1/T) ∫_0^∞ dt [ν_{+|+}(t) − ν_{−|−}(t)] a(t). (27)

Operationally, ν_{+|+}(t) and ν_{−|−}(t) can be obtained from counting conditional transitions up to time t, and a(t) can be obtained from histograms for the waiting time distributions based on waiting times between observed transitions. As proven in appendix B, in the limit of long trajectories, i.e., observation times T → ∞, Eq. (27) defines an entropy estimator respecting time-reversal symmetry in thermodynamic equilibrium whose mean additionally satisfies

σ ≥ ⟨σ̂⟩ ≥ 0. (28)

This property can be deduced from a fluctuation theorem,

P[Γ]/P̃[Γ̃] = e^{T σ̂[Γ]}, (29)

for the trajectory Γ and its time-reverse Γ̃, both emerging from trajectories of the underlying network by a mapping defined by the effective description of the system. An interpretation for Γ̃ from a mathematical point of view will be given in section VI.
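Eq. (27) has a simple operational reading: every (+,+) pair of consecutive observed transitions contributes +a(t) and every (−,−) pair contributes −a(t). A minimal sketch, assuming an already recorded event sequence and a user-supplied a(t) (here a hypothetical constant, mimicking the unicyclic case of Eq. (10)):

```python
# Minimal sketch of the estimator in Eq. (27): accumulate a(t) over pairs of
# consecutive observed transitions. `events` is a hypothetical time-ordered
# record of (direction, transition time); `a` is the measured ratio a(t).
def entropy_estimate(events, a, T):
    total = 0.0
    for (s1, t1), (s2, t2) in zip(events, events[1:]):
        dt = t2 - t1                      # waiting time between transitions
        if s1 == '+' and s2 == '+':
            total += a(dt)                # nu_{+|+} contributes +a(t)
        elif s1 == '-' and s2 == '-':
            total -= a(dt)                # nu_{-|-} contributes -a(t)
    return total / T                      # entropy production rate estimate

# sanity check: for constant a(t) = A_C the estimator reduces to
# (number of (+,+) pairs minus number of (-,-) pairs) * A_C / T, cf. Eq. (10)
events = [('+', 0.0), ('+', 1.0), ('+', 2.5), ('-', 3.0), ('+', 4.0)]
sigma_hat = entropy_estimate(events, lambda t: 2.0, T=4.0)
```

Reversing the direction of every recorded transition flips the sign of the estimate, reflecting the antisymmetry of the underlying trajectory-dependent quantity.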

Illustration and comparison to existing methods
A numerical illustration of the estimator, Eq. (27), applied to the partially accessible two-cycle network is depicted in Fig. 4(d). The mean entropy production σ and the mean of the entropy estimator ⟨σ̂⟩ are simulated for long, stationary trajectories and different values of a parameter F, which can be interpreted as a driving force applied to the observed link between the states 2 and 3. An external observer who is able to tune the force parameter F can find a value for which the net stationary current through the observed link vanishes. This set-up and the particular value of F are referred to as stalling conditions and the stalling force, respectively [39,44,45]. Knowing this stalling force through either measurement or calculation amounts to knowing the effective "pressure" the remaining network exerts on the link (23) against the force F. This information is incorporated in the so-called "informed partial" entropy estimator σ_IP introduced in [44]. Since the remaining network is taken into account through the effective pressure, σ_IP surpasses the estimator obtained by merely measuring the "passive partial" entropy production σ_PP that can be attributed to the transitions in an observed subset [43], i.e.,

σ_IP ≥ σ_PP, (30)

as proven in the context of the "informed partial" estimator in [45]. Under stalling conditions, both estimators σ_PP and σ_IP become trivial, because they cannot rule out the possibility that the underlying system is at equilibrium if j = 0. The introduced time-resolved estimator σ̂, however, is able to infer non-equilibrium since ⟨σ̂⟩ > 0 even if j = 0, as additional information enters its definition in Eq. (27). Intuitively, the waiting time distributions encode information about the hidden cycle in their time-dependence through a non-constant a(t). More quantitatively, the estimator σ̂ defined by Eq. (27) numerically reproduces the bound of the waiting-time-distribution-based estimator proposed in [39] for the network in Fig. 4. Both the estimator in [39] and σ̂ share the features of considering successive transitions and adding a time-resolution through waiting time distributions. However, σ̂ is formulated without the framework of a higher-order semi-Markov process or a Markov chain decimation scheme. While these differences render a general quantitative comparison with our estimator difficult, σ̂ beats the informed partial estimator σ_IP for long, stationary trajectories,

⟨σ̂⟩ ≥ σ_IP ≥ σ_PP, (31)

as we will prove in appendix B 4. Note that the expectation values are still taken in the limit of large observation times in which finite-time effects at the initial and final transition can be neglected. It is also evident from the proof that equality is achieved in the first relation if and only if a(t) is time-independent. Equality in the second relation is achieved if and only if removing the observed edge results in a network in which detailed balance is satisfied. To give a less formal interpretation of Eq. (31): observational access to the waiting time distributions contains more information than operational access to the observed links via the stalling force F. In particular, it is possible to measure F without perturbing the system at all, as we will prove in appendix B 4.

V. MULTIPLE OBSERVED LINKS IN A MULTICYCLIC NETWORK
Access to additional observable transitions provides further information about the underlying network, which allows us to infer topology qualitatively by identifying allowed and forbidden sequences of transitions and quantitatively by sharpening our entropy estimator for multicyclic networks.

A. Entropy estimator
For M observed links there are 2M possible transitions and a 2M × 2M matrix of quotients a IJ (t), with I, J running over all observed transitions, which yields the skew-symmetry a IJ (t) = −a J̃Ĩ (t). Intuitively, the ratio in Eq. (33) encodes the entropy production term of an effective two-step trajectory Γ t IJ = I → J of length t. This term is related to the path weights of microscopic trajectory snippets γ t I→J = k → l → · · · → o → p of the same length t between two observed transitions I = (kl) and J = (op), cf. Eq. (34). Similar to the unicyclic case in Eq. (5), unobserved degrees of freedom in the microscopic path γ t I→J are integrated out by the summation over the path weights. The ratios in Eq. (33) allow us to generalize σ̂, defined in Eq. (27), to multiple observed transitions. We define the conditional counters as in Eq. (35), where we adopt the same notation as in Eq. (26), i.e., the m-th transition I m is located at T m−1 . The sum over all a IJ (t) in a trajectory constitutes the entropy estimator in Eq. (36), which reduces to Eq. (27) in the case of a single link, i.e., two possible transitions I ± = ±. Thus, registering a jump J after a previous jump I during an observation of a long trajectory increases σ̂ by a IJ (t), an antisymmetric increment in which inaccessible data beyond the registered observable one is integrated out. The entropy estimator is thermodynamically consistent in the sense of Eq. (28) and satisfies the fluctuation theorem from Eq. (29) in the long-time limit, T → ∞. Moreover, the definition (36) provides the fluctuating counterpart of the entropy estimator for multicyclic networks introduced in Ref. [61], which corresponds to the mean of σ̂ in our notation.
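As a concrete illustration of how σ̂ accumulates along a trajectory, the following sketch evaluates the increments a IJ (t) = ln[ψ I→J (t)/ψ J̃→Ĩ (t)] for a sequence of observed transitions. The kernels ψ and the reversal map are hypothetical inputs supplied by the observer; the exponential toy kernels below are for illustration only.

```python
import math

def sigma_hat(transitions, times, psi, reverse):
    """Accumulate the estimator: for each pair of consecutive observed
    transitions I -> J separated by waiting time t, add
    a_IJ(t) = ln psi_{I->J}(t) - ln psi_{rev(J)->rev(I)}(t)."""
    total = 0.0
    for m in range(len(transitions) - 1):
        I, J = transitions[m], transitions[m + 1]
        t = times[m + 1] - times[m]
        total += math.log(psi(I, J, t) / psi(reverse(J), reverse(I), t))
    return total

# Toy kernels for a single observed link with transitions '+' and '-';
# a branching weight times a normalized exponential, illustrative only.
def psi(I, J, t):
    w = {('+', '+'): 0.7, ('+', '-'): 0.3,
         ('-', '+'): 0.4, ('-', '-'): 0.6}[(I, J)]
    r = {('+', '+'): 1.0, ('+', '-'): 0.5,
         ('-', '+'): 0.5, ('-', '-'): 0.8}[(I, J)]
    return w * r * math.exp(-r * t)

reverse = {'+': '-', '-': '+'}.get
```

By construction each increment is antisymmetric, a IJ (t) = −a J̃Ĩ (t), so the accumulated estimator changes sign under the time-reversal operation discussed below.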

B. Network topology
When we consider multiple transitions, their relative position in the network has a crucial impact on the observed data. For a given network, the waiting time distribution ψ I→J (t) depends not only on the pair of transitions I, J, but on the entire set of observed links. For example, in the effective description of the network in Fig. 4(a), a (23)(23) (t) is time-dependent but becomes time-independent if, in addition, the transitions (13) and (31) are observed. The reason is that the fluctuation-theorem-like argument for the affinity can be restored, since observing ψ (23)→(23) (t) necessarily implies completion of the cycle C = (23412). Formulated differently, we can retrace the arguments underlying Eq. (11) to deduce an equality, because the only possible completed cycle is C. Based on this observation, we can conclude in more general terms that increasing the number of observed links in a network decreases the number of possible pathways in the remaining, hidden part of the underlying Markov network. This subnetwork, which is obtained by removing all observed links from the Markov network, will be denoted as the hidden subnetwork. While the hidden subnetwork is made up of the same states as the Markov network, it contains fewer links and therefore may be disconnected. We can make a few technical but far-reaching observations, which are here formulated for long, stationary trajectories, i.e., expectation values are taken in the NESS and in the limit T → ∞, as before. Let I = (kl) and J = (op) be two arbitrary observed transitions in the network.
1. If the hidden subnetwork is topologically trivial, i.e., does not contain any cycles, then σ̂ = σ. Moreover, all a IJ (t) are time-independent.

2. A time-dependent a IJ (t) implies the presence of a cycle in the hidden subnetwork. More precisely, if a IJ (t) is non-constant in time, then there is a cycle with non-vanishing affinity in the hidden subnetwork that connects the Markov states l and o. In particular, σ̂ < σ.
3. If J cannot be an immediate successor of I, i.e., if ψ I→J (t) = 0 for all t, the Markov states l and o are not connected in the hidden subnetwork. In particular, we can leave out at least one observed transition without decreasing σ̂.
4. The converse of 2 is not true. It is possible that a IJ (t) is constant in time despite a cycle with nontrivial affinity containing both l and o. However, this behavior is not the generic case but rather requires high symmetry. An explicit example containing such an invisible cycle is provided in appendix E 5.
These four results are based on the microscopic origin of a IJ (t) as a ratio of path weights as indicated in Eq. (34). The crucial argument is an extension of the reasoning used in the unicyclic case to relate ratios of path weights to the cycle affinity A (cf. Eq. (7)). We consider two consecutive transitions I = (kl), J = (op) and two arbitrary paths γ 1 and γ 2 starting and ending in the Markov states l and o, respectively. Their path weights satisfy the relation Eq. (38), where A 12 is the affinity of the closed loop obtained by appending the time-reverse of γ 2 to γ 1 . If the hidden subnetwork does not contain any cycles, A 12 = 0 follows trivially. Since γ 1 and γ 2 are arbitrary, Eq. (38) implies the existence of a specific number a IJ satisfying Eq. (39) for paths γ t I→J of arbitrary length t with time-reverse γ̃ t J̃→Ĩ . By summing the previous equation over all possible trajectories of the form γ t I→J , we conclude Eq. (40). In particular, a IJ (t) is time-independent if the hidden subnetwork does not contain any cycles or if it satisfies detailed balance, i.e., if any cycle in the hidden subnetwork has vanishing affinity. This argument establishes rule 1. To emphasize the relation to our previous results, we note that Eq. (40) can be seen as a special case of the affinity bounds from (11), which collapse to equalities if the set of possible A C contains only one element. If the hidden subnetwork is a spanning tree, the diagonal element a II = A C is the affinity of the cycle C in the unicyclic network obtained by adding the link I back to the hidden subnetwork. In particular, every cycle passes through at least one observed link and has therefore been registered. Since NESS entropy production stems from cycle currents, it seems plausible to conjecture σ̂ = σ. Up to contributions from the first and last transition of the trajectory, the statement even holds on the level of individual trajectories, as will be proven in appendix D. Rule 2 is obtained from Eq. (38) by reversing the argument above.
Since a nontrivial time dependence of a IJ (t) is impossible if A 12 vanishes for all γ 1 , γ 2 , there must be at least one cycle with non-vanishing affinity. We will now argue that, despite the counterexample given in appendix E 5, the converse of rule 2 is usually satisfied in a generic set-up. If a IJ (t) is constant in time, it equals its limit a IJ (0) as t → 0. By a timescale separation argument similar to Eq. (23), only the shortest connection between the corresponding Markov states l and o contributes in the short-time limit, whereas longer connections are suppressed and lead to higher-order effects. A hidden cycle containing l and o can be split along these states, giving rise to two topologically distinct pathways γ 1 and γ 2 . Unless both pathways contain the exact same number of states, one class of paths is suppressed relative to the other in the short-time limit. Thus, the hidden cycle must contain an even number of states to avoid this timescale separation argument. In addition to this purely qualitative argument, generic choices of transition rates lead to different first-passage times from l to o depending on the topology of the path, which would also lead to a nontrivial time dependence of a IJ (t).
While the derivation of rule 3 is straightforward from a mathematical point of view, it is of high value operationally, as it can be used to infer the connected components of the hidden subnetwork. In addition, this rule describes a scheme to identify the transitions needed to recover the full entropy production. While rule 2 gives a simple criterion for when a particular set of observed transitions is insufficient to conclude σ̂ = σ, rule 3 formulates a complementary criterion about transitions which are redundant for the entropy estimate. On the level of the Markov network, restoring the minimal number n of observed links I 1 , ..., I n needed to connect l and o does not create any cycles in the hidden subnetwork. Since entropy production in the steady state is always due to cycle currents, the entropy production in the hidden subnetwork is not increased by leaving I 1 , ..., I n unobserved, i.e., by adding I 1 , ..., I n to the hidden subnetwork.
The interplay of statement 2 working "bottom-up" and statement 3 coming "top-down" is not limited to assessing the quality of the discussed estimator σ̂. It also provides an algorithm for inferring topological aspects of the Markov network by identifying underlying spanning trees, connected components, the position of hidden cycles and, lastly, their affinities and lengths by combining these rules with the methods introduced in section IV.
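The topological part of these rules can be checked algorithmically once the network and the set of observed links are specified. The following sketch (the example network in the test is hypothetical) removes the observed links and tests whether the hidden subnetwork still contains a cycle, using a union-find pass over its edges:

```python
def hidden_subnetwork_has_cycle(states, links, observed):
    """Remove the observed links and detect a cycle in the remaining
    hidden subnetwork: an undirected graph is a forest if and only if
    no edge connects two already-connected components."""
    removed = {frozenset(e) for e in observed}
    hidden = [e for e in links if frozenset(e) not in removed]
    parent = {s: s for s in states}

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hidden:
        ra, rb = find(a), find(b)
        if ra == rb:
            return True               # this edge closes a hidden cycle
        parent[ra] = rb
    return False
```

If the function returns False, the hidden subnetwork is a forest and rule 1 guarantees that the estimator recovers the full entropy production.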

A. Identification of the semi-Markov description
In the transition-based description, each trajectory ζ of the underlying Markov network is mapped to a trajectory Γ that includes only the observable transitions and the waiting times in between, written symbolically in Eq. (42). Clearly, this mapping from ζ to Γ is well-defined and many-to-one. Adopting a different yet equivalent perspective, this kind of mapping for the underlying trajectory can be seen as a type of milestoning using the space of observable transitions for partitioning. Milestoning is a particular coarse-graining scheme from molecular dynamics simulations [67] introduced to stochastic thermodynamics in [59,60]. In short, the milestones represent certain events whose occurrence indicates the crossing of a milestone that updates the coarse-grained state of the system. In practice, this approach results in a semi-Markov description for the coarse-grained system defined on the space of observable transitions. In other words, each observed transition I is identified as a state in the semi-Markov model. The following discussion includes the key concepts of semi-Markov processes in the context of stochastic thermodynamics; see [56,58,68] for details. The equivalence of the transition-based description to a semi-Markov model becomes evident on the level of single trajectories emerging from the mapping in Eq. (42). An effective trajectory Γ containing N + 1 transitions, starting and ending with registered transitions I 1 at time T 0 = 0 and I N +1 at time T N = T, respectively, is fully characterized by the sequence in Eq. (43) for 0 ≤ t < T N . From a mathematical point of view, the sequence in Eq. (43) precisely defines a particular realization of a semi-Markov trajectory [56], in which the {I k } take the role of the states. Compared to a Markov process, in which the system is fully described by specifying the state i, a full semi-Markov description of the system requires knowing the state I and the waiting time t that has elapsed since I has been entered.
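The mapping ζ → Γ of Eq. (42) is straightforward to implement. A minimal sketch, assuming the full trajectory is available as a list of (time, from-state, to-state) jumps:

```python
def coarse_grain(jumps, observed):
    """Map a full Markov trajectory, given as (time, i, j) jumps, to the
    effective trajectory Gamma: the sequence of observed transitions
    together with the waiting time since the previous observed one
    (None for the first registered transition)."""
    gamma, last = [], None
    for t, i, j in jumps:
        if (i, j) in observed:
            gamma.append(((i, j), None if last is None else t - last))
            last = t
    return gamma
```

All hidden jumps between two registered transitions are discarded; only their total duration survives as the waiting time, which is exactly the information entering the waiting time distributions.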

B. Semi-Markov kernels and embedded Markov chain
Since the theory of semi-Markov processes provides the mathematical framework of the effective description, quantities defined for the latter can be expressed in the language of corresponding semi-Markov processes. The waiting time distribution ψ I→J (t) assigned to each transition I, dubbed inter-transition time density in Ref. [61], is called the semi-Markov kernel in this framework. A semi-Markov kernel ψ I→J (t) is defined as the joint distribution of the waiting time t and the transition destination J, given that the current state is I with age zero, which coincides precisely with the definition of the waiting time distributions in Eq. (3). Integrating out the waiting time t of a semi-Markov kernel results in conditional probabilities for a transition between two semi-Markov states irrespective of the waiting time in I. These probabilities, whose ratios have already been used in Eq. (9), can now be placed in a mathematical context. Based on the transition probabilities p IJ defined by Eq. (44), the concept of the embedded Markov chain (EMC) can be established for every semi-Markov process by integrating out its time variable [56]. The embedded Markov chain of the effective trajectory in Eq. (43) is given by the sequence of observed transitions. The transition probabilities of the corresponding discrete-time Markov process are given by Eq. (44).
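The passage from semi-Markov kernels to the EMC amounts to the integration p IJ = ∫₀^∞ ψ I→J (t) dt of Eq. (44). A sketch with hypothetical exponential kernels (the branching weights and rates are illustrative values only):

```python
import math

# Hypothetical semi-Markov kernels for two states '+' and '-';
# each kernel is a branching weight times a normalized exponential.
def psi(I, J, t):
    w = {('+', '+'): 0.8, ('+', '-'): 0.2,
         ('-', '+'): 0.5, ('-', '-'): 0.5}[(I, J)]
    r = {('+', '+'): 2.0, ('+', '-'): 1.0,
         ('-', '+'): 1.0, ('-', '-'): 0.5}[(I, J)]
    return w * r * math.exp(-r * t)

def emc_probability(I, J, tmax=80.0, n=40000):
    """p_IJ = integral of psi_{I->J}(t) over t (trapezoidal rule)."""
    h = tmax / n
    s = 0.5 * (psi(I, J, 0.0) + psi(I, J, tmax))
    s += sum(psi(I, J, k * h) for k in range(1, n))
    return s * h
```

The rows of the resulting matrix sum to one, confirming that the EMC is a proper discrete-time Markov chain.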

C. Path weight and time-reversal operation
According to the semi-Markov description, the path weight P[Γ|I 1 , 0] of the effective trajectory Γ(t) conditioned on the first transition is simply given by Eq. (46), where we follow the conventional definition [56,68-70]. Eq. (46) coincides with the effective path weight defined for trajectories of the transition-based description in Ref. [61]. Note that the first and last transition do not need to be treated differently [56,58,68,71], since the trajectory starts and ends with a transition by construction. The time-reversal operation for the present semi-Markov process is not given by the conventional time-reversal operation for semi-Markov processes. Instead of simply reversing Γ in time, as proposed in [69] and [56], two peculiarities emerging from the time-reversal of the underlying trajectory ζ have to be taken into account.
First, Γ contains observed transitions that are odd under time-reversal, similar to momenta, and therefore need to be reversed [39,59,72]. Thus, it is natural to define the reversed transition Ĩ for a transition I as in Eq. (47). Second, we observe an effect introduced as kinetic hysteresis in [59]. After registering a transition I = (ij) at time t I, it would be misleading to treat I as a compound state and conclude that the underlying system remains in I until the next transition J is observed at t J . At some time t with t I ≤ t ≤ t J, the state of the coarse-grained system is described completely by knowing the last transition I and the time t − t I that has passed since then. However, the same point in time on the reversed trajectory is described by knowing that t J − t has passed since the last transition J. Thus, J replaces I as the latest registered transition. Combining both effects allows us to formulate the time-reversal of a semi-Markov kernel ψ I→J (t J − t I ) as in Eq. (48), resulting in Eq. (49) for the conditioned path weight P[Γ̃|I N , T ] of the time-reversed trajectory Γ̃. Clearly, the time-reversal in Eq. (48) is identical to the time-reversal proposed in Ref. [61], since the shift of inter-transition times discussed there is precisely the effect of kinetic hysteresis described above. Note that the modifications to the time-reversal operation of the semi-Markov process arise naturally, in accordance with the paradigm that time-reversal does not, in general, commute with coarse-graining of the form Eq. (42) [59].
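Both effects together give a simple recipe for reversing an effective trajectory: reverse the order, map every transition I to Ĩ, and reassign the waiting times according to the kinetic hysteresis shift. A sketch, with an illustrative reversal map for a single observed link:

```python
def time_reverse(transitions, times, reverse):
    """Time-reverse an effective trajectory given as observed
    transitions and their absolute times. Transitions are odd under
    time-reversal (I -> reverse(I)); reflecting the times at the final
    time T reassigns each waiting time to the following transition,
    which is the kinetic hysteresis shift."""
    T = times[-1]
    return ([reverse(I) for I in reversed(transitions)],
            [T - t for t in reversed(times)])
```

Applying the operation twice returns the original trajectory, i.e., it is an involution, as required for the fluctuation theorem.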
In the common conception of semi-Markov processes, the direction-time independence criterion is a necessary condition to ensure time-reversal symmetry in equilibrium [56,69]. Remarkably, the semi-Markov process as introduced here breaks this condition in general. This apparent contradiction is resolved since the derivation of the direction-time independence relies crucially on the conventional time-reversal operation for semi-Markov processes, which does not apply here, as discussed above.

D. Interpretation of the entropy estimators
The entropy estimator σ̂ is established for unicyclic networks in Eq. (10). It is based on the microscopic fluctuation theorem in Eq. (8), valid for the ratio of waiting time distributions. The generalization of σ̂ for multicyclic networks with multiple observed links in Eq. (36), which includes the estimator for a single observed link, Eq. (27), as a special case, relies on the same fluctuation theorem generalized to the multicyclic case. From the semi-Markov perspective, these fluctuation theorems can be interpreted as the consequence of an actual fluctuation theorem of the semi-Markov process. We define the semi-Markov entropy production rate σ SM as the limit in Eq. (50), which differs from the known expressions, e.g., in [68,71,73], because of the modified time-reversal operation. Comparing Eq. (50) to Eq. (29), we conclude that σ SM in fact equals σ̂, which was established as a thermodynamically consistent coarse-grained entropy production term in the previous sections. In hindsight, the fluctuation theorem in Eq. (8) can be derived from Eq. (50) by specializing to semi-Markov trajectories with only a single transition. The underlying Markov description does not enter explicitly anymore; instead, it is incorporated implicitly by ensuring that σ SM is the correct physical entropy production. The affinity estimators derived in section IV can also be seen as consequences of Eq. (50), tracing the entropy production back to the level of contributing cycles. From the unifying semi-Markov perspective, we can give three complementary interpretations of the estimator σ̂. First, the derivation presented in Ref. [61] relies on the information-theoretical identification of the expected entropy production of a stochastic process as a Kullback-Leibler divergence between the path weights of a forward and backward process [36,37].
Second, contributions to the fluctuating quantity σ̂ can be attributed to the completion of cycles in the underlying Markov network, which are partially observed by an external observer. Third, σ̂ = σ SM can be interpreted as the entropy production rate of a semi-Markov process with a particular time-reversal operation. Thermodynamic consistency of σ̂ is then coupled to the applicability of the time-reversal operation, which has to be established from the underlying network.
By interpreting σ̂ as the entropy production σ SM of the equivalent semi-Markov process, the decomposition proposed in Ref. [61] can be identified as a decomposition of σ SM into the entropy production σ EMC of the EMC and the remaining entropy production σ WTD caused by the waiting times, Eq. (51). Up to a time conversion factor, σ EMC is the mean entropy production of the EMC, which is given by σ EMC = ⟨t⟩⁻¹ Σ I,J p s I p IJ ln(p IJ /p J̃Ĩ ) (Eq. (52)), where p s I is the steady state of the EMC as a discrete-time Markov chain. The factor ⟨t⟩, the average waiting time between two transitions, is needed because the entropy production of a discrete-time Markov chain is naturally measured per step rather than per time. In terms of the application to observed links, p s I quantifies the relative frequency of a particular transition I in a long sequence of observed transitions as given by Eq. (45). Equivalently, Eq. (52) can be derived as the mean of the corresponding fluctuating quantity. The condition of vanishing σ EMC can also be related to the stalling conditions. In fact, the entropy production associated with the embedded Markov chain coincides with the informed partial entropy estimator σ IP formulated for the case of one accessible transition [44,45], i.e., σ EMC = σ IP, as proven in appendix B 4. In particular, the force F can be determined, by virtue of Eq. (32), without referring to waiting times at all. This result is not surprising, since both estimators measure the affinity A C of a single, averaged "effective cycle", either through the applied force F or through the ratio ln P (+|+)/P (−|−). Without the time-resolution, the estimator σ EMC loses the ability to distinguish between longer or shorter hidden cycles. Thus, we can reformulate a conjecture proposed in Ref. [61] that states that σ EMC exceeds an analogous expression based on the TUR, σ TUR, since σ EMC ≥ σ TUR is equivalent to σ IP ≥ σ TUR. As another consequence of Eq.
(54), the fluctuation theorem proven in [45] for σ̂ IP, the fluctuating counterpart of the estimator σ IP, is related to its counterpart for the EMC, Eq. (52). The second expression in Eq. (51), σ WTD, can be deduced by transferring the splitting of the entropy production into contributions from the EMC and remaining contributions from the waiting times to individual semi-Markov kernels in the path weights. In more practical terms, a single semi-Markov kernel ψ I→J (t) can be decomposed as ψ I→J (t) = p IJ ψ(t|IJ) (Eq. (56)), separating the contribution from the EMC from a conditional waiting time kernel ψ(t|IJ) = ψ I→J (t)/p IJ . By decomposing all kernels in the path weights using Eq. (56), we can identify σ WTD as a Kullback-Leibler divergence between the normalized probability densities ψ(t|IJ) and their reverse, ψ(t|J̃Ĩ). Thus, the derivation in Ref. [61] relates to factorizing out the EMC according to Eq. (56) in the context of semi-Markov processes. Using Eq. (40), we see that σ WTD vanishes if and only if all a IJ (t) are constant in time. In particular, all a IJ (t) are constant in time if detailed balance is satisfied in the hidden subnetwork. The decomposition of the semi-Markov entropy production in Eq. (51) additionally clarifies the relation between the estimator σ̂ and the entropy estimator σ KLD introduced in [39], which is decomposed in a similar form, Eq. (57). Like Eq. (51), this decomposition into contributions from waiting time distributions and affinities is obtained by splitting off the EMC. The analogy is further strengthened by noting that σ aff = σ IP = σ EMC, with the first equality proven in [39]. Note that the respective embedded Markov chains are different objects, as σ aff refers to a coarse-grained unicyclic three-state model, whereas σ EMC only observes a single transition of this model. Nevertheless, the result is not entirely surprising in hindsight, since σ EMC recovers the full entropy production of a unicyclic model by virtue of Eq. (9).
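The EMC part of the decomposition, Eq. (52), is straightforward to evaluate once the EMC is known. A sketch with a hypothetical two-state EMC (transitions '+' and '-', reversal swapping them; the numbers are purely illustrative):

```python
import math

def emc_entropy_rate(p_s, p, reverse, mean_wait):
    """Evaluate sigma_EMC = <t>^{-1} sum_{I,J} p_s[I] p[I][J]
    ln(p[I][J] / p[rev(J)][rev(I)]) for given EMC data."""
    s = 0.0
    for I in p:
        for J, pIJ in p[I].items():
            if pIJ > 0.0:
                s += p_s[I] * pIJ * math.log(pIJ / p[reverse(J)][reverse(I)])
    return s / mean_wait

# Illustrative EMC: transition probabilities and their stationary
# distribution (solved by hand for this 2x2 chain).
p = {'+': {'+': 0.7, '-': 0.3}, '-': {'+': 0.4, '-': 0.6}}
p_s = {'+': 4.0 / 7.0, '-': 3.0 / 7.0}
reverse = {'+': '-', '-': '+'}.get
```

The rate vanishes whenever p IJ = p J̃Ĩ for all pairs, which corresponds to stalling conditions via Eq. (54).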
The difference between the estimators σ WTD and σ̃ WTD, or σ̂ and σ KLD, respectively, emerges from different rationales underlying the respective semi-Markov processes. Describing a physical system with a semi-Markov process is not sufficient to determine its entropy production uniquely, since the correct time-reversal operation needs to be discussed separately [39,65,66]. In total, three different time-reversal operations for semi-Markov processes are implicitly used to define entropy estimators for partially accessible Markov networks.
2. Modified time-reversal, introduced above: This operation includes the kinetic hysteresis effect introduced in [59], which is natural for coarse-graining based on milestoning [60]. In our case, semi-Markov states model transitions, which are odd under time-reversal.

Any of these operations can be used to define an entropy via Eq. (50). This entropy can always be split according to Eq. (56), where the resulting waiting time contributions are given by 0, σ WTD and σ̃ WTD, respectively. In addition, any of the discussed operations is an involution, each giving rise to a dual dynamics for which an appropriate fluctuation theorem holds for the corresponding entropy production [3]. At this level, any nonvanishing entropy production quantifies a different mathematical notion of irreversibility, which becomes a thermodynamic quantity only if the time-reversal is known to be justified physically [59].

A. Summary and Discussion
In this paper, we have introduced an effective description for partially accessible Markov networks based on the observation of transitions along individual links and the waiting times between successive observed transitions. The corresponding waiting time distributions yield an entropy estimator; its fluctuating counterpart σ̂ additionally obeys a fluctuation theorem and was shown to have a natural interpretation as a semi-Markov entropy production. On a microscopic level, we have discussed, using cycle fluctuation theorem arguments, why observing one link suffices to recover the full entropy production in a unicyclic network. More generally, we have derived an operational criterion that indicates the absence of hidden cycles, which guarantees σ̂ = σ.
If the hidden part of the network contains hidden cycles, we have shown that the estimator σ̂ yields a lower bound on the entropy production that improves on known estimation methods. Additionally, we have shown that the waiting time distributions contain information about the topology and cycle affinities of the hidden network. To extract this information, we have derived exact results and estimation methods, whose quality has been assessed numerically. Both the entropy estimator and the affinity estimators are built upon the generalized microscopic cycle fluctuation theorem argument, which is, as we have shown, the signature of a fluctuation theorem valid for an effective semi-Markov process. From the perspective of this semi-Markov process, we have unified extant entropy estimators by providing a common mathematical interpretation.
Different inference methods can be compared based on the required input data and the significance of their predictions. In the case of a single link, σ relies on the measurement of statistical data contributing to a single current. While the amount of input data is comparable to methods based on the TUR, the predictions generally are much stronger, at least in the unicyclic case. While the TUR provides lower bounds on entropy production and cycle affinity in this case [23], we recover exact values for both quantities even without access to the waiting times. When the waiting time distributions are available, exact cycle lengths can be deduced, which improves significantly on a known TUR-based trade-off relation between affinity and cycle length [32,33].
In terms of predictive significance, the entropy estimator is comparable to the method introduced in [39], which is based on knowing a coarse-grained subnetwork, but it requires substantially less information. Calculating σ̂ is possible without any knowledge about the underlying network beyond a single observed link. In particular, the issue of decimation schemes for coarse-graining is circumvented completely. Rather, the entropy estimator σ̂ combines current measurements with information-theoretical notions via conditional counting, since our expectation of the next transition depends explicitly on the previous one [36]. Thus, the sequence of transitions forms a Markov chain, which is identified as the EMC in the corresponding semi-Markov description. A mathematical discussion of semi-Markov processes allows us to distinguish physically distinct categories of semi-Markov descriptions depending on the correct underlying time-reversal operation. Although different entropy-like quantities satisfy fluctuation theorems and provide a mathematical notion of irreversibility, the thermodynamically consistent entropy production must be identified by more fundamental means. If measuring the entropy production is feasible operationally, this knowledge can be used to decide which time-reversal operation recovers the correct entropy production. In this sense, identifying the correct time-reversal operation is a task of thermodynamic inference.

B. Perspectives
The transition-based effective description for partially accessible Markov networks and the derived estimators for entropy and topology open a wide range of possible subsequent research topics. First of all, it will be promising to generalize the estimators for affinity and cycle length to networks with multiple observable links. Based on such a generalization, it would become possible to apply the estimators to a broader range of networks. The combined observation of different links would make it additionally possible to infer more information about the network because different affinities and cycle lengths would be accessible.
With the macroscopic limit of large, complex systems in mind, it is an obvious, albeit ambitious, challenge to transfer thermodynamic inference methods to Markov networks whose cycles outnumber the observed links by far. Conceptually, the ratio of waiting time distributions separates the time-resolved notion of irreversibility from other time-dependent effects entering a waiting time distribution. The estimation techniques for topology and affinity that are based on the short-time limit and hence short pathways infer local properties of Markov networks that may even be large. Passing from local to global methods would require a different approach. The dominant parts of the large-scale network structure might become manifest in patterns of particular transition sequences or waiting times in long trajectories. Splitting these into smaller snippets as proposed here is a first step towards a future study of self-correlations in a long trajectory to extract more complex structures.
To gain more insight into the effective description from the established perspective of coarse-graining, one should investigate how existing coarse-graining strategies for observable states [43,44,48-55,57] are related to the approach introduced here. By combining these complementary approaches and by taking into account conclusions on milestoning [59,60], the concept of coarse-graining can potentially be generalized to a more fundamental level. From a practical perspective, we may ask how the method can be generalized to less ideal situations, e.g., if the observer cannot distinguish between different transitions or registers only particular patterns or sequences of transitions. This class of situations also includes the complementary problem in which particular states rather than particular transitions can be observed, because observing the arrival in a state is equivalent to observing all transitions into this state without the ability to distinguish between them.
The potential of waiting time distributions and their role in inference schemes is certainly not exhausted by the results presented here. Combining the estimators for entropy production and network topology with existing numerical methods may increase the usefulness of waiting time distributions in thermodynamic inference schemes. Fitting the rates of an underlying Markov network to the recorded waiting time distribution [42] or using minimization methods [40,41] are promising tools to obtain tighter, more specialized bounds for the discussed estimators or even to reconstruct the transition rates of a small network from sufficient data. These methods gain particular practical relevance because topological aspects of the underlying network can now be deduced rather than having to be assumed.
Furthermore, even though the effective description has been introduced and discussed for observable transitions of a partially accessible Markov network in the NESS, it is, in principle, not limited to this setting. For example, the description could be applied beyond the steady state to analyze transient dynamics. Finally, it would be interesting to apply the approach to a Langevin dynamics to explore the adjustments needed for systems with continuous degrees of freedom.
Appendix A: Waiting time distributions from path weights and trajectory snippets

Markovian path weights and master equation
We consider the effective description of a given, only partially accessible system in which transitions are observed, e.g., the effective two-cycle network from the main text based on the observation of transitions between state 2 and state 3, shown in Fig. A1. We assume that there is an underlying, more fundamental network to which a discrete Markovian description from the perspective of stochastic thermodynamics, as described in detail in [3], can be applied. For the effective description in Fig. A1(a), this full Markov network with two fundamental cycles is shown in Fig. A1(b).
Transitions from state k to state l are governed by a transition rate k kl , which is independent of the time already spent in state k due to the Markov property of the description. Thus, the waiting time distribution in a particular state must be memoryless and therefore exponentially distributed. In formulas, the probability density for surviving in state k until exactly time t is given by Γ k exp(−Γ k t), where Γ k = Σ l k kl denotes the escape rate of state k. Given that state k is exited, a transition to state l is weighted with the transition rate and therefore happens with probability k kl /Γ k = k kl / Σ l' k kl' .
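These two ingredients, an exponential dwell time with rate Γ k and branching probabilities k kl /Γ k, are exactly what a Gillespie-type simulation samples. A minimal sketch with illustrative rates (not tied to a specific figure):

```python
import random

# Illustrative rates k_kl for a four-state network.
rates = {1: {2: 2.0, 4: 1.0},
         2: {1: 1.0, 3: 3.0},
         3: {2: 1.5, 4: 2.0},
         4: {1: 0.5, 3: 1.0}}

def step(state, rng):
    """Sample one jump: an exponential waiting time with the escape
    rate Gamma_k, then a target drawn with probability k_kl/Gamma_k."""
    out = rates[state]
    gamma = sum(out.values())                     # escape rate Gamma_k
    dt = rng.expovariate(gamma)                   # memoryless dwell time
    nxt = rng.choices(list(out), weights=list(out.values()))[0]
    return nxt, dt
```

Iterating this step produces statistically exact trajectories ζ(t) of the Markov network.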
Based on the discussed survival and transition probabilities, a path weight quantifying the probability of a trajectory ζ of the Markov network can be introduced. We assume that the network has N states, is connected and that there are no unidirectional links, i.e., $k_{kl} > 0$ implies $k_{lk} > 0$. The path weight P[ζ(t)] for a generic trajectory ζ(t) conditioned on the initial state $k_0$ at time t = 0 is given by

$P[\zeta(t)] = \prod_k e^{-\Gamma_k \tau_k} \prod_{(kl)} (k_{kl})^{n_{kl}}$,  (A1)

where the second product runs over all possible transitions (kl) in the network. The trajectory-dependent quantities $\tau_k$ and $n_{kl}$ denote the total time spent in state k and the total number of transitions (kl) in ζ(t), respectively. In principle, the mean of a trajectory-dependent observable can be obtained by a path integral over all trajectories ζ, which in practice means summing over the number L of possible jumps and integrating over all transition times $t_1, ..., t_L$.
An important consequence is that the probability to observe L jumps in a short trajectory ζ of length Δt scales as $P(L \text{ jumps}) \sim \Delta t^L$ for Δt → 0, since the L integrations over the jump times contribute a factor $\Delta t^L$ while the path weight as given in Eq. (A1) is of order 1 in Δt. Thus, a first-order differential equation governing the time evolution of the system can be derived by calculating the path weights for constant and one-jump trajectories, which are the only contributions containing terms up to first order in Δt. The resulting differential equation,

$\partial_t p_k(t) = \sum_l \left[ k_{lk}\, p_l(t) - k_{kl}\, p_k(t) \right]$,  (A3)

is known as the master equation and can be solved to obtain $p_k(t)$, the probability to find the system in state k at time t. Since the master equation description Eq. (A3) is equivalent to the path weight description, solving the initial value problem for $p_k(0) = \delta_{k_0 k}$ amounts to calculating

$p_k(t) = \sum_{\zeta(t):\, k_0 \to k} P[\zeta(t)|k_0, 0]$.  (A4)

The symbolic notation of a sum over paths will be used repeatedly in the following calculations.
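The relaxation described by the master equation Eq. (A3) can be checked numerically; a minimal sketch with an invented three-state rate matrix and a simple Euler integration towards the stationary solution:

```python
import numpy as np

# Hypothetical rates; k[i, j] = rate i -> j.
k = np.array([
    [0.0, 1.0, 2.0],
    [2.0, 0.0, 1.0],
    [1.0, 3.0, 0.0],
])

# Generator with dp/dt = L p: gain terms k[m, l] into state l,
# loss terms -Gamma_m on the diagonal.
L = k.T - np.diag(k.sum(axis=1))

p = np.array([1.0, 0.0, 0.0])      # initial condition p_k(0) = delta_{0k}
dt, n_steps = 1e-3, 20000
for _ in range(n_steps):           # explicit Euler steps of the master equation
    p = p + dt * (L @ p)

print(p, np.abs(L @ p).max())      # near-stationary: L p is almost zero
```

Since the columns of the generator sum to zero, each Euler step conserves total probability exactly.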

2. From fully accessible networks to partially accessible networks
On a coarse-grained level of description, the trajectories of the network are only partially accessible. Thus, a complete analytical description by solving the master equation Eq. (A3) is generally impossible, because even the underlying fundamental network may be unknown.
In the following, we assume that transitions along a single link connecting the Markov states k and l can be observed, but not the states themselves. This transition-based description coincides with the one proposed in [61]. Adopting the notation from the main text, a transition along this link, k → l, and its reverse, l → k, are abbreviated as $I_+ = (kl)$ and $I_- = (lk)$, respectively. Since the sequence of observed jumps and the waiting times in between are the only accessible information about the system in our effective description, a typical example of an observed effective trajectory Γ may look like

$\Gamma = \; ? \to I_+ \to I_+ \to I_- \to I_+ \to \cdots$ at jump times $?, T_0, T_1, T_2, T_3, ...,$  (A6)

where ? represents the unknown transition of the system in the past prior to the first observed transition. For simplicity, we assume from now on that the process starts and ends immediately after the observation of an observable transition, $I_1$ at time $T_0 = 0$ and $I_{N+1}$ at time $T_N = T$, to address the core of our argumentation without worrying about non-time-extensive initial and final terms of the trajectory. Moreover, the scheme indicated in Eq. (A6) can be generalized to any number of observable links. We write $I_n = (k_n l_n)$ for the n-th observed transition between the underlying states $k_n$ and $l_n$, where we note that $l_n \neq k_{n+1}$ in general, as hidden dynamics in between cannot be excluded. Schematically, a coarse-grained trajectory Γ takes the form

$\Gamma = I_1 \to I_2 \to \cdots \to I_{N+1}$ at jump times $T_0 = 0, T_1, ..., T_N = T$.  (A7)

Similar to the Markov case, the probability of Γ can in principle be quantified by a path weight description. First of all, it is important to note that no memory effects ranging over multiple observed transitions need to be considered.
The path weight for the future of the trajectory, i.e., the path weight for the trajectory after a transition $I_n$ is registered, is unaffected by the previous block $I_{n-1} \to I_n$, since knowing the transition $I_n = (k_n l_n)$ at time $T_n$ implies knowing the state of the underlying Markovian system immediately after $T_n$. Thus, the path weight can be split into parts belonging to observed transitions,

$P[\Gamma(t)|I_1, 0] = P(I_2, T_1|I_1, 0)\, P(I_3, T_2|I_2, T_1) \cdots P(I_{N+1}, T_N|I_N, T_{N-1})$,  (A8)

with $P(J, T_J|I, T_I)$ denoting the probability for observing transition J at time $T_J$ if transition I was observed at time $T_I$. Constituting the elementary building blocks in the coarse-grained picture, the objects $P(J, T_J|I, T_I)$ quantify the probability for observing J after a given I with waiting time $T_J - T_I$ in between. Thus, Eq. (A8) can also be written as

$P[\Gamma(t)|I_1, 0] = \psi_{I_1\to I_2}(t_1)\, \psi_{I_2\to I_3}(t_2) \cdots \psi_{I_N\to I_{N+1}}(t_N)$  (A9)

with the waiting times $t_n = T_n - T_{n-1}$ and the waiting time distributions

$\psi_{I\to J}(t) \equiv P(J, T_I + t|I, T_I)$,  (A10)

which are normalized according to

$\sum_J \int_0^\infty dt\, \psi_{I\to J}(t) = 1$.  (A12)

3. Effective absorbing dynamics
On a fundamental level, we are interested in how the path weights of the effective description Eq. (A9) and their elementary building blocks Eq. (A10) are linked to the path weights Eq. (A1) of the corresponding microscopic trajectories of the full network. As a first step, we note that the way in which the effective trajectory Γ was split carries over to a splitting of the microscopic trajectory ζ on the fundamental level, because not only the coarse-grained but the entire microscopic state is known at the observed transition events. Symbolically, this can be denoted as

$\zeta(t) = \gamma^{t_1}_{I_1\to I_2} \ast \gamma^{t_2}_{I_2\to I_3} \ast \cdots \ast \gamma^{t_N}_{I_N\to I_{N+1}}$,  (A13)

where $\gamma^t_{I\to J}$ is the snippet of the full trajectory between two subsequent observable transitions I and J with waiting time t in between. This snippet starts in the destination state of I and ends immediately after the transition event J in the corresponding destination state. Since a given snippet is completed immediately after an observed transition J is registered for the first time, each trajectory snippet can be interpreted as a trajectory of an effective Markovian absorbing dynamics defined on the full network from which all observed links are removed. As soon as the original trajectory ζ completes an observed transition, the absorbing dynamics for γ is terminated immediately. The corresponding first passage time is precisely the temporal length of γ and corresponds to the waiting time t between two transitions in the effective description.
Practically, the effective absorbing Markov network is obtained from the corresponding original network by treating all observable links as absorbing, i.e., redirecting the observed transitions into absorbing states. An example for such an effective absorbing Markov network is shown in Fig. A1(c), the corresponding absorbing network for the effective description of the two-cycle network in Fig. A1(a). The possible transitions along the observed link are represented by the states (32) and (23), which are absorbing states in the associated first passage problem. If the considered snippet begins with (23) or (32), the corresponding absorbing dynamics starts in 3 or 2, respectively.
The effective trajectory Γ originates from a mapping ζ → Γ[ζ] of microscopic trajectories to the effective description of the system. The path weight of Γ is obtained by summing over microscopic path weights,

$P[\Gamma(t)|I_1, 0] = \sum_{\zeta \to \Gamma} P[\zeta|l_1, 0]$,  (A14)

where $P[\zeta|l_1, 0]$ is conditioned on $l_1$ at time t = 0 for $I_1 = (k_1 l_1)$. While integrating out the Markov path weight P[ζ(t)] directly to obtain the coarse-grained path weight $P[\Gamma(t)|I_1, 0]$ is not feasible in general, the decomposition of Γ in Eq. (A9) and of ζ in Eq. (A13) reduces the problem to the level of the elementary building blocks $\psi_{I\to J}(t)$ and γ, respectively. Thus, the decomposition Eq. (A9) can be combined with the summation in Eq. (A14) to obtain

$P[\Gamma(t)|I_1, 0] = \prod_{n=1}^{N} \sum_{\gamma^{t_n}_{I_n\to I_{n+1}}} P[\gamma^{t_n}_{I_n\to I_{n+1}}|l_n, 0]$.  (A15)

The path weights $P[\gamma|l_n, 0]$ of individual snippets $\gamma = \gamma^t_{I\to J}$ are seamlessly conditioned on the final state of their predecessor, since $I_n = (k_n l_n)$. The only summation that needs to be performed is the calculation of the waiting time distributions $\psi_{I\to J}(t)$ conditioned on I = (kl), as introduced in Eq. (A10), by integrating over all possible $\gamma = \gamma^t_{I\to J}$,

$\psi_{I\to J}(t) = \sum_{\gamma^t_{I\to J}} P[\gamma^t_{I\to J}|l, 0]$.  (A17)

This equation identifies the waiting time distributions of the effective description as summations over non-observable trajectory snippets and therefore proves Eq. (4) of the main text. For I = (kl) and J = (mn), $\gamma^t_{I\to J}$ starts in l and ends with a jump (mn) exactly at time t. Since the system is in m immediately before the jump at t, we can use the Markov property to calculate

$\psi_{I\to J}(t) = P(\text{jump } (mn) \text{ at time } t\,|\,l, 0)$  (A18)
$= P(\text{jump } (mn) \text{ at time } t\,|\,m, t)\, P(m, t|l, 0)$  (A19)
$= k_{mn}\, p_m(t)$,  (A20)

where Eq. (A4) was used for the last equality. The result in Eq. (A20) makes it possible to calculate waiting time distributions analytically by solving the master equation of the effective absorbing dynamics defined on the hidden subnetwork. Note that this procedure is in principle equivalent to calculating the first passage time distributions for the associated first passage problem with the method introduced in [75].
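As a numerical sketch of Eq. (A20), the following code integrates the absorbing master equation for a hypothetical four-state network with observed link 2↔3 (states 0–3 stand in for 1–4; all rates are invented for illustration) and evaluates the waiting time distribution $\psi_{(23)\to(23)}(t) = k_{23}\, p_2(t)$:

```python
import numpy as np

# Hypothetical rates; k[i, j] = rate i -> j, states 0..3 stand in for 1..4.
k = np.zeros((4, 4))
k[0, 1] = k[1, 0] = 1.0          # link 1 <-> 2
k[1, 2] = 2.0; k[2, 1] = 1.0     # observed link 2 <-> 3
k[2, 3] = k[3, 2] = 1.0          # link 3 <-> 4
k[3, 0] = k[0, 3] = 1.0          # link 4 <-> 1

# Absorbing dynamics: drop the observed rates from the jump part but keep
# the full escape rates, so probability leaks into the absorbing states.
k_hidden = k.copy()
k_hidden[1, 2] = k_hidden[2, 1] = 0.0
L_abs = k_hidden.T - np.diag(k.sum(axis=1))

def psi_23_23(t, n_steps=20000):
    """psi_{(23)->(23)}(t) = k_23 * p_2(t); p starts in state 3, the
    destination of the first observed transition (Euler integration)."""
    p = np.array([0.0, 0.0, 1.0, 0.0])
    dt = t / n_steps
    for _ in range(n_steps):
        p = p + dt * (L_abs @ p)
    return k[1, 2] * p[1]

print(psi_23_23(1.0))
```

At short times this distribution vanishes as a power law, since the shortest hidden connection from state 3 back to state 2 requires three jumps in this example.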
Conceptually, the reasoning used to derive Eq. (A17) and therefore Eq. (A20) is identical to the reasoning used in [61] to derive the inter-transition time densities. For both derivations, the partially accessible Markov network considered in the transition-based description is mapped to an effective first passage time problem and the waiting time distributions are identified as the corresponding first passage time distributions. In our derivation, this mapping is motivated from an effective splitting emerging on the level of single trajectories, whereas in the derivation in [61], the mapping is deduced mathematically.
Operationally, the proposed calculation method for waiting time distributions differs from the method proposed in [61]. Instead of carrying out the summation in Eq. (A17) explicitly, the waiting time distributions can be calculated from the solution of the effective absorbing master equation for different initial conditions using Eq. (A20). In addition, our calculation method is efficient, since collecting histogram data from a Gillespie simulation [74] is unnecessary to reconstruct the waiting time distributions, as they can be calculated directly.
To give an explicit example, the proposed method is used to calculate the waiting time distributions for the effective description of the two-cycle network in Fig. A1(a). Solving the corresponding effective absorbing master equation for fixed, randomly drawn transition rates results in four different waiting time distributions, one of which is shown in Fig. A1(d). Additionally, the figure shows that this waiting time distribution based on Eq. (A20) coincides with the corresponding waiting time distribution calculated from histogram data of long trajectories simulated with a Gillespie algorithm of the full network.
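The histogram side of such a consistency check can be sketched with a minimal Gillespie sampler that records waiting times between consecutive jumps along the observed link; the four-state network and all rates below are invented for illustration (states 0–3 stand in for 1–4, the observed link 2↔3 corresponds to indices 1 and 2):

```python
import random

# rates[i][j]: transition rate from state i to state j (hypothetical values)
rates = {
    0: {1: 1.0, 3: 1.0},
    1: {0: 1.0, 2: 2.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0, 0: 1.0},
}

def observed_waiting_times(n_samples, seed=1):
    """Sample waiting times between consecutive jumps along the link 1<->2."""
    rng = random.Random(seed)
    state, t, t_prev, samples = 0, 0.0, None, []
    while len(samples) < n_samples:
        out = list(rates[state].items())
        total = sum(kj for _, kj in out)
        t += rng.expovariate(total)          # exponential residence time
        r, acc = rng.random() * total, 0.0
        nxt = out[-1][0]                     # fallback guards against round-off
        for j, kj in out:
            acc += kj
            if r < acc:
                nxt = j
                break
        if {state, nxt} == {1, 2}:           # jump along the observed link
            if t_prev is not None:
                samples.append(t - t_prev)
            t_prev = t
        state = nxt
    return samples

samples = observed_waiting_times(500)
print(min(samples), sum(samples) / len(samples))
```

Histogramming `samples` and comparing with the direct evaluation of Eq. (A20) reproduces the type of consistency check described above.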
Appendix B: Entropy estimator

1. Coarse-grained and full entropy production

Our effective description loses information about irreversibility and entropy production. From an abstract point of view, a well-defined many-to-one mapping ζ → Γ[ζ] of trajectories of length T suffices to bound the mean coarse-grained entropy production rate $\hat\sigma$ by the physical entropy production rate σ, provided that $\Gamma \to \tilde\Gamma$ is the correct, physical time-reversal operation. Technically, the bound relies on the log-sum inequality, a standard tool in information theory [76] stating

$\sum_i a_i \ln \frac{a_i}{b_i} \geq \left( \sum_i a_i \right) \ln \frac{\sum_i a_i}{\sum_i b_i}$

for $a_i \geq 0$, $b_i \geq 0$. We apply this inequality in the form [27,77]

$T\sigma = \sum_\zeta P[\zeta] \ln \frac{P[\zeta]}{P[\tilde\zeta]} \geq \sum_\Gamma P[\Gamma] \ln \frac{P[\Gamma]}{P[\tilde\Gamma]} = T\hat\sigma$.

In other words, we have to first time-reverse ζ, which is then followed by the coarse-graining operation, as discussed in [59].

2. Time-reversal and conditional counting entropy estimator
The previous section identifies the correct time-reversal operation $\Gamma \to \tilde\Gamma$ as the coarse-graining applied to the microscopic time reverse $\tilde\zeta(t) = \zeta(T - t)$. An effective trajectory Γ consists of a series of transitions $I_n = (k_n l_n)$ at times $T_n$, which is schematically denoted as

$\Gamma = I_1 \xrightarrow{t_1} I_2 \xrightarrow{t_2} \cdots \xrightarrow{t_N} I_{N+1}$.  (B5)

Compared to Eq. (A7), the jump times $T_i$ are replaced by the waiting times $t_i = T_i - T_{i-1}$ with $T_0 = 0$. Reversing the corresponding microscopic trajectory ζ in accordance with the previous discussion gives a well-defined effective trajectory of the form

$\tilde\Gamma = \tilde I_{N+1} \xrightarrow{t_N} \tilde I_N \xrightarrow{t_{N-1}} \cdots \xrightarrow{t_1} \tilde I_1$,  (B6)

where we introduced the reversal operation on individual transitions, $\tilde I_n \equiv (l_n k_n)$ for $I_n = (k_n l_n)$. The reverse transition happens along the same link and is therefore, by construction, also observable in the effective description. The path weight for the backward trajectory Eq. (B6) can be decomposed into a product of single waiting time distributions as in Eq. (A9). After the proper time reverse $\tilde\Gamma$ is identified, the entropy production of a particular trajectory Γ can be calculated explicitly as

$\hat\sigma[\Gamma] = \frac{1}{T} \ln \frac{P[\Gamma|I_1, 0]}{P[\tilde\Gamma|\tilde I_{N+1}, 0]} = \sum_{I,J} \int_0^\infty dt\, \nu_{J|I}(t) \ln \frac{\psi_{I\to J}(t)}{\psi_{\tilde J\to \tilde I}(t)}$,  (B11)

where the conditional counters $\nu_{J|I}(t)$ are introduced as the empirical density, per unit total time T, of pairs in which a transition I is followed by a transition J after waiting time t. In the limit T → ∞, contributions from the initial and final state can be neglected, which yields the fluctuation theorem for $\hat\sigma$ and an explicit formula for the expected coarse-grained entropy production rate,

$\hat\sigma = \sum_{I,J} \int_0^\infty dt\, \langle \nu_{J|I}(t) \rangle \ln \frac{\psi_{I\to J}(t)}{\psi_{\tilde J\to \tilde I}(t)}$.  (B14)

3. Expectation values and entropy production for semi-Markov processes
We will calculate the expectation value $\langle \nu_{J|I}(t) \rangle$ in Eq. (B14) using appropriate techniques known for semi-Markov processes. Note that the transitions I, J, ... are the "states" of the semi-Markov process and that the waiting time t "in state I" is interpreted as the time elapsed since transition I. As defined in the main text, the conditional counter has the expectation value

$\langle \nu_{J|I}(t) \rangle = p^s_I\, \psi_{I\to J}(t)/\langle t \rangle$,  (B17)

where $\langle t \rangle$ is the average waiting time between two semi-Markov transitions. The identification of the stationary distribution $p^s_I$ is based on elementary results for discrete-time Markov chains, as the number of visits of a particular state I in a long trajectory $(I_1 I_2 ... I_N)$ divided by N tends towards $p^s_I$ as N → ∞. Note that, although this distribution is related to the stationary distribution of the semi-Markov process itself, the two are different even in the Markovian case [56].
Since the $\psi_{I\to J}(t)$ are normalized by virtue of Eq. (A12), we can integrate over t to obtain the expected flux $\langle n_{J|I} \rangle$ from a semi-Markov state I to J as

$\langle n_{J|I} \rangle = \int_0^\infty dt\, \langle \nu_{J|I}(t) \rangle = \frac{p^s_I}{\langle t \rangle} \int_0^\infty dt\, \psi_{I\to J}(t)$.  (B18)

The semi-Markov entropy production $\sigma_{SM}$ is defined by Eq. (50) as the probability ratio of forward and backward trajectory under the time-reversal operation $\Gamma \to \tilde\Gamma$. Thus, the calculations of the previous section B 2 starting from Eq. (B11) actually apply to the semi-Markov entropy production, $\sigma_{SM} = \hat\sigma$. Substituting Eq. (B17) into Eq. (B14), we obtain

$\sigma_{SM} = \frac{1}{\langle t \rangle} \sum_{I,J} p^s_I \int_0^\infty dt\, \psi_{I\to J}(t) \ln \frac{\psi_{I\to J}(t)}{\psi_{\tilde J\to \tilde I}(t)}$  (B19)

for the semi-Markov entropy production. To relate this expression to the entropy production of the embedded Markov chain (eMC), we apply the log-sum inequality after using Eq. (B17) to obtain $\sigma_{SM} \geq \sigma_{eMC}$, in accordance with Eq. (52).
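The visit frequencies $p^s_I$ of the embedded chain follow from the transition matrix $P(J|I) = \int_0^\infty dt\, \psi_{I\to J}(t)$ by simple power iteration; a sketch with an invented 2×2 matrix for the two semi-Markov states $I_+$ and $I_-$ of a single observed link:

```python
import numpy as np

# Invented embedded transition matrix; P[i, j] = probability that the next
# observed transition is j, given that the last one was i (rows sum to one).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

p = np.array([0.5, 0.5])       # any normalized initial guess
for _ in range(1000):          # power iteration: p converges to p^s
    p = p @ P

print(p)
```

For this particular matrix the iteration converges to the left eigenvector (4/7, 3/7), which would enter Eq. (B17) as $p^s_I$.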

4. Comparison to informed partial entropy production
a. Entropy estimators: embedded Markov chain vs. informed partial

In this section, we prove that $\sigma_{eMC} = \sigma_{IP}$ in the one-link case, which implies $\sigma_{IP} \leq \hat\sigma$ by virtue of Eq. (B21). We will prove the case of one observable link between the Markov states k and l, since the crucial relation Eq. (B27) and its proof can be generalized to multiple observed links following an analogous approach. For the two states + = (kl) and − = (lk), Eq. (B21) simplifies to

$\sigma_{eMC} = \left( \langle n_{+|+} \rangle - \langle n_{-|-} \rangle \right) \ln \frac{\langle n_{+|+} \rangle}{\langle n_{-|-} \rangle} = \left( \langle n_{+|+} \rangle - \langle n_{-|-} \rangle \right) \ln \frac{p^s_+ \int_0^\infty dt\, \psi_{+\to+}(t)}{p^s_- \int_0^\infty dt\, \psi_{-\to-}(t)}$,  (B24)

where Eq. (B17) and Eq. (A12) have been used for the second equality. It can be verified that for any sequence of transitions, i.e., any trajectory, $n_{+|+} - n_{-|-}$ and $n_+ - n_-$ differ by at most 1 due to terminal effects of the first and last transition, which become negligible in the long-time limit. Thus, $\langle n_{+|+} \rangle - \langle n_{-|-} \rangle = \langle n_+ \rangle - \langle n_- \rangle = j$ can be used to express $\sigma_{eMC}$ as

$\sigma_{eMC} = j \ln \frac{p^s_+ \int_0^\infty dt\, \psi_{+\to+}(t)}{p^s_- \int_0^\infty dt\, \psi_{-\to-}(t)}$.  (B25)

This expression is now in a form in which it can be compared with the informed partial entropy production [44,45],

$\sigma_{IP} = j \ln \frac{k_{kl}\, p^{st}_k}{k_{lk}\, p^{st}_l}$,

at the same link (connecting the Markov states k and l). The stalling distribution $p^{st}$ is defined as the stationary state of the hidden subnetwork, in which the link k → l is removed entirely. Thus, if the original process is generated by L, the stalling distribution satisfies $L^{st} p^{st} = 0$, with a modified generator $L^{st}$ that is obtained from L by setting $k_{kl} = k_{lk} = 0$. The same stalling distribution $p^{st}$ can be accessed operationally if $k_{kl}$ and $k_{lk}$ depend on an external parameter F via $k_{kl}(F)/k_{lk}(F) = e^F k_{kl}(0)/k_{lk}(0)$. For F = 0, the rates match their original values, $k_{kl}(0) = k_{kl}$ and $k_{lk}(0) = k_{lk}$, respectively. Then we can find the stalling distribution by sweeping over all values of F until we find the value $F^{st}$ at which the stationary current $j = p^{st}_k k_{kl}(F^{st}) - p^{st}_l k_{lk}(F^{st})$ vanishes. In a unicyclic process, this information is sufficient to infer the cycle affinity A [45], so that both $\sigma_{IP}$ and $\hat\sigma$ recover the full entropy production σ = jA in this case.
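The defining equation $L^{st} p^{st} = 0$ can be solved directly; the following sketch removes a hypothetical observed link 2↔3 (indices 1 and 2; all rates invented) from a four-state network and replaces one redundant row of the modified generator by the normalization condition:

```python
import numpy as np

# Hypothetical rates; k[i, j] = rate i -> j, states 0..3 stand in for 1..4.
k = np.zeros((4, 4))
k[0, 1] = k[1, 0] = 1.0
k[1, 2] = 2.0; k[2, 1] = 1.0     # observed link 2 <-> 3
k[2, 3] = k[3, 2] = 1.0
k[3, 0] = k[0, 3] = 1.0

# Stalling: remove the observed link entirely (jump rates AND escape rates).
k_st = k.copy()
k_st[1, 2] = k_st[2, 1] = 0.0
L_st = k_st.T - np.diag(k_st.sum(axis=1))   # L_st p_st = 0 defines p_st

# Null vector of L_st: replace one (redundant) row by the normalization.
A = L_st.copy()
A[-1, :] = 1.0
b = np.zeros(4); b[-1] = 1.0
p_st = np.linalg.solve(A, b)
print(p_st)
```

For this detailed-balance example with unit hidden rates, the stalling distribution comes out uniform.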
The relationship between the stalling distribution and the corresponding quotient of waiting time distributions can be established on a more general level. As we will prove below, Eq. (B27) holds true in a general Markov network, which suffices to establish $\sigma_{IP} = \sigma_{eMC} \leq \hat\sigma$ from Eq. (B24) and Eq. (B25). In particular, combining this relation with Eq. (B17) establishes Eq. (32) in the main text.

b. Proof of equation (B27)
To establish Eq. (B27), we consider the absorbing problem from Appendix A with two absorbing states "+" and "−", which indicate an observed forward jump k → l and backward jump l → k, respectively. In the notation of section A, we have "+" = (kl) and "−" = (lk). The absorbing problem can be seen as a discrete-time Markov chain with transition probabilities

$p_{i\to j} = k_{ij}/\Gamma_i$,  (B29)

where $k_{k,+} \equiv k_{kl}$ and $k_{l,-} \equiv k_{lk}$ denote the rates into the absorbing states. Here, i, j ∈ {1, ..., N, +, −} can be any one of the N Markov states or the absorbing states ±.
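The absorption probabilities of such a chain satisfy a linear system and can be obtained with one matrix solve; a sketch for a hypothetical four-state network with observed link 2↔3 (indices 1 and 2; all rates invented), where P(+|j) denotes the probability that the chain started in hidden state j is eventually absorbed in "+" rather than "−":

```python
import numpy as np

# Hypothetical rates; k[i, j] = rate i -> j, states 0..3 stand in for 1..4.
k = np.zeros((4, 4))
k[0, 1] = k[1, 0] = 1.0
k[1, 2] = 2.0; k[2, 1] = 1.0     # observed link 2 <-> 3
k[2, 3] = k[3, 2] = 1.0
k[3, 0] = k[0, 3] = 1.0

Gamma = k.sum(axis=1)
k_hidden = k.copy()
k_hidden[1, 2] = k_hidden[2, 1] = 0.0
Pjump = k_hidden / Gamma[:, None]     # hidden jump probabilities p_{j->i}

# Source terms: the absorbing jump "+" (2 -> 3) is only possible from
# state 2, the jump "-" (3 -> 2) only from state 3.
s_plus = np.zeros(4);  s_plus[1] = k[1, 2] / Gamma[1]
s_minus = np.zeros(4); s_minus[2] = k[2, 1] / Gamma[2]

# P(+|j) = sum_i p_{j->i} P(+|i) + s_plus[j]  =>  (1 - Pjump) P(+|.) = s_plus
P_plus = np.linalg.solve(np.eye(4) - Pjump, s_plus)
P_minus = np.linalg.solve(np.eye(4) - Pjump, s_minus)
print(P_plus, P_minus)
```

Since absorption is certain, P(+|j) and P(−|j) sum to one for every hidden state j, which serves as a built-in consistency check.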
Since the states + and − are absorbing, the Markov chain ends in one of these states almost surely. The probability that the final state is + or −, given the current state j, will be denoted P(±|j). These probabilities are related to the waiting time distributions via

$\int_0^\infty dt\, \psi_{+\to+}(t) = P(+|(kl)) = P(+|l)$  (B30)
$\int_0^\infty dt\, \psi_{-\to-}(t) = P(-|(lk)) = P(-|k)$  (B31)

by virtue of Eq. (A12). To calculate P(+|j) and P(−|j) = 1 − P(+|j), we observe that these probabilities have to satisfy the linear system of equations

$P(+|j) = \sum_i p_{j\to i}\, P(+|i)$,  (B32)

with analogous equations for P(−|j). It is sufficient to let 1 ≤ j ≤ N, since P(+|+) = P(−|−) = 1 − P(+|−) = 1 − P(−|+) = 1. The system in Eq. (B32) can be recast in matrix form by adjusting the generator matrix L slightly. We define $\hat L$ by setting the (k, l)-th and the (l, k)-th element of L to zero, i.e., the elements containing the observed rates $k_{kl}$ and $k_{lk}$.  (B33)

Note that $\hat L$ is not the generator of a Markov process, as its columns no longer sum to zero. After substituting Eq. (B29) into Eq. (B32) and multiplying both sides with $\Gamma_j = \sum_i k_{ji}$, we rearrange terms to obtain

$\hat L^T p_+ = -k_{kl}\, e_k$,  (B34)
$\hat L^T p_- = -k_{lk}\, e_l$,  (B35)

with the vectors $p_\pm \equiv \sum_i P(\pm|i)\, e_i$. Applying Cramer's rule, we can express the inverse matrix $\hat L^{-1}$ as

$(\hat L^{-1})_{ij} = (-1)^{i+j} \det \hat L_{(j|i)} / \det \hat L$.  (B36)

Here, $\hat L_{(j|i)}$ denotes the matrix obtained by removing the j-th row and the i-th column from $\hat L$. By multiplying Eq. (B34) and Eq. (B35) with the inverse matrix, we obtain

$P(+|l) = -k_{kl}\, (-1)^{k+l} \det \hat L_{(l|k)} / \det \hat L$  (B39)
$P(-|k) = -k_{lk}\, (-1)^{k+l} \det \hat L_{(k|l)} / \det \hat L$  (B40)

for the l-th row of Eq. (B34) and the k-th row of Eq. (B35), respectively. Dividing these equations, we obtain

$\frac{P(+|l)}{P(-|k)} = \frac{k_{kl} \det \hat L_{(l|k)}}{k_{lk} \det \hat L_{(k|l)}}$.  (B41)

This equality can be compared to a relation connecting the stalling distribution $p^{st}$ to the generator $L^{st}$, which is established in Appendix B of [45]. Comparing the matrices $L^{st}$ and $\hat L$ (cf. Eq. (B33)), we observe that they differ only in the (k, k)-th and the (l, l)-th element, since the absorbing states affect the escape rates of the states k and l only. In particular, the relevant minors coincide, since $L^{st}_{(l|k)} = \hat L_{(l|k)}$ and similarly for interchanged l ↔ k. By using these identities and the equations (B30), (B31), (B39) and (B40), the claim in Eq. (B27) follows.

Appendix C: Estimating the topology of the hidden network

1. Length of the shortest connection

For small t, the waiting time distribution for transitions I = (kl) and J = (mn) is dominated by the shortest hidden path connecting l and m. If this connection consists of $N_1$ hidden transitions, the short-time expansion of the corresponding absorbing dynamics yields

$\psi_{I\to J}(t) = c_{N_1} t^{N_1} + O(t^{N_1+1})$  (C2)

with a constant $c_{N_1}$ determined by the rates along the shortest connection. To extract $N_1$ from $\psi_{I\to J}(t)$, we take the logarithm of Eq. (C2), which results in

$\ln \psi_{I\to J}(t) = N_1 \ln t + \ln c_{N_1} + O(t)$.  (C3)

Thus, the derivative with respect to ln t scales as

$\frac{d \ln \psi_{I\to J}(t)}{d \ln t} = N_1 + O(t)$  (C4)

for small t, which ultimately gives

$N_1 = \lim_{t\to 0} \frac{t}{\psi_{I\to J}(t)} \frac{d \psi_{I\to J}(t)}{dt}$  (C5)

in the limit t → 0. Note that the logarithmic derivative in Eq. (C4) has been written out explicitly in Eq. (C5) to increase readability. Based on Eq. (C5), the length $N_1$ of the shortest connection between l and m can be estimated from the short-time limit of a given waiting time distribution for transitions I = (kl) and J = (mn).
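The short-time recipe behind Eq. (C5) can be illustrated with a synthetic placeholder distribution ψ(t) ∝ t³ e^(−2t), which is not derived from any particular network but mimics a shortest connection of length N₁ = 3:

```python
import math

# Synthetic placeholder waiting time distribution with N1 = 3.
def psi(t):
    return 0.4 * t**3 * math.exp(-2.0 * t)

# Finite-difference logarithmic derivative d ln psi / d ln t at small t,
# cf. Eq. (C5); for this psi the exact value is 3 - 2t.
t, eps = 1e-4, 1e-6
slope = (math.log(psi(t + eps)) - math.log(psi(t))) / \
        (math.log(t + eps) - math.log(t))
print(round(slope))   # estimates N1; prints 3 here
```

In practice, the same finite-difference estimate would be applied to a measured or numerically computed waiting time distribution at the smallest reliably resolved times.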

2. Length of the second shortest connection
The length of the second shortest connection between I = (kl) and J = (mn) can be estimated similarly by considering the short-time limit of the ratio of waiting time distributions. To this end, we calculate the number of links in the second shortest connection between the Markov states l and m from a time-scale separation. The logarithmic ratio of waiting time distributions $a_{I\to J}(t)$ is defined as

$a_{I\to J}(t) = \ln \frac{\psi_{I\to J}(t)}{\psi_{\tilde J\to \tilde I}(t)} = \ln \frac{\sum_{\gamma^t_{I\to J}} P[\gamma^t_{I\to J}|I]}{\sum_{\gamma^t_{\tilde J\to \tilde I}} P[\gamma^t_{\tilde J\to \tilde I}|\tilde J]}$,  (C6)

where $\gamma^t_{I\to J}$ denotes the snippet of the underlying trajectory starting with I and ending with J after time t, with corresponding conditioned path weight $P[\gamma^t_{I\to J}|I]$. As defined in the main text and in section B 2, the tilde on a capital letter reverses the corresponding transition, e.g., $\tilde I = (lk)$ for I = (kl).
In a first step, we divide the sets of all possible snippets $\gamma^t_{I\to J}$ and their time-reversed counterparts $\gamma^t_{\tilde J\to \tilde I}$ into subsets according to their number of jumps. Let $n_\gamma$ denote the number of hidden transitions between the states l and m in the snippet. By definition of $N_1$, we have $n_\gamma \geq N_1$, which allows us to sort the sum in the numerator of Eq. (C6) by powers of t, starting at order $t^{N_1}$. Since reversing a path does not change its number of jumps, the sums in the denominator satisfy analogous scaling laws. We will now choose $N_2$ as the largest number for which the ratio restricted to snippets with $n_\gamma < N_2$,

$a_0 \equiv \ln \frac{\sum_{n_\gamma < N_2} P[\gamma^t_{I\to J}|I]}{\sum_{n_\gamma < N_2} P[\gamma^t_{\tilde J\to \tilde I}|\tilde J]}$,  (C10)

is time-independent. Such a number exists if the shortest connection of length $N_1$ is unique, since in this case $N_1 + 1$ is a valid choice. The reason is that if only paths containing $N_1$ transitions between l and m are allowed, only one possible connection remains, i.e., $a_0$ is time-independent because the network effectively becomes unicyclic. This argument remains valid as long as $N_2$ is smaller than or equal to the length of the second shortest connection. As soon as $n_\gamma$ is large enough to allow for paths along the second connection, a(t) becomes time-dependent if the resulting hidden cycle has nonvanishing affinity. Thus, we proceed knowing that $N_2$ as defined in Eq. (C10) is the length of the second shortest connection. Substituting this equation into Eq. (C7), we obtain

$a_{I\to J}(t) = a_0 + \ln \frac{1 + O(t^{N_2 - N_1})}{1 + O(t^{N_2 - N_1})}$.  (C11)

After rearranging terms, we use Eq. (C8) and Eq. (C9) to extract the lowest-order correction in the numerator, and similarly for the denominator. Taking the limit t → 0 in Eq. (C11) then yields the result $a_{I\to J}(0) = a_0$.

Appendix D: Hidden subnetwork without cycles

If the hidden subnetwork contains no cycles, its dynamics obeys detailed balance and we can define a potential F via

$F(i) - F(j) = \ln \frac{k_{ij}}{k_{ji}}$

for neighboring states i, j in the hidden subnetwork. With this potential function, the stochastic entropy production of a trajectory in the hidden subnetwork becomes path-independent: a trajectory γ starting and ending in states l and k, respectively, satisfies $\sigma_m[\gamma] = F(l) - F(k)$. In particular, we can use the fluctuation theorem property Eq. (D4) of $\sigma_m$ to identify the ratio of the path weights of a snippet and its time reverse. Due to the existence of the potential, the result is independent of the particular sequence of transitions in $\gamma^{t_j}_{I_j\to I_{j+1}}$. Note that the second term is needed because the observed transitions that terminate a trajectory snippet are not accounted for in $\sigma_m$, which is defined in the hidden subnetwork.
Since the hidden subnetwork is topologically trivial, we can evaluate the ratio of path weights in Eq. (D6) using Eq. (39). We obtain

$\sum_j a_{I_j\to I_{j+1}}(t_j) + \ln \frac{k_{k_1 l_1}}{k_{l_{N+1} k_{N+1}}}$  (D8)

for this ratio. The first term recovers the value of $\hat\sigma$ precisely.

Appendix E: Model parameters, simulation and counter example

1. Unicyclic network for the illustration of the coarse-graining scheme in Figure 1

The unicyclic network in Fig. 1 of the main text includes four different states with different links in between. The transition rates for the network are shown in table I. The waiting time distributions shown in Fig. 1 (d) of the main text can be derived from the solution of the absorbing master equation for the effective absorbing Markov network defined in Fig. 1 (b) of the main text with the method from section A. For the waiting time distributions shown in Fig. 1 (d) of the main text, the corresponding absorbing master equation for the transition rates in table I has been solved numerically.

2. Multicyclic network for the cycle estimators in Figure 2

The multicyclic network in Fig. 2 of the main text includes seven different states with different links in between. The transition rates for the network, leading to the affinities given in the caption of Fig. 2 of the main text, are shown in table II. This specific choice of transition rates results in different time scales for the transitions through the individual cycles that include the observable link. Thus, no cycle is preferred and the existence of all cycles can be inferred from the successive maxima and minima of the corresponding a(t). The quantity $a_{(71)\to(71)}(t)$ in Fig. 2 (c) and (d) of the main text is by definition given by the logarithm of the ratio of the corresponding waiting time distributions. As discussed in section A, these waiting time distributions can be derived from the solution of the absorbing master equation for the effective absorbing Markov network defined in Fig. 2 (b) of the main text. For the $a_{(71)\to(71)}(t)$ shown in Fig. 2 (c) and (d) of the main text, the corresponding absorbing master equation for the transition rates in table II has been solved numerically.

3. Randomly drawn transition rates for the quality factors in Figure 3

The multicyclic network from Fig. 2 of the main text is also used in Fig. 3 of the main text to illustrate the quality of the derived affinity bounds. For each realization of the network, the transition rates are drawn randomly according to the distributions given in table III. All transition rates are distributed uniformly. The different choices for the definition intervals lead, on the one hand, to a bias in the affinity $A_{C_0}$ towards positive values and increase, on the other hand, its maximal possible value, aiming at a broad range of possible affinities for the scatter plots of the quality factors in Fig. 3 (b), (c) and (d) of the main text. As a consequence of this bias, it is more likely to draw high positive affinities than high negative ones. Since the bounds for the maximal affinities are tighter for single, high positive affinities, the quality factors $Q_+$ in Fig. 3 (c) are centered around higher values in comparison to the quality factors $Q_-$ in Fig. 3 (d).
To calculate the quality factors for the two different classes of network configurations shown in Fig. 3 (b), (c) and (d) of the main text, the maximum and the minimum of $a_{(71)\to(71)}(t)$ are needed for all randomly drawn realizations of the multicyclic network. For a given realization, $a_{(71)\to(71)}(t)$ can be calculated from the waiting time distributions obtained with the absorbing master equation method from section A. For each realization corresponding to one quality factor shown in Fig. 3 (b), (c) and (d) of the main text, the absorbing master equation for the drawn transition rates has been solved numerically, $a_{(71)\to(71)}(t)$ has been determined and its maximum and minimum have been used to calculate the corresponding quality factor according to the definitions given in the caption of Fig. 3 of the main text.

TABLE III: Distributions of the randomly drawn transition rates for Fig. 3. Θ(a, b) denotes a uniform distribution defined between a and b.

4. Two-cycle network for topology and the entropy estimators in Figure 4

The two-cycle network in Fig. 4 of the main text includes four different states and five links in total. Since the network is comparatively compact yet nontrivial, it has also been used as an example network for several existing entropy estimators [39,42,44,45]. To compare the introduced entropy estimator $\hat\sigma$ to existing results, the rates given in table IV are chosen according to the rates in [39]. The force parameter F applied to the observable link is related to the transition rates of the observed link via

$-2F = \ln \frac{k_{23}}{k_{32}} - \ln \frac{2}{3}$.  (E1)

This parameter is introduced in [39] to tune the observed net current $j = n_+ - n_-$. To estimate the cycle lengths as discussed in Fig. 4 (b) and (c) of the main text, the waiting time distributions for the observable transition are needed to calculate $\ln \psi_{(32)\to(32)}(t)$ and $a_{(32)\to(32)}(t)$. As for the networks in Fig. 1 and Fig. 2, these distributions can be derived with the absorbing master equation method from section A. For the estimates of the cycle lengths shown in Fig. 4 (b) and (c), the corresponding absorbing master equation for the transition rates in table IV with F = ln 3 has been solved numerically.
To calculate the introduced entropy estimator $\hat\sigma$ for a given trajectory of the two-cycle network, a(t) and the conditioned transition counters $\nu_{+|+}(t)$ and $\nu_{-|-}(t)$ of the observed link are needed. a(t) can be calculated independently of the counters from the corresponding waiting time distributions derived with the absorbing master equation method from section A. A trajectory resulting from the observation of the two-cycle network can be generated by simulating the full underlying Markov network with the Gillespie algorithm [74]. For the entropy estimator in Fig. 4 of the main text, a(t) has been calculated from the waiting time distributions resulting from the numerical solution of the corresponding absorbing master equation. The conditioned transition counters have been obtained by counting the corresponding transitions in a simulated Gillespie trajectory of length T = 10^7; weighting the counted transitions with a(t) evaluated at the registered waiting times yields $\hat\sigma$. The mean entropy production of the Markov network has been calculated directly in the Gillespie simulation.

TABLE IV: Transition rates for the two-cycle network in Fig. 4. F is a dimensionless force applied to the observable link between state 2 and state 3.

5. A model with an invisible cycle
In this section, we present an explicit Markov network that has a cycle with nonvanishing affinity in the hidden subnetwork, although all $a_{I\to J}(t)$ are time-independent. We use the network of Fig. 4 with the choice of rates given in table V. Instead of observing the transitions between state 2 and state 3, we observe $I_+ = (13)$ and $I_- = (31)$.
TABLE V: Transition rates for the counterexample. The network is given by Fig. 4 with the specified configuration of rates.
The ratio a(t) remains unaffected by this procedure. In particular, since the model is now effectively reduced to a unicyclic network with cycle C′ = (13H1), a(t) is constant in time, taking the value $a(t) = A_{C'} = \ln[2/(a + 1)]$. In contrast, the cycle affinity of the true hidden cycle C = (12341) in the hidden subnetwork is given by $A_C = \ln a$. For a = 1 this