A generalization of the maximum entropy principle for curved statistical manifolds

The maximum entropy principle (MEP) is one of the most prominent methods to investigate and model complex systems. Despite its popularity, the standard form of the MEP can only generate Boltzmann-Gibbs distributions, which are ill-suited for many scenarios of interest. As a principled approach to extend the reach of the MEP, this paper revisits its foundations in information geometry and shows how the geometry of curved statistical manifolds naturally leads to a generalization of the MEP based on the R\'enyi entropy. By establishing a bridge between non-Euclidean geometry and the MEP, our proposal sets a solid foundation for the numerous applications of the R\'enyi entropy, and enables a range of novel methods for complex systems analysis.


I. INTRODUCTION
The progressive unveiling of the intricate connections that exist between information theory and statistical mechanics has allowed fundamental advances in our understanding of complex systems [1]. One of the most important methods resulting from those discoveries is the maximum entropy principle (MEP), which unifies multiple results and procedures under a single heuristic that operationalizes Occam's razor [2,3]. From a pragmatic perspective, the MEP can be understood as a modeling framework that is particularly well-suited for building statistical descriptions of a broad class of systems in contexts of incomplete knowledge [4]. The high versatility of the MEP has allowed it to find applications in a wide range of scenarios, including the analysis of DNA motifs of transcription factor binding sites [5], covariations in protein families and amino acid contact prediction [6,7], the diversity of antibody repertoires in the immune system [8,9], coordinated firing patterns of neural populations [10][11][12][13], the collective behavior of bird flocks and mice [14][15][16], the abundance and distribution of species in ecological niches [17,18], and patterns of behavior in various complex human endeavours [19,20].
The efficacy of the MEP rests on Shannon's entropy, which acts as an estimate of "uncertainty" that guides the modeling procedure. Colloquially, the MEP generates the least structured statistical model that is consistent with the available knowledge, building on that knowledge but nothing else. However, the functional form of the Shannon entropy greatly restricts the range of outputs that the MEP can offer. In particular, standard applications of the MEP can only generate Boltzmann-Gibbs distributions, which are unsuitable to describe complex systems displaying long-range correlations or other effects related to different types of statistics [21][22][23][24]. This important limitation has triggered various efforts to generalize the MEP by leveraging generalizations of Shannon's entropy, resulting in a rich array of proposals (see e.g. [25][26][27][28][29]). However, we argue that plugging a generalized entropy into the MEP framework inevitably leads to an ad-hoc procedure whose value is fundamentally hindered by the heuristic nature of the MEP itself.
* pablo_morales@araya.org
An alternative approach to extend the MEP is to consider it not as a stand-alone principle, but as a consequence of deeper mathematical laws. One route to do this, which we follow in this paper, is to regard the MEP as a direct consequence of the geometry of statistical manifolds [30]. In effect, by leveraging the structure of dual orthogonal projections allowed by the flat geometry associated with the Kullback-Leibler divergence [31,32], the seminal work of Amari established how the standard MEP naturally emerges when considering hierarchical "foliations" of the manifold. This perspective not only sets the MEP on a firm mathematical basis, but further endows it with sophisticated tools from information geometry, which can be used e.g. to disentangle the relevance of interactions of different orders within the system [33][34][35].
In this paper we show how the geometry of curved statistical manifolds naturally leads to an extension of the MEP based on the Rényi entropy. In contrast to flat cases, the geometrical structure of curved statistical manifolds disrupts the standard construction of orthogonal projections based on Legendre-dual coordinates, making the analysis of foliations highly non-trivial. Nonetheless, by leveraging the rich literature on curved statistical manifolds [32,[36][37][38][39][40], the framework put forward in this paper reveals how the geometry established by the Rényi divergence is suitable for establishing hierarchical foliations that, in turn, lead to a generalization of the MEP.
The results presented in this paper serve to emphasize the special place that the Rényi entropy has among other generalized entropies, at least from the perspective of the MEP. Furthermore, it provides a solid mathematical foundation for the plethora of existing applications based on the Rényi entropy (see e.g. Refs. [41][42][43][44]). Moreover, the novel connection established between information geometry and this generalized MEP opens the door for fertile explorations combining non-Euclidean geometry methods and statistical analyses, which may lead to new insights and techniques to further deepen our understanding of complex systems.
The rest of this article is structured as follows. First, Section II provides a brief introduction to information geometry, emphasising concepts that are key to our proposal. Then, Section III develops the analysis of foliations in curved statistical manifolds, and Section IV establishes its relationship with a maximum Rényi entropy principle. Finally, Section V discusses the implications of our findings and summarizes our main conclusions.

A. The Dual Structure of Statistical Manifolds
Our exposition focuses on statistical manifolds M, whose elements are probability distributions p_ξ(x) with x ∈ χ and ξ ∈ R^d. The geometry of such statistical manifolds is determined by two structures: a metric tensor g_p, and a torsion-free affine connection pair (∇, ∇*) that are dual with respect to g_p. Intuitively, g_p defines norms and angles between tangent vectors and, in turn, establishes curve length and the shortest curves. On the other hand, the affine connections establish covariant derivatives of vector fields, providing the notion of parallel transport between neighbouring tangent spaces, which defines what a straight curve is.
Traditional Riemannian geometry is built on the assumption that the shortest and the straightest curves coincide, which led to the study of metric-compatible (Levi-Civita) connections, pivotal to the development of the theory of general relativity. However, modern approaches motivated by information geometry [45] and gravitational theories [46,47] consider more general cases, where the metric and connections are independent from one another. In such geometries, the parallel transport operator Π : T_p M → T_q M and its dual Π* [48] (induced by ∇ and ∇*, respectively) might differ. The departure of ∇ and ∇* from self-duality can be shown to be proportional to Chentsov's tensor, which allows for a single degree of freedom traditionally denoted by α ∈ R [45]. Put simply, α captures the degree of asymmetry between short and straight curves, with α = 0 corresponding to metric-compatible connections where ∇ = ∇*.
An important property of the geometry of a statistical manifold (M, g, ∇, ∇*) is its curvature, which can be of two types: the (Riemann-Christoffel) metric curvature, or the curvature associated with the connection. Both quantities capture the distortion induced by parallel transport over closed curves, the former with respect to the Levi-Civita connection, and the latter with respect to ∇ and ∇*. In the sequel we use the term "curvature" to refer exclusively to the latter type.

B. Establishing geometric structures via divergences
A convenient way to establish a geometry on a statistical manifold is via divergence maps [49]. Divergences are smooth, distance-like mappings of the form D : M × M → R, which satisfy D(p||q) ≥ 0 and vanish only when p = q [50]. We use the shorthand notation D[ξ; ξ′] := D(p_ξ ||p_ξ′) when expressing D under a parametrization of M in terms of coordinates ξ = (ξ_1, . . ., ξ_d) [30]; divergences in this form are often called "contrast functions" (see Ref. [51, Sec. 11]).
Let us see how one can naturally build a metric from a contrast function [49, Sec. 4]. A metric g(ξ) can be built from the second-order expansion of the divergence D as

g_ij(ξ) = −∂_i ∂′_j D[ξ; ξ′] |_{ξ′=ξ},    (1)

which is positive-definite due to the non-negativity of D. This construction leads to Fisher's metric, which is the unique metric that emerges from a broad class of divergences [49, Th. 5], this being closely related to Chentsov's theorem [52][53][54][55]. Analogously, connections (or equivalently Christoffel symbols) emerge at the third-order expansion of the divergence as follows:

Γ_{ij,k}(ξ) = −∂_i ∂_j ∂′_k D[ξ; ξ′] |_{ξ′=ξ},    (2)

where the shorthand notation ∂_{ξ_i} = ∂_i and ∂_{ξ′_i} = ∂′_i has been adopted for brevity. In summary, Fisher's metric is insensitive to the choice of divergence but the resulting connections are not, and therefore the effects of a particular D manifest only at third order. Interestingly, this construction relating the metric and connections with the second and third derivatives of a scalar potential bears a striking resemblance to Kähler structures on complex manifolds, which can be built through further constraints and are applicable to a range of inference problems [56,57]. The approach of building geometries based on divergences does not lack generality, as it has been shown that any geometry can be expressed by an appropriate divergence [58,59]. Of the various types of divergences explored in the literature (cf. [60] and references within), two classes are particularly important: f-divergences of the form

D_f(p||q) = Σ_{x∈χ} q(x) f( p(x)/q(x) )

for f(x) convex with f(1) = 0, and Bregman divergences of the form

D_φ[ξ; ξ′] = φ(ξ′) + Dφ(ξ′) · (ξ − ξ′) − φ(ξ) = ξ · η′ − φ(ξ) − ψ(η′)    (5)

for φ(ξ) a concave function [61], with Dφ = (∂φ/∂ξ_1, . . ., ∂φ/∂ξ_d) denoting the gradient of φ, ψ(η) = min_ξ {η · ξ − φ(ξ)} the Fenchel-Legendre concave conjugate of φ, and η the dual coordinates of ξ such that ξ = Dψ(η) and η = Dφ(ξ).
Each of these types of divergences has important properties from an information geometry perspective: f-divergences are monotonic with respect to coarse-grainings of the domain of events χ, while Bregman divergences enable dual structures that set the basis for orthogonal projections [62].
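To make this construction concrete, the sketch below (illustrative only, not from the paper; all function names are ours) estimates the metric of a one-parameter Bernoulli family from the KL divergence by finite differences, recovering the familiar Fisher information 1/(θ(1 − θ)):

```python
import numpy as np

def kl_bernoulli(t, s):
    """KL divergence between Bernoulli(t) and Bernoulli(s)."""
    return t * np.log(t / s) + (1 - t) * np.log((1 - t) / (1 - s))

def metric_from_divergence(D, xi, h=1e-4):
    """Estimate g(xi) = -d^2 D[xi; xi'] / (d xi d xi') at xi' = xi
    with a central finite-difference stencil."""
    return -(D(xi + h, xi + h) - D(xi + h, xi - h)
             - D(xi - h, xi + h) + D(xi - h, xi - h)) / (4 * h * h)

theta = 0.3
g = metric_from_divergence(kl_bernoulli, theta)
fisher = 1.0 / (theta * (1.0 - theta))  # known Fisher information of a Bernoulli
print(g, fisher)  # the two values agree to several decimals
```

Replacing the KL divergence by another divergence in the broad class mentioned above leaves this second-order estimate unchanged; the differences only surface in the third-order terms that define the connections.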
As mentioned above, the deviation of a given connection ∇ from its corresponding metric-compatible (i.e. Levi-Civita) counterpart can be measured by αT, where T corresponds to the invariant Amari-Chentsov tensor [63,64] and α ∈ R is a free parameter. The invariance of T implies that the value of α entirely determines the connection, and the corresponding geometry can be obtained from a divergence of the form

D_α(p||q) = (4 / (1 − α²)) ( 1 − Σ_{x∈χ} p(x)^{(1+α)/2} q(x)^{(1−α)/2} ),

which is known as the α-divergence. As important particular cases, if α = 0 then D_α becomes the square of Hellinger's distance (up to a constant factor), and if α = 1 then it gives the well-known Kullback-Leibler divergence D_KL(p||q) = Σ_x p(x) log(p(x)/q(x)). It is worth noting that geometrical structures are invariant under certain types of transformations. For example, consider a divergence D̃ given by D̃[ξ; ξ′] := F(D[ξ; ξ′]), with F a monotone and differentiable function satisfying F(0) = 0 [65]. Then, it can be shown using Eqs. (1) and (2) that the metric and connections induced by D and D̃ are related as follows:

g̃_ij(ξ) = F′(0) g_ij(ξ),   Γ̃_{ij,k}(ξ) = F′(0) Γ_{ij,k}(ξ).

Therefore, D̃ gives rise to exactly the same geometrical structure when F′(0) = 1, and a scaled version otherwise. More general transformations between divergences and their corresponding geometries are discussed in Section II D.
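These limiting cases can be checked numerically. The sketch below uses one common normalisation of the α-divergence for discrete distributions (prefactor conventions vary across the literature, which is why the Hellinger comparison carries an explicit factor of two):

```python
import numpy as np

def alpha_div(p, q, alpha):
    """alpha-divergence between discrete distributions, in one common
    normalisation (prefactor conventions differ across the literature)."""
    return 4.0 / (1.0 - alpha**2) * (
        1.0 - np.sum(p**((1 + alpha) / 2) * q**((1 - alpha) / 2)))

def kl(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

# alpha -> 1 recovers the KL divergence
print(alpha_div(p, q, 1 - 1e-6), kl(p, q))

# alpha = 0 is proportional to the squared Hellinger distance
hellinger_sq = np.sum((np.sqrt(p) - np.sqrt(q))**2)
print(alpha_div(p, q, 0.0), 2 * hellinger_sq)
```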

C. A Pythagorean relationship in curved spaces via the Rényi divergence
The connection induced by the KL divergence in its natural coordinates is flat (i.e. Γ_{ij,k}(ξ) = Γ*_{ij,k}(ξ) = 0). However, this does not hold for α-divergences with α ≠ 1, which retain the same Fisher's metric but induce a connection with constant sectional curvature ω = (1 − α²)/4 over the whole manifold [39, Theorem 7]. This results in a spherical (S^n) geometry for α ∈ (0, 1), or a hyperbolic (H^n) geometry for α ∉ (0, 1). A non-zero curvature affects the relationship between geodesics [66]: if the "α-geodesic" joining p and q is orthogonal (with respect to the Fisher metric) to the one joining q and r, then

D_α(p||r) = D_α(p||q) + D_α(q||r) − ω D_α(p||q) D_α(q||r),    (10)

resulting in a deviation from the standard "Pythagorean relationship" that is observed for the case of α = 1 [31]. However, one can rewrite Eq. (10) as

1 − ω D_α(p||r) = ( 1 − ω D_α(p||q) ) ( 1 − ω D_α(q||r) ),    (11)

which describes the relationship between angles on the sphere or hyperbolic space, depending on the sign of ω [31]. Interestingly, Eq. (11) suggests that a divergence of the form

D_γ(p||q) = (1/γ) log( 1 − ω D_α(p||q) ),    (12)

with α = −1 − 2γ (so that ω = −γ(γ + 1)) would recover the "Pythagorean relationship." In fact, D_γ can be recognized as the well-known Rényi divergence of order γ + 1 [39,45], noting that we follow Ref. [67] in adopting a shifted indexing. The Rényi divergence is an f-divergence with f(x) = x^γ but it is not a Bregman divergence; however, one can re-cast it as a "Bregman-like" divergence [39]. To see this, let us consider p_ξ ∈ M to be a deformed exponential family distribution (see Appendix A for its explicit form), where h(x) ∈ R^d is a vector of sufficient statistics of x and ϕ_γ is the associated normalising potential. Note that Eq. (14) gives a standard exponential family distribution when γ → 0. By defining D_γ[ξ; ξ′] := D_γ(p_ξ ||p_ξ′) to be the corresponding contrast function of the Rényi divergence, one can show that it takes a Bregman-like form [39, Th. 13], which resembles Eq.
(5) but with the factor ξ · η replaced by a logarithm. Above, the potential appearing in this expression is a generalization of the Fenchel-Legendre transform of ϕ_γ, which has conjugate coordinates established by Eqs. (18a) and (18b), with Dϕ denoting the Euclidean gradient of ϕ. Finally, it is worth noting that these dual coordinates can be written as an expectation, where X is a random variable that follows the distribution p_ξ(x), h(X) denotes the sufficient statistics of X, and Z_ξ(h) is defined implicitly as the quantity within the curly brackets. Hence these generalized Fenchel-Legendre dual coordinates can be alternatively expressed as in Eq. (20). For the case of γ = 0, Eq. (20) reduces to the well-known relationship given by η = E_ξ{h(X)} (see Appendix B for further comments).
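A small numerical check makes the shifted indexing and the monotone link to the α-divergence explicit. The sketch below assumes the convention D_γ(p||q) = (1/γ) log Σ_x p(x)^{1+γ} q(x)^{−γ}; the choice α = 1 + 2γ is our assumption for matching this argument ordering, as sign conventions for α differ across references:

```python
import numpy as np

def renyi_div(p, q, gamma):
    """Rényi divergence in the shifted indexing (gamma -> 0 recovers KL)."""
    return np.log(np.sum(p**(1 + gamma) * q**(-gamma))) / gamma

def alpha_div(p, q, alpha):
    return 4.0 / (1.0 - alpha**2) * (
        1.0 - np.sum(p**((1 + alpha) / 2) * q**((1 - alpha) / 2)))

p = np.array([0.1, 0.6, 0.3])
q = np.array([0.3, 0.3, 0.4])
gamma = 0.5

# small gamma approaches the KL divergence
kl = np.sum(p * np.log(p / q))
print(renyi_div(p, q, 1e-8), kl)

# monotone relation with the alpha-divergence, so both induce the same
# geometry while only the Rényi form is additive over orthogonal geodesics
alpha = 1 + 2 * gamma
lhs = renyi_div(p, q, gamma)
rhs = np.log(1 + gamma * (gamma + 1) * alpha_div(p, q, alpha)) / gamma
print(lhs, rhs)  # equal up to floating-point error
```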

D. Conformal-projective classes
Conformal transformations are operations over geometric structures that are angle-preserving, amounting to (pseudo) rotations and dilations of the points in the manifold. Technically, a conformal transformation on M is defined as an invertible map ω : M → M such that the metric induced by the pull-back map ω* : T_{ω(p)} M → T_p M is related to the original metric up to a scaling factor λ : M → R, such that

g_p(ω*X, ω*Y) = λ(p) g_{ω(p)}(X, Y)    (21)

for all X, Y ∈ T_{ω(p)} M. Correspondingly, two metrics g and g̃ are said to be conformally equivalent if they can be linked via a conformal factor λ as in Eq. (21). Due to their non-Riemannian geometry, geometric transformations on statistical manifolds that are "structure-preserving" are not fully specified by their effect on the metric, but also need to be characterized by their effect on the connections, which may diverge from metric-dependence via Chentsov's tensor. This characterization can be done by relying on the notion of projective equivalence: two connections ∇ and ∇̃ are said to be projectively equivalent if there exists a 1-form ν = a_i(ξ)dξ^i that satisfies

Γ̃^k_{ij} = Γ^k_{ij} + a_i δ^k_j + a_j δ^k_i,

with δ^j_i the Kronecker delta [68]. A convenient way to put these notions together and build conformal-projective transformations is by considering transformations over divergences. Two divergences D and D̃ are said to belong to the same conformal-projective class if two conditions are met: (i) their induced metrics are conformally equivalent, and (ii) their induced connections are projectively equivalent. It can be shown that two divergences belong to the same conformal-projective class if and only if they are related through a conformal-projective factor λ [69].
Let us now study the relationship between the geometries induced by D_γ, D_α, and D_KL. By considering the inverse of Eq. (12), one finds a monotone function F establishing a diffeomorphism between the two divergences, which reveals that the Rényi divergence and α-divergences generate exactly the same geometry (as described by Eqs. (9)). Building on this fact, and leveraging the Legendre-like form of the Rényi divergence shown in Eq. (16), a direct calculation shows that the action of F on D_γ can be expressed as a Bregman divergence D_φ scaled by a conformal-projective factor κ [70, Th. 1]. Above, φ is a scalar potential given by φ(ξ) = e^{γϕ_0(ξ)} with ϕ_0(ξ) as given in Eq. (15), and the conformal-projective factor κ takes the form given in Eq. (26). Moreover, note that D_φ describes a dually-flat geometry, belonging to the same equivalence class as the KL divergence. Thus, these results together establish that Rényi's D_γ, D_α, and D_KL belong to the same conformal-projective equivalence class.
To conclude, let us present a derivation of the functional form of κ(ξ) used in Eq. (25), following Ref. [70]. The metric induced by D_γ[ξ; ξ′] is given by Eq. (27), and hence g̃_ij(ξ) = κ(ξ) g_ij(ξ). Furthermore, its induced connection and metric curvature can be found from Eqs. (28). Hence, by introducing the 1-form ν = d log κ(ξ), one can identify the affine connection induced by Γ̃^k_{ij}(ξ) as being projectively flat. This 1-form, or equivalently the conformal factor κ(ξ), can be derived from the Riemann curvature tensor, which for spaces of constant sectional curvature takes the form R^l_{ijk} = K(g_{jk} δ^l_i − g_{ik} δ^l_j), with K ∈ R corresponding to its scalar curvature. As mentioned in Section II C, the geometry induced by the α-divergence has curvature ω = (1 − α²)/4 throughout the whole manifold, and hence its Riemann tensor can be rewritten as in Eq. (29), where a factor (1 − α)/2 = γ + 1 from ω has been absorbed by the metric [71]. Moreover, using the fact that the Riemann tensor is left unchanged by the conformal-projective transformation (i.e. R̃^l_{ijk} = R^l_{ijk}), and recognising that K = −γ, one can use Eqs. (27), (28b) and (29) to show that κ satisfies Eq. (30) for some a_i, b ∈ R. Finally, as the linear terms can be absorbed into the definition of φ, Eq. (30) leads to the expression for κ(ξ) shown above.

III. ORTHOGONAL FOLIATIONS IN CURVED STATISTICAL MANIFOLDS
This section presents the study of orthogonal foliations in curved statistical manifolds. For simplicity of exposition, the rest of the paper focuses on multivariate distributions of n binary random variables, i.e. distributions of the form p(x) where x = (x_1, . . ., x_n) with x_i ∈ {0, 1}, and hence χ = {0, 1}^n.

A. Orthogonal foliations on flat-projective spaces
Let us consider a parametrization ν of the manifold M. Then, for a given p_ν ∈ M we define the sets given in Eq. (31), which establish a nested structure on the manifold of the form given in Eq. (32). The parametrization p_ν also induces a natural basis for the cotangent space at each p ∈ M, which we denote by ∂_{ν_i} ∈ T*_p M. To study the geometry of this basis, let us consider the functional form of D_γ induced by ν, which is given by D_γ[ν; ν′] := D_γ(p_ν ||p_ν′). Then, the inner product between the basis elements ∂_{ν_i} can be calculated as in Eq. (33). The properties of D_γ guarantee that g_ij(ν) is positive-definite, and hence it has a well-defined inverse for each ν, which we denote by r_ij(ν) := (g^{−1}(ν))_ij. By denoting as θ the primal coordinates with respect to r, one can then define the sets Ẽ_k, where θ_u denotes the θ-coordinates of the uniform distribution u. It is direct to verify that, interestingly, Ẽ_k grows with k while M_k shrinks, such that for each k their combined dimensions sum up to n, being enough to account for the dimensionality of M. Furthermore, as these complementary dimensions are orthogonal, their intersection cannot be empty.
We summarize these ideas in the following definition.
Definition 1. For a given parametrization ν of M for which Ẽ_k exists, the orthogonal foliation of M associated to p_ν is the collection of sets { M_k {p_ν}, Ẽ_k }.
Please note that the bases of T_p M and T*_p M determined by the generalized Fenchel-Legendre dual coordinates established by Eqs. (18a) and (18b) are not orthogonal under the inner product related to the scalar potential ϕ and its conjugate if γ > 0, as discussed in Appendix C. Therefore, the standard relationship between geometric duality and Fenchel-Legendre duality that holds for γ = 0 is broken in curved statistical manifolds. Nonetheless, projective flatness allows the metric induced by D_γ to be expressed in coordinates where the bases are manifestly orthogonal up to a conformal-projective factor, so that ⟨∂_{ξ_i}, ∂_{η_j}⟩ = κ(θ) δ^j_i with κ(θ) as defined in Eq. (26). Then, θ and its Fenchel-Legendre conjugate established by Eq. (6) define a set of conformal-projective coordinates.
Crucially, orthogonal foliations satisfy a Pythagorean property, as shown by the following lemma.
It is important to note that while building orthogonal coordinates is a relatively simple construction, these do not generally guarantee a Pythagorean relationship. As a matter of fact, although the equivalence between Rényi's and α-divergences ensures that both divergences induce the same geometry, only Rényi's exhibits a correspondence between orthogonality on the metric and a Pythagorean relationship on the divergence (see Section II C). To illustrate these ideas, let us consider a particular construction where we take M_k as the set of probability distributions with fixed expectation values, denoted by η, and construct its orthogonal complement. With φ the potential encoding this change of coordinates, we define its conjugate potential ψ = min_ξ (ξ · η − φ(ξ)). In this way, the primal coordinates ξ̃ orthogonal to η follow from the corresponding divergence, where the first term on the right-hand side follows from η_i = E_ξ{h_i(x)}. The primal coordinates ξ̃_i allow one to construct an orthogonal complement to M_k, as follows from (A1).

B. Higher-order hierarchical decomposition
Using an orthogonal foliation, we now introduce the notion of hierarchical decomposition on curved statistical manifolds.
Above, the minimum under D_γ and D_α is the same, as both divergences are related by a monotone function, as shown by Eq. (12). A useful property of the orthogonal foliation is that it enables a convenient characterization of p^(k) for k > 0, as shown in the next Lemma.
With these definitions at hand, we can prove the following result.
Theorem 1. For a given p ∈ M, the collection of γ-projections p^(n−1), . . ., u satisfies

D_γ(p||u) = D_γ(p||p^(n−1)) + D_γ(p^(n−1)||p^(n−2)) + . . . + D_γ(p^(1)||u).

Proof. Let us start by noting that both p^(n−1) and u belong to Ẽ_{n−1}, while both p and p^(n−1) belong to M_{n−1} due to Lemma 2. Therefore, Lemma 1 implies that

D_γ(p||u) = D_γ(p||p^(n−1)) + D_γ(p^(n−1)||u).

The rest of the proof can be done following a similar rationale recursively on D_γ(p^(n−1)||u).
To better understand the deformation of the layers induced by γ, it is useful to consider the mean-field theory approach presented in Ref. [72]. Consider a classic Ising model for which two layers suffice to describe the system, and focus on its projection onto E_1. In [72] the m- and e-projections denote the exact solution and the naive approximation, respectively, both of which are orthogonal. Moreover, the α-projection draws the trajectory of solutions in between. In the current picture, however, the submanifolds are deformed in such a way that the α-projection becomes orthogonal, with α = ±1 left as fixed points (see Figure 2).

IV. GENERALISING THE MAXIMUM ENTROPY PRINCIPLE
A. Rényi's entropy and related quantities

Consider a manifold of distributions whose support allows a flat distribution. Then, the γ-negentropy of p is defined as

N_γ(p) := D_γ(p||u) = Λ − H_γ(p),

with Λ := H_γ(u) being the Rényi entropy of the uniform distribution, which corresponds to log |χ| for finite χ or log n in the continuum, and

H_γ(p) := −(1/γ) log Σ_{x∈χ} p(x)^{1+γ}

being the well-known Rényi entropy in our shifted indexing. This definition recovers the standard Shannon entropy and negentropy in the case γ = 0 [73].
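The relation between the negentropy, the Rényi divergence from the uniform distribution, and the Rényi entropy can be verified directly for a discrete distribution; the sketch below assumes the shifted-index conventions used in the text:

```python
import numpy as np

def renyi_entropy(p, gamma):
    """Rényi entropy in the shifted indexing (gamma = 0 gives Shannon)."""
    return -np.log(np.sum(p**(1 + gamma))) / gamma

p = np.array([0.7, 0.1, 0.1, 0.1])
u = np.full(4, 0.25)        # uniform distribution over |chi| = 4 events
gamma = 0.3
Lambda = np.log(4)          # Rényi entropy of u, equal to log|chi|

# negentropy as the Rényi divergence from the uniform distribution
neg = np.log(np.sum(p**(1 + gamma) * u**(-gamma))) / gamma
print(neg, Lambda - renyi_entropy(p, gamma))  # identical up to rounding
```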
Another quantity of interest is the γ-total correlation, defined as

TC_γ(X^n) := Σ_{j=1}^n H_γ(X_j) − H_γ(X^n),

where X^n := (X_1, . . ., X_n) is a random vector distributed according to p_ξ(X = x) with x = (x_1, . . ., x_n). This is a generalization of the well-known total correlation for Shannon's entropy (also known as multi-information [74]), which in turn generalizes Shannon's mutual information to the case of 3 or more variables [75]. In particular, if n = 2 then the total correlation gives a Rényi mutual information.
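A minimal sketch for a pair of correlated binary variables, assuming the sum-of-marginals form of the γ-total correlation (consistent with the Shannon case): it is positive for a correlated pair and vanishes for an independent one, since the Rényi entropy is additive over independent variables:

```python
import numpy as np

def renyi_entropy(p, gamma):
    return -np.log(np.sum(p**(1 + gamma))) / gamma

# joint distribution of two correlated binary variables,
# entries ordered as (x1, x2) = (0,0), (0,1), (1,0), (1,1)
joint = np.array([0.4, 0.1, 0.1, 0.4])
px1 = np.array([joint[0] + joint[1], joint[2] + joint[3]])  # marginal of X1
px2 = np.array([joint[0] + joint[2], joint[1] + joint[3]])  # marginal of X2

gamma = 0.5
tc = (renyi_entropy(px1, gamma) + renyi_entropy(px2, gamma)
      - renyi_entropy(joint, gamma))
print(tc)   # positive, reflecting the correlation between X1 and X2

# for an independent pair the gamma-total correlation vanishes
indep = np.outer(px1, px2).ravel()
tc0 = (renyi_entropy(px1, gamma) + renyi_entropy(px2, gamma)
       - renyi_entropy(indep, gamma))
print(tc0)  # ~ 0
```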

B. A hierarchical decomposition of Rényi's entropy
With a hierarchical decomposition p, p^(n−1), . . ., u at hand, we are now poised to address the problem of decomposing the entropy based on the relevance of each order.

Lemma 3. Consider the γ-projections of p ∈ M under an orthogonal foliation { M_k {p}, Ẽ_k } such that Ẽ_0 = {u}, with u the uniform distribution. Then, the following holds for l < k:

D_γ(p^(k)||p^(l)) = H_γ(p^(l)) − H_γ(p^(k)).

Proof. A direct application of Eq. (41) shows that

D_γ(p^(k)||u) = D_γ(p^(k)||p^(l)) + D_γ(p^(l)||u).

Then, the desired result follows from re-ordering the terms and using the fact that D_γ(q||u) = Λ − H_γ(q) for any q ∈ M.
Corollary 1. For any multivariate distribution p, the entropies of its γ-projections are ordered as H_γ(p) ≤ H_γ(p^(n−1)) ≤ . . . ≤ H_γ(u) = Λ.

Using this lemma, we can put forward our main result.

Theorem 2. For a given p ∈ M under the foliation of Lemma 3, its γ-projections satisfy H_γ(p^(k)) = max_{r ∈ M_k{p}} H_γ(r), and the negentropy decomposes as Λ − H_γ(p) = Σ_{k=1}^n ∆^(k) H_γ(p), with ∆^(k) H_γ(p) := H_γ(p^(k−1)) − H_γ(p^(k)).
Proof. Because p^(k) ∈ M_k (see Lemma 2), Lemma 1 implies that any r ∈ M_k satisfies

D_γ(r||u) = D_γ(r||p^(k)) + D_γ(p^(k)||u).

Therefore, D_γ(r||u) ≥ D_γ(p^(k)||u) for all r ∈ M_k, and hence it follows that

max_{r ∈ M_k} H_γ(r) = H_γ(p^(k)) = Λ − D_γ(p^(k)||u),

where the first equality is due to the fact that p^(k) ∈ M_k, and the second equality uses the fact that D_γ(q||u) = Λ − H_γ(q). To prove Eq. (51), one can use Corollary 1 and Theorem 1 to show that the negentropy Λ − H_γ(p) telescopes over the γ-projections. The desired result is then a consequence of Lemma 3.
Above, ∆^(k) H_γ(p) accounts for the relevance of the k-th order interactions. In particular, the first-order term accounts for all the non-interactive part,

∆^(1) H_γ(p) = Σ_{j=1}^n N_γ(X_j),

with N_γ(X_j) being the marginal negentropy of X_j. The remaining terms can be seen to be equal to

Σ_{k=2}^n ∆^(k) H_γ(p) = TC_γ(X^n),

showing that TC_γ captures all the correlated part of the Rényi negentropy, following the relationship observed in Shannon's case for γ = 0 (as discussed in Ref. [75]).

C. Maximum Rényi entropy distributions over constraints on average observables
Let us now consider a collection of observables h over a system of n binary variables defined as

h_{i,k}(x) = Π_j x_{I^k_i(j)},

with h_{i,k} being the i-th observable of k-th order and I^k_i(j) an appropriate assignment of indices. Then, one can define the following coordinates:

ν_{i,k} = E{ h_{i,k}(x) }.

For example, ν_{i,1} are of the form E{x_i} and ν_{j,2} of the form E{x_r x_s}. Importantly, given that x_1, . . ., x_n are binary variables, one can check that, once ν_{i,l} for all i and l ≤ k are fixed, this in turn determines all the k-th order marginals [76]. Crucially, this implies that the parameters ν as a whole determine a unique distribution p_ν(x), and hence ν is a valid parametrization of the corresponding statistical manifold [14,35].
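The fact that these moments determine the marginals can be illustrated numerically for binary variables, where each pairwise marginal is a linear function of the first- and second-order moments (the variable names below are ours):

```python
import itertools
import numpy as np

n = 3
states = np.array(list(itertools.product([0, 1], repeat=n)))  # chi = {0,1}^n

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(2**n))   # a random distribution over chi

# first-order coordinates nu_{i,1} = E{x_i}
nu1 = states.T @ p
# second-order coordinates nu_{j,2} = E{x_r x_s} for r < s
nu2 = {(r, s): np.sum(p * states[:, r] * states[:, s])
       for r, s in itertools.combinations(range(n), 2)}

# the moments determine the pairwise marginals, e.g. for (x_0, x_1):
r, s = 0, 1
p11 = nu2[(r, s)]
p10 = nu1[r] - p11        # P(x_r=1, x_s=0) = E{x_r} - E{x_r x_s}
p01 = nu1[s] - p11
p00 = 1 - p11 - p10 - p01
marg = np.array([p00, p01, p10, p11])
print(marg, marg.sum())   # a valid pairwise marginal summing to 1
```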
Let us now consider the family of sets M_k, as defined in Eq. (31), associated to this parametrization. According to the previous discussion, M_k {p} is the set of all distributions of x that are compatible with the k-th order marginals of p. To determine the form of the corresponding k-th order γ-projection, we use the following lemma.
Lemma 4. The solution of the optimization problem of maximizing H_γ(q) subject to matching the coordinates ν_{i,l} for all i and l ≤ k gives a projection of the form given in Eq. (60), with θ_{i,l} = 0 for all l > k and a normalization factor z_γ.

Proof. Using Theorem 2, it is clear that p^(k)_θ can be found by solving for the extreme values of a Lagrangian combining H_γ(q) with the expectation constraints, where q is a discrete distribution and the θ_j are Lagrange multipliers. The desired result follows from imposing ∂L/∂q_i = 0 and ∂L/∂θ_j = 0.
Efficient numerical methods to estimate distributions of the form specified by Eq. (60) will be developed in a separate publication.
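As a purely illustrative sketch (not the efficient method referred to above, nor the authors' procedure), one can already approximate such distributions with a generic optimizer, maximizing the Rényi entropy over the simplex with a quadratic penalty enforcing the expectation constraints; the penalty-based constraint handling is our own choice:

```python
import numpy as np
from scipy.optimize import minimize

def max_renyi_entropy(h, mu, gamma, m):
    """Naive maximum-Rényi-entropy distribution over m states subject to
    E_q{h_j} = mu_j, via a softmax parametrisation of the simplex and a
    quadratic penalty on the constraints. Illustration only."""
    def objective(z):
        q = np.exp(z - z.max()); q /= q.sum()
        H = -np.log(np.sum(q**(1 + gamma))) / gamma   # Rényi entropy (shifted index)
        penalty = np.sum((h @ q - mu)**2)
        return -H + 1e4 * penalty
    res = minimize(objective, np.zeros(m), method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-10, "fatol": 1e-12})
    q = np.exp(res.x - res.x.max())
    return q / q.sum()

# two binary variables; constrain the means E{x1} and E{x2}
states = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
h = states.T.astype(float)     # observables h_1 = x1, h_2 = x2
mu = np.array([0.7, 0.4])      # target expectation values
q = max_renyi_entropy(h, mu, gamma=0.5, m=4)
print(q, h @ q)                # constraints approximately satisfied
```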

V. CONCLUSION
This paper shows how the non-Euclidean geometry of curved statistical manifolds naturally leads to a MEP that uses the Rényi entropy, generalising the traditional MEP framework based on Shannon's entropy, which takes place on flat manifolds. This generalization of the MEP has three important consequences:
• It highlights special geometrical properties of the Rényi entropy, which make it stand apart from other generalized entropies.
• It provides a solid mathematical foundation for the numerous applications of the Rényi entropy and divergence.
• It enables a range of novel methods of analysis for the statistics of complex systems.
Rényi's entropy and divergence represent one of many routes by which the classic information-theoretic definitions can be extended. One fundamental feature of the Rényi divergence, which this work thoroughly exploits, is the correspondence that it establishes between orthogonality with respect to Fisher's metric and a Pythagorean relationship in the divergence (which does not hold in the geometry induced by e.g. the α-divergence). This correspondence is the key property that allows us to build hierarchical foliations, despite the fact that in curved manifolds the link between geometric and Fenchel-Legendre duality is generally broken. It is relevant to highlight that the correspondence between orthogonality and the Pythagorean relationship is not guaranteed by other divergences such as the α-divergence, which makes entropies such as Tsallis' [77] not well suited to extend the MEP, at least from an information geometry perspective [78]. Considering that extensions of the Rényi entropy exist (e.g. Ref. [79]), an interesting open question is to determine the range of divergences that satisfy these properties.
These findings are in agreement with recent research that is revealing special features of the Rényi entropy and divergence in the context of statistical inference and learning. In particular, Refs. [80,81] show that the Rényi divergence can provide bounds on the generalization error of supervised learning algorithms. Also, Ref. [82] shows that the Rényi entropy belongs to a class of functionals that are particularly well-suited for inference and estimation. Put together, these findings suggest that the Rényi entropy and divergence might play an important role in the development of future data analysis and artificial intelligence frameworks.
This work opens the door to novel data-analysis approaches to study high-order interactions. While commonly neglected, high-order statistics have recently been shown to be instrumental in a wide range of phenomena at the heart of complex systems, including the self-organising capabilities of cellular automata [83], gene-to-gene information flow [84], neural information processing [85], high-order brain functions [86,87], and emergent phenomena [88,89]. However, exhaustive modeling of high-order effects requires an exponential number of parameters; for that reason, practical investigations need to rely on heuristic modeling methods (see e.g. [90,91]). In contrast, our framework allows us to perform projections while optimising the manifold's curvature in order to best match empirical statistics. Importantly, k-th order projections on curved spaces lead to distributions that capture statistical phenomena of order higher than k without increasing the dimensionality of the parametric family. The development of this line of research is part of our future work.
Another set of promising applications is found in condensed matter systems, where the Rényi entropy is often introduced as a measure of the degree of quantum entanglement. In particular, the Rényi entropy results from a heuristic generalization of the Von Neumann entropy, which has important benefits in being (i) more amenable to numerical simulations [92] and (ii) easier to measure in experiments [93]. Moreover, the Rényi entropy has been shown to be sensitive to features of quantum systems such as the central charge [43], and knowledge of it at all orders encodes the whole entanglement spectrum of a quantum state [94]. Additionally, in strongly coupled systems, Rényi entropies have been essential for establishing a connection between quantum entanglement and gravity [95,96]. More recently, the Rényi mutual information has taken a central role in the identification of phase transitions [43,44,97]. The mathematical framework established in this work serves as a solid basis for these investigations, and further allows the exploration of novel applications of information geometry tools in these scenarios.
It is our hope that this contribution may serve to widen the range of applicability of the MEP, while fostering theoretical and practical investigations related to the properties of curved statistical manifolds.

FIG. 1. A schematic diagram depicting the three classes of geometrical structures that arise from their α-value. The curved (i.e. α ≠ ±1) geometries are characterized by the α- and Rényi divergences, both of which are conformally-projectively related to the KL divergence, which in turn generates a flat geometry.
