Adding Machine Learning within Hamiltonians: Renormalization Group Transformations, Symmetry Breaking and Restoration

We present a physical interpretation of machine learning functions, opening up the possibility to control properties of statistical systems via the inclusion of these functions in Hamiltonians. In particular, we include the predictive function of a neural network, designed for phase classification, as a conjugate variable coupled to an external field within the Hamiltonian of a system. Results in the two-dimensional Ising model evidence that the field can induce an order-disorder phase transition by breaking or restoring the symmetry, in contrast with the field of the conventional order parameter which can only cause explicit symmetry breaking. The critical behaviour is then studied by proposing reweighting that is agnostic to the original Hamiltonian and forming a renormalization group mapping on quantities derived from the neural network. Accurate estimates of the critical fixed point and the operators that govern the divergence of the correlation length are provided. We conclude by discussing how the method provides an essential step towards bridging machine learning and physics.


I. INTRODUCTION
At the heart of our understanding of phase transitions lies a mathematical apparatus called the renormalization group [1][2][3][4][5][6]. Central concepts behind its application are those of scale invariance and universality: the former relates to the observation that at criticality the considered phenomena can be described by a scale-invariant theory, and the latter to the notion that systems seemingly unrelated in their microscopic descriptions have a large-scale behaviour that is governed by an identical set of relevant operators. Computational frameworks of the renormalization group [7,8] have seen resounding success in condensed matter systems [9][10][11][12] and lattice field theories [13][14][15].
Recently, deep learning [16], which pertains to a class of machine learning methods that progressively extract hierarchical structures in data, has impacted certain aspects of computational science. Artificial neural networks, consisting of multiple layers, have been efficiently applied in various research fields, including particle physics and cosmology as well as statistical mechanics. For a recent review see Refs. [17,18]. Notable examples include the use of neural networks to study phase transitions [19][20][21][22][23][24][25] and quantum many-body systems [26], while investigations at the intersection of machine learning and the renormalization group have recently emerged [27][28][29][30][31]. In consequence, an in-depth understanding of the underlying mechanics of machine learning algorithms, as well as a simultaneous advancement of efficient ways to implement them in physical problems, is a crucial step to be undertaken by physicists [32].
In this paper a physical interpretation of machine learning is presented. In particular, we consider the predictive function of a neural network, designed for phase classification, as a conjugate variable coupled to an external field, and introduce it as a term in the Hamiltonian of a system. Given this formulation, we propose reweighting that is agnostic to the original Hamiltonian to explore if the external field generates a richer structure than the one associated with the conventional order parameter, and if it can induce a phase transition by breaking or restoring the system's symmetry. The critical behaviour can then be investigated using histogram-reweighted extrapolations from configurations obtained in one phase and without knowledge of the Hamiltonian.
To study the phase transition, we propose a real-space renormalization group transformation that is formulated on machine learning quantities. In particular, a mapping is established between an original and a rescaled system using the neural network function and its field, overcoming the need to rely on observables related to the original Hamiltonian. We then explore, based on minimally-sized lattices, its capability to locate the critical fixed point and to extract the operators of the renormalization group transformation. The entirety of critical exponents can then be obtained using scaling relations and a complete study of the phase transition can be conducted.
We validate our proposal in the two-dimensional Ising model using quantities derived from the machine learning algorithm. By giving a physical interpretation to the function of a neural network as a Hamiltonian term, we explore the effect of the coupled field on the considered system, demonstrate that it can break or restore the reflection symmetry by inducing a phase transition, and extract with high accuracy the location of the critical inverse temperature and the operators of the renormalization group transformation that govern the divergence of the correlation length.

II. NEURAL NETWORKS AS HAMILTONIAN TERMS
We consider a statistical system, such as the Ising model (see App. A), which is described by a Hamiltonian E. The equilibrium occupation probabilities of the system are of Boltzmann form and are given by:

p_σ = exp[−βE_σ] / Z,    (1)

where β is the inverse temperature, σ a state of the system and Z = Σ_σ exp[−βE_σ] the partition function.
When the system is in equilibrium the expectation value of an arbitrary observable O is:

⟨O⟩ = Σ_σ p_σ O_σ.    (2)

After a neural network is trained on a system for phase classification (see App. B), the learned neural network function f(·) can be applied to a configuration σ, converting f_σ into a statistical mechanical observable with an associated Boltzmann weight [35]. In addition, we consider f_σ as equivalent to the conditional probability f_σ ≡ P^b_σ that a configuration belongs to the broken-symmetry phase. Consequently, f_σ is an intensive quantity bounded in [0, 1] and, since it has no dependence on the size of the system, we can multiply it by the volume V and recast V f_σ as an extensive property.
We are now able to investigate the extensive neural network function V f by introducing it in the Hamiltonian of the system. In particular, we consider V f_σ as a conjugate variable that couples to an external field Y and define a modified Hamiltonian:

E^Y_σ = E_σ − Y V f_σ.    (3)

The expectation value of the neural network function can then be expressed as a derivative of the modified partition function Z_Y with respect to the field:

⟨f⟩ = (1/βV) ∂ ln Z_Y / ∂Y.    (4)

Setting the neural network field Y to zero recovers the standard definition of Eq. (2). Furthermore, differentiating Eq. (4) with respect to the field gives:

χ_f = ∂⟨f⟩/∂Y = βV (⟨f²⟩ − ⟨f⟩²).    (5)

The quantity χ_f is recognized as a susceptibility: it is a measure of the response of the predictive function f to changes in the neural network field Y. Consequently, the opportunity to study the effect of a non-zero field Y ≠ 0 on the statistical system is now available. One way to achieve this is to conduct Monte Carlo sampling using the modified Hamiltonian of Eq. (3) to obtain configurations of the modified system. An alternative option, which overcomes the need for additional sampling, is the use of histogram reweighting [33,34], where machine-learning-derived observables can also be reweighted in parameter space [35].
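As an illustration, the following minimal sketch estimates ⟨f⟩ and the susceptibility χ_f of Eq. (5) from Monte Carlo samples; the function name and the array f_values of neural network outputs f_σᵢ are illustrative assumptions, not part of the original implementation:

```python
import numpy as np

def nn_field_observables(f_values, beta, volume):
    """Estimate <f> and chi_f = d<f>/dY from Monte Carlo samples of the
    neural network function f, sampled at Y = 0 (illustrative sketch)."""
    f = np.asarray(f_values)
    f_mean = f.mean()
    # Fluctuation-dissipation form of Eq. (5): differentiating ln Z_Y
    # twice with respect to Y yields chi_f = beta * V * (<f^2> - <f>^2).
    chi_f = beta * volume * (np.mean(f ** 2) - f_mean ** 2)
    return f_mean, chi_f
```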

III. SYMMETRY BREAKING AND RESTORATION
Consider a set of N configurations σ_i obtained from a system whose explicit form of the Hamiltonian E is not known. These configurations have been drawn from an equilibrium distribution, described by Eq. (1), and can be utilized with reweighting to predict the behaviour of the modified system when the neural network field Y is set to non-zero values.
To achieve this we define the expectation value of an arbitrary observable O, estimated during a Markov chain Monte Carlo simulation, in the modified system that we aim to sample:

⟨O⟩ = Σ_{i=1}^{N} O_σᵢ (p_σᵢ / p̃_σᵢ) / Σ_{i=1}^{N} (p_σᵢ / p̃_σᵢ),    (6)

where p̃ are the sampling probabilities of the equilibrium distribution and p the occupation probabilities of the modified system. The probabilities of the original system, defined in Eq. (1), can be substituted for p̃_σᵢ, so that the Hamiltonian E cancels in the ratio, to obtain:

⟨O⟩ = Σ_{i=1}^{N} O_σᵢ exp[βY V f_σᵢ] / Σ_{i=1}^{N} exp[βY V f_σᵢ].    (7)

Using Eq. (7) one can calculate the expectation value of an observable for a modified system with a non-zero neural network field Y ≠ 0 from configurations of the original system sampled at inverse temperature β and zero field Y = 0. Before investigating the effect of Y ≠ 0 on machine learning derived observables we recall that the function f has emerged by training the neural network on obtained configurations, where no knowledge about the explicit form of the Hamiltonian is introduced. As a result, both Eq. (7) and the function f have no immediate dependence on the Hamiltonian, and the opportunity to conduct reweighting that is Hamiltonian-agnostic is present.
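A minimal sketch of the Hamiltonian-agnostic reweighting of Eq. (7) is given below; the function name and arguments are illustrative, and a shift of the exponents is included for numerical stability:

```python
import numpy as np

def reweight_to_field(obs_values, f_values, beta, volume, Y):
    """Estimate <O> at neural network field Y != 0 from configurations
    sampled at Y = 0, following Eq. (7). Only f(sigma_i) enters the
    weights; the Hamiltonian E cancels in the probability ratio."""
    O = np.asarray(obs_values)
    log_w = beta * Y * volume * np.asarray(f_values)
    log_w -= log_w.max()  # shift to avoid overflow in the exponentials
    w = np.exp(log_w)
    return np.sum(O * w) / np.sum(w)
```

Setting obs_values equal to f_values reproduces, for instance, reweighted curves of the predictive function itself, as discussed below.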
For the case of the neural network function f, results obtained using Eq. (7) can be seen in Fig. 2. A two-dimensional Ising model of size L = 64 in each dimension is simulated at inverse temperatures β = 0.43 in the symmetric phase, β = 0.440687 at the known inverse critical temperature, and β = 0.45 in the broken-symmetry phase. We observe that, regardless of the phase the system is in, positive and negative values of the external field Y drive the system towards the broken-symmetry or the symmetric phase, respectively. To gain further insight, we recall that the neural network function is related to the probability P^s that a configuration is associated with the symmetric phase through f ≡ P^b = 1 − P^s. Consequently, the field coupled to f is anticipated to have the observed behaviour.
To measure the response of the predictive function f to changes in the field we calculate the susceptibility χ_f, which is depicted in Fig. 3. We note that χ_f exhibits maxima, evidencing the crossing of a phase transition. The results indicate that the field can induce an order-disorder phase transition in the Ising model by breaking or restoring the symmetry of the system. This is in contrast with the external field associated with the conventional order parameter (the magnetization) which, irrespective of its sign, induces an explicit breaking of the symmetry. Based on universality, the emerging phase transition is assumed to be governed by the same relevant operators, and the critical behaviour of the neural network field can now be studied with the renormalization group.

IV. A RENORMALIZATION GROUP MAPPING ON MACHINE LEARNING QUANTITIES

A. Locating the Critical Fixed Point
We consider a configuration of an Ising model that has been obtained at a specific inverse temperature β and has an emergent correlation length ξ. By applying the blocking procedure with the majority rule and a rescaling factor of b = 2 (see App. C), the reduction in the lattice size L′ = L/b also induces an analogous reduction in the correlation length:

ξ′ = ξ/b.    (8)

The correlation length ξ is a quantity that emerges dynamically as the system approaches the critical point β ≈ β_c and is therefore dependent on the value of the inverse temperature, ξ(β). The rescaled system has a reduced correlation length ξ′ and is, consequently, representative of an Ising model at a different inverse temperature, with ξ′(β′). At the critical fixed point β = β′ = β_c the correlation length in the thermodynamic limit ξ(β_c, L = ∞) diverges, and intensive quantities of the original and the rescaled system become equal.
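The blocking step itself admits a compact implementation; the sketch below is one plausible reading of the majority rule of App. C (random tie-breaking is an assumption) that rescales an L × L configuration by b = 2:

```python
import numpy as np

def block_majority(spins, b=2, rng=np.random.default_rng()):
    """Majority-rule block-spin transformation with rescaling factor b.
    spins: (L, L) array of +1/-1 values; ties are broken randomly."""
    L = spins.shape[0]
    # Sum the spins inside each b x b block.
    blocks = spins.reshape(L // b, b, L // b, b).sum(axis=(1, 3))
    ties = blocks == 0
    blocks[ties] = rng.choice([-1, 1], size=ties.sum())
    return np.sign(blocks).astype(int)
```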
This opens up the opportunity to use an observable derived from a machine learning algorithm to locate the critical fixed point. In particular, we consider, at β = β_c, the neural network function f, which has been expressed as a statistical mechanical observable and is therefore dependent on the inverse temperature:

⟨f(β_c)⟩ = ⟨f′(β_c)⟩.    (9)

In Fig. 4, the predictive function f is drawn for an original and a rescaled system. We recall that, under the assumption that configurations of the rescaled system appear with the Boltzmann probabilities of the original Ising Hamiltonian, observables of the rescaled system can be reweighted as observables of the original [36]. We note that the two lines cross at β_c^f ≈ 0.44055, yielding a first estimate of the location of the critical inverse temperature. The results are obtained using reweighting on a Monte Carlo dataset simulated near the known inverse critical temperature β_c ≈ 0.440687. When the inverse critical temperature is not known, it can be estimated by iterating the same procedure on points of intersection until convergence [36]. Given this knowledge, the operators and the critical fixed point of the renormalization group transformation can then be calculated in a quantitative manner.

B. Extracting Operators of the Renormalization Group
The original and the rescaled systems are located at inverse temperatures β and β′, and their neural network functions are, therefore, related through:

⟨f(β)⟩ = ⟨f′(β′)⟩.    (10)

For the case of the inverse critical temperature, Eq. (10) reduces to Eq. (9). We are now able to form, based on Eq. (10), a renormalization group mapping that associates the two inverse temperatures:

β′ = ⟨f′⟩⁻¹(⟨f(β)⟩).    (11)

The correlation length then diverges in the thermodynamic limit according to the relations ξ ∼ |t|^(−ν) and ξ′ ∼ |t′|^(−ν) for an original and a rescaled system, respectively, where t = (β_c − β)/β_c is the reduced inverse temperature. Dividing the two relations, cf. Eq. (8), we obtain:

|t′| = b^(1/ν) |t|.    (12)

The renormalization group mapping is then linearized through a Taylor expansion to leading order in the proximity of the fixed point [37], to obtain:

β′ − β_c ≈ (dβ′/dβ)|_(β=β_c) (β − β_c).    (13)

By substituting into Eq. (12) we are able to calculate the correlation length exponent:

ν = ln b / ln (dβ′/dβ)|_(β=β_c).    (14)

In Fig. 5 results based on Eq. (11) are depicted, using Hamiltonian-dependent reweighting on the inverse temperature [35]. We obtain an estimate of the critical fixed point β_c = 0.44063(21) and the correlation length exponent ν = 1.01(2).
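In practice, the derivative of the mapping at the fixed point can be estimated with a local linear fit; the sketch below is illustrative (the arrays tabulating β′ = R(β) in a small window around β_c are assumed inputs):

```python
import numpy as np

def exponent_from_mapping(x, x_prime, x_c, b=2):
    """ln b / ln(slope) of a linearized RG mapping x' = R(x) at the
    fixed point x_c, cf. Eqs. (13)-(14); for x = beta this yields nu."""
    # Slope of the mapping at the fixed point from a local linear fit.
    slope = np.polyfit(np.asarray(x) - x_c,
                       np.asarray(x_prime) - x_c, 1)[0]
    return np.log(b) / np.log(slope)
```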
Since the neural network field Y induces a phase transition in the system it is bound to affect the correlation length. Another critical exponent θ_Y can then be defined as the field converges to zero at the critical fixed point:

ξ ∼ |Y|^(−θ_Y).    (15)

Following an analogous derivation, and formulating a mapping Y′ = ⟨f′⟩⁻¹(⟨f(Y)⟩) for the field, the corresponding critical exponent is then calculated through:

θ_Y = ln b / ln (dY′/dY)|_(Y=0).    (16)

Fig. 6 shows the results for the case of the neural network field, where we obtain the value of the critical exponent θ_Y = 0.534(3), using Hamiltonian-agnostic reweighting based on Eq. (7).
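The same linearization applies to the field mapping; with hypothetical arrays Y and Y_prime tabulating Y′ = R(Y) near the fixed point Y = 0, the exponent of Eq. (16) follows from the sketch above as:

```python
# Reuses exponent_from_mapping from the previous sketch; Y and Y_prime
# are assumed arrays tabulating the field mapping near Y = 0.
theta_Y = exponent_from_mapping(Y, Y_prime, 0.0, b=2)
```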
The phase transition of the Ising model is completely described by two relevant operators, associated with the exponents ν and θ (see App. A). The exponent θ governs the divergence of the correlation length in terms of the external field h that is coupled to the conventional order parameter. We note that the predictive function f is reminiscent of an effective order parameter (see Fig. 2). We find that the numerical value of the exponent θ_Y agrees within statistical errors with θ = 8/15. We hence conclude that Y couples to the same relevant operator as the external magnetic field. The results are summarized in Table I, and the remaining critical exponents can be calculated through scaling relations (see App. A).

[FIG. 6. Rescaled field Y′ versus original field Y at β = 0.440687. The dashed lines indicate the statistical uncertainty.]
We emphasize that the operators and the critical fixed point have been calculated using observables derived from the neural network implementation and their reweighted extrapolations where no explicit information about the symmetries of the Hamiltonian was introduced.

V. CONCLUSIONS
Machine learning can become physically interpretable by being introduced as a term in the Hamiltonian of a statistical system, allowing an efficient study of a phase transition based on renormalization group arguments. Even with minimally-sized lattices, a highly accurate calculation of the critical fixed point and the operators of the two-dimensional Ising model is achieved. The inclusion of a neural network field within the Hamiltonian enables a certain control over the considered system, revealing a structure that is richer than that of the field of the conventional order parameter. In particular, the neural network field can drive a system towards both the ordered and the disordered phase, giving rise to an order-disorder phase transition, in contrast with the external magnetic field, which only causes explicit symmetry breaking.
Numerous research directions can be anticipated. Any machine learning function can, in principle, be instilled within Hamiltonians to drive a system towards desired behaviours, making the method generally applicable. For instance, a function from a simple model might be used to study a complicated one [38]. The effect of a neural network field in systems that lack an order parameter [19] can now also readily be explored, a prospect that is further enhanced through the use of Hamiltonian-agnostic reweighting. Most importantly, by using Monte Carlo simulations to sample configurations of modified systems that include machine learning functions, an in-depth understanding of the underlying mechanics of the algorithms can be obtained.
In conclusion, by including machine learning as a term in the Hamiltonian an essential step towards bridging machine learning and physics is established, one that could potentially alter our understanding of machine learning algorithms and their effects on systems.

VI. ACKNOWLEDGEMENTS
The authors received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under grant agreement No 813942. The work of GA and BL has been supported in part by the UKRI Science and Technology Facilities Council (STFC) Consolidated Grant ST/P00055X/1. The work of BL is further supported in part by the Royal Society Wolfson Research Merit Award WM170010 and by the Leverhulme Foundation Research Fellowship RF-2020-461\9. Numerical simulations have been performed on the Swansea SUNBIRD system. This system is part of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government. We thank COST Action CA15213 THOR for support.

Appendix A: The Ising Model
We consider the two-dimensional Ising model on a square lattice which is described by the Hamiltonian:

E = −J Σ_⟨ij⟩ s_i s_j − h Σ_i s_i,    (A1)

where ⟨ij⟩ denotes a sum over nearest-neighbour interactions, J is the coupling constant, which is set to one, and h the external magnetic field, which is set to zero. The system undergoes a second-order phase transition at the critical inverse temperature β_c:

β_c = ln(1 + √2)/2 ≈ 0.440687.    (A2)

The order parameter of the Ising model is the magnetization. We often consider the absolute magnetization, normalized by the volume V = L × L of the system:

m = |Σ_i s_i| / V.    (A3)

The fluctuations of the magnetization are equivalent to the magnetic susceptibility, defined as:

χ = βV (⟨m²⟩ − ⟨|m|⟩²).    (A4)

To measure the distance from the critical point a dimensionless reduced inverse temperature is defined:

t = (β_c − β)/β_c.    (A5)

As the system approaches the critical temperature β ≈ β_c, finite-size effects dominate and fluctuations, such as the magnetic susceptibility χ, exhibit maxima which act as phase transition indicators.
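For reference, a minimal sketch of the estimators of Eqs. (A3)-(A4) is given below; the helper names are illustrative:

```python
import numpy as np

def magnetization(spins):
    """Absolute magnetization per spin of one +1/-1 configuration, Eq. (A3)."""
    return np.abs(spins.sum()) / spins.size

def susceptibility(m_samples, beta, volume):
    """Magnetic susceptibility from magnetization samples, Eq. (A4)."""
    m = np.asarray(m_samples)
    return beta * volume * (np.mean(m ** 2) - np.mean(m) ** 2)
```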
The phase transition of the Ising model is completely described by two relevant operators that govern the divergence of the correlation length. One is associated with the critical exponent ν, which is defined via:

ξ ∼ |t|^(−ν),    (A6)

and the second with the critical exponent θ, which is given through:

ξ ∼ |h|^(−θ),    (A7)

where h is the external magnetic field. Eq. (A7) is valid when β = β_c and h → 0. Given the two relevant operators ν and θ, the critical exponents that govern the divergence of the specific heat (α), magnetization (β^(m)) and magnetic susceptibility (γ) can then be calculated through scaling relations:

α = 2 − dν,  β^(m) = ν(d − 1/θ),  γ = ν(2/θ − d),    (A8)

where d = 2 is the dimensionality of the system.
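As a worked check of Eq. (A8), inserting the exact two-dimensional Ising values ν = 1 and θ = 8/15 recovers the known exponents:

```python
d, nu, theta = 2, 1.0, 8 / 15   # exact 2D Ising inputs
alpha  = 2 - d * nu             # specific heat:  0
beta_m = nu * (d - 1 / theta)   # magnetization:  0.125 = 1/8
gamma  = nu * (2 / theta - d)   # susceptibility: 1.75  = 7/4
```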

Appendix B: Neural Network Architecture and Simulation details
The neural network architecture comprises a fully-connected layer (FC1) with a rectified linear unit (ReLU) non-linear activation function, defined as k(x) = max(0, x). The result is then passed to a second fully-connected layer (FC2) with 32 units and a ReLU activation, and is subsequently forwarded to a third fully-connected layer (FC3) with 2 units and a softmax activation. The training is conducted with the Adam algorithm, using a learning rate of 10⁻⁴ and a batch size of 8. The architecture is implemented with TensorFlow and the Keras library.
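A minimal Keras sketch consistent with this description is given below; the number of units in FC1, the input dimension, and the loss are not specified in the text, so the values used here are placeholder assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # FC1: width unspecified in the text; 64 units is an assumption,
    # as is the flattened 64 * 64 input size.
    tf.keras.layers.Dense(64, activation="relu",
                          input_shape=(64 * 64,), name="FC1"),
    tf.keras.layers.Dense(32, activation="relu", name="FC2"),
    tf.keras.layers.Dense(2, activation="softmax", name="FC3"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",  # assumed one-hot phase labels
    metrics=["accuracy"],
)
# Training would proceed as model.fit(x, y, batch_size=8, ...).
```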
Configurations are sampled with Markov chain Monte Carlo simulations using the Wolff algorithm [39], and are chosen to be minimally correlated. The data set comprises 1000 configurations for each inverse temperature, of which 100 are set aside to create a validation set. Specifically, configurations are sampled in the training ranges β = 0.27, . . . , 0.36 in the symmetric phase and β = 0.52, . . . , 0.61 in the broken-symmetry phase, with a step size of 0.01. The architecture has been optimized on the Ising model based on the values of the training and the validation loss.
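For completeness, a minimal single-cluster Wolff update for the two-dimensional Ising model is sketched below; this is a standard textbook version with periodic boundaries, not the authors' code:

```python
import numpy as np

def wolff_update(spins, beta, J=1.0, rng=np.random.default_rng()):
    """One Wolff cluster update [39]: grow a cluster of aligned spins
    with bond probability 1 - exp(-2*beta*J), then flip it."""
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta * J)
    i, j = rng.integers(L, size=2)      # random seed site
    seed = spins[i, j]
    cluster, stack = {(i, j)}, [(i, j)]
    while stack:
        x, y = stack.pop()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            site = ((x + dx) % L, (y + dy) % L)
            if site not in cluster and spins[site] == seed \
                    and rng.random() < p_add:
                cluster.add(site)
                stack.append(site)
    for site in cluster:
        spins[site] = -seed             # flip the whole cluster
    return spins
```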