Learning the Ising Model with Generative Neural Networks

Recent advances in deep learning and neural networks have led to an increased interest in the application of generative models in statistical and condensed matter physics. In particular, restricted Boltzmann machines (RBMs) and variational autoencoders (VAEs) as specific classes of neural networks have been successfully applied in the context of physical feature extraction and representation learning. Despite these successes, however, there is only limited understanding of their representational properties and limitations. To better understand the representational characteristics of both generative neural networks, we study the ability of single RBMs and VAEs to capture physical features of the Ising model at different temperatures. This approach allows us to quantitatively assess learned representations by comparing sample features with corresponding theoretical predictions. Our results suggest that the considered RBMs and convolutional VAEs are able to capture the temperature dependence of magnetization, energy, and spin-spin correlations. The samples generated by RBMs are more evenly distributed across temperature than those of VAEs. We also find that convolutional layers in VAEs are important to model spin correlations.


I. INTRODUCTION
After the successful application of deep learning and neural networks in speech and pattern recognition [1,2], there is an increased interest in applying generative models in statistical and condensed matter physics. For example, restricted Boltzmann machines (RBMs) [3-5], as one class of stochastic neural networks, were used to study phase transitions [6,7], represent wave functions [8,9], and extract features from physical systems [10,11]. Furthermore, variational autoencoders (VAEs) [12] have also been applied to different physical representation learning problems [13]. In addition to the application of generative neural networks to physical systems, connections between statistical physics and the theoretical description of certain neural network models also helped to gain insights into their learning dynamics [10,14-16].
Despite these developments, the representational characteristics and limitations of neural networks have only been partially explored. To better understand the representational properties of the aforementioned neural networks, we consider single RBMs and VAEs and study their ability to capture the temperature dependence of physical features in the Ising model. In this way, we are able to quantitatively compare the learned distributions to corresponding theoretical predictions of a well-characterized physical system. We train both neural networks with realizations of Ising configurations at different temperatures and identify the temperature of generated samples with a classification network. Our work complements earlier studies that utilized one RBM per temperature to learn the distribution of Ising configurations with a few dozen spins [14,17,18]. The system sizes we consider to train single machines for all temperatures are about one order of magnitude larger than those in previous studies. Furthermore, we consider different measures to monitor the learning progress, and quantitatively compare Ising samples that are generated by RBMs with those of VAEs. To examine the quality of the generated Ising configurations, we measure their magnetization, energy, and spin-spin correlations.
Our paper proceeds as follows. In Sec. II, we give a brief introduction to generative models and summarize the concepts that are necessary to train RBMs and VAEs. To study the learning progress of both models, we provide an overview of different monitoring techniques in Sec. III. In Sec. IV, we apply RBMs and VAEs to learn the distribution of Ising spin configurations at different temperatures. We conclude our study and discuss our results in Sec. V.

II. GENERATIVE MODELS
Generative models are used to approximate the distribution of a dataset D = {x_1, ..., x_m} whose entities are samples x_i ∼ p drawn from a distribution p (1 ≤ i ≤ m). Usually, the distribution p is unknown and thus approximated by a generative model distribution p_θ, where θ is a corresponding parameter set. We can think of generative models as neural networks, with θ describing the underlying weights and activation function parameters. Once trained, generative models are used to generate new samples similar to the ones of the considered dataset D [10]. To quantitatively study the representational power of generative models, we consider two specific architectures: (i) RBMs and (ii) VAEs. We introduce the basic concepts behind RBMs and VAEs in Secs. II A and II B.

Figure 1. Restricted Boltzmann machine. In this example, the respective layers consist of six visible units {v_i}_{i∈{1,...,6}} and four hidden units {h_j}_{j∈{1,...,4}}. The network structure underlying an RBM is bipartite.

A. Restricted Boltzmann machine
RBMs are a particular type of stochastic neural network and were first introduced by Smolensky in 1986 under the name Harmonium [19]. In the early 2000s, Hinton developed efficient training algorithms that made it possible to apply RBMs to different learning tasks [5,20]. The network structure of an RBM is composed of a visible and a hidden layer. In contrast to Boltzmann machines [3], no intra-layer connections are present in RBMs. We show an illustration of a bipartite RBM network in Fig. 1.
Mathematically, we describe the visible layer by a vector v = (v_1, ..., v_m)^T, which consists of m visible units. Each visible unit represents one element of the dataset D. Similarly, we represent the hidden layer by a vector h = (h_1, ..., h_n)^T, which is composed of n hidden units. To model the distribution of a certain dataset, hidden units serve as additional degrees of freedom to capture the complex interactions between the original variables. Visible and hidden units are binary (i.e., v_i ∈ {0, 1} (1 ≤ i ≤ m) and h_j ∈ {0, 1} (1 ≤ j ≤ n)). We note that there also exist other formulations with continuous degrees of freedom [21]. In an RBM, connections between units v_i and h_j are undirected and we describe their weights by a weight matrix W ∈ R^{m×n} with elements w_ij ∈ R. Furthermore, we use b ∈ R^m and c ∈ R^n to account for biases in the visible and hidden layers. According to these definitions, there is an energy

E(v, h) = −b^T v − c^T h − v^T W h

associated with every configuration (v, h). The probability of the system to be found in a certain configuration (v, h) is described by the Boltzmann distribution [3,10]

p(v, h) = e^{−E(v, h)} / Z,

where Z = Σ_{v,h} e^{−E(v,h)} is the canonical partition function. The sum Σ_{v,h} is taken over all possible configurations {v, h}. Because of the bipartite network structure, the conditional distributions factorize and the units are activated according to

p(h_j = 1 | v) = σ(c_j + Σ_i w_ij v_i) and p(v_i = 1 | h) = σ(b_i + Σ_j w_ij h_j),

where σ(x) = 1/(1 + e^{−x}) is the logistic function.
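As an illustration, the energy and Boltzmann probability defined above can be evaluated directly for a machine small enough to enumerate all configurations; the random weights below are placeholders, not trained parameters.

```python
import numpy as np
from itertools import product

def rbm_energy(v, h, W, b, c):
    """E(v, h) = -b.v - c.h - v.W.h for binary unit vectors v and h."""
    return -b @ v - c @ h - v @ W @ h

rng = np.random.default_rng(0)
m, n = 6, 4                         # six visible and four hidden units, as in Fig. 1
W = 0.01 * rng.standard_normal((m, n))
b, c = np.zeros(m), np.zeros(n)

v = rng.integers(0, 2, m)           # a random binary configuration
h = rng.integers(0, 2, n)
weight = np.exp(-rbm_energy(v, h, W, b, c))   # unnormalized Boltzmann weight

# Brute-force partition function: feasible only because 2^(m+n) = 1024 here.
Z = sum(np.exp(-rbm_energy(np.array(vv), np.array(hh), W, b, c))
        for vv in product([0, 1], repeat=m) for hh in product([0, 1], repeat=n))
p = weight / Z                      # probability of the configuration (v, h)
```

For realistic layer sizes the sum over all configurations is intractable, which is why training relies on sampling, as described below.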
We now consider a dataset D that consists of N i.i.d. realizations {x_1, ..., x_N}. Each realization x_k = (x_k1, ..., x_km)^T (1 ≤ k ≤ N) in turn consists of m binary elements. We associate these elements with the visible units of an RBM. The corresponding log-likelihood function is [22]

log L(x, θ) = log p(x_1, ..., x_N; θ) = Σ_{k=1}^{N} log p(x_k; θ),

where the parameter set θ describes weights and biases and the partition function is Z = Σ_{v,h} e^{−E(v,h)}. Based on Eq. (6), we perform maximum-likelihood estimation of the RBM parameters. We may express the log-likelihood function in terms of the free energy,

log p(x_k; θ) = −F(x_k) − log Z,

where F(x_k) = −log Σ_h e^{−E(x_k, h)} is the (clamped) free energy of an RBM whose visible states are clamped to the elements of x_k [24].
To train an RBM, we have to find a set of parameters θ that minimizes the difference between the distribution p_θ of the machine and the distribution p of the data.
One possibility is to quantify the dissimilarity of these two distributions in terms of the Kullback-Leibler (KL) divergence (i.e., relative entropy) [3],

D_KL(p || p_θ) = Σ_x p(x) log [p(x) / p_θ(x)].

Furthermore, it is also possible to minimize the negative log-likelihood. This optimization problem can be solved using a stochastic gradient descent algorithm [25], so that the parameters are iteratively updated according to

θ^(t+1) = θ^(t) − η ∂R(D, θ^(t))/∂θ,

where R(D, θ) denotes the loss function (i.e., negative log-likelihood or KL divergence) and η the learning rate.
In the case of an RBM, the derivatives of the loss function are

∂ log L / ∂w_ij = ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model,
∂ log L / ∂b_i = ⟨v_i⟩_data − ⟨v_i⟩_model,
∂ log L / ∂c_j = ⟨h_j⟩_data − ⟨h_j⟩_model,

where ⟨·⟩ indicates the ensemble average over different realizations of the corresponding quantities. According to Eqs. (11) to (13), weights and biases are updated in order to minimize the differences between data and model distributions. The bipartite network structure of RBMs allows us to update units within one layer in parallel (see Eqs. (4) and (5)). We illustrate this so-called block Gibbs sampling process in Fig. 2. After a certain number of updates of the visible and hidden layers, the model distribution p_θ equilibrates. It has been shown empirically [5] that a small number of updates may be sufficient to train an RBM according to Eqs. (11) to (13). This technique is also known as k-step contrastive divergence (CD-k), where k denotes the number of Gibbs sampling steps. An alternative to CD-k methods is persistent contrastive divergence (PCD) [26]. Instead of using the data as initial condition for the sampling process (see Fig. 2), PCD uses the final state of the model from the previous sampling process to initialize the current one. The idea behind PCD is that the model distribution changes only slightly after updating the model parameters θ according to Eq. (9) with a small learning rate. For further details on the RBM training process, see Sec. IV A.
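A minimal sketch of block Gibbs sampling and the resulting CD-k gradient estimate, assuming the factorized logistic conditionals p(h_j = 1|v) and p(v_i = 1|h); the single-sample gradient and the layer sizes are illustrative simplifications of a minibatch implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sweep(v, W, b, c, rng):
    """One block Gibbs step: sample all hidden units given v, then all visible units given h."""
    h = (rng.random(W.shape[1]) < sigmoid(c + v @ W)).astype(float)        # p(h_j = 1 | v)
    v_new = (rng.random(W.shape[0]) < sigmoid(b + h @ W.T)).astype(float)  # p(v_i = 1 | h)
    return v_new

def cd_k_gradient(v_data, W, b, c, k, rng):
    """CD-k estimate of the weight gradient <v h>_data - <v h>_model for one sample."""
    v_model = v_data.copy()
    for _ in range(k):                       # k block Gibbs steps starting from the data
        v_model = gibbs_sweep(v_model, W, b, c, rng)
    return (np.outer(v_data, sigmoid(c + v_data @ W))
            - np.outer(v_model, sigmoid(c + v_model @ W)))

rng = np.random.default_rng(0)
m, n = 1024, 900                             # 32 x 32 visible units, 900 hidden units (Sec. IV A)
W = 0.01 * rng.standard_normal((m, n))
b, c = np.zeros(m), np.zeros(n)
v0 = rng.integers(0, 2, m).astype(float)
grad_W = cd_k_gradient(v0, W, b, c, k=5, rng=rng)   # ascend: W would be updated as W += eta * grad_W
```

For PCD, one would keep `v_model` across parameter updates instead of restarting each chain from the data.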

B. Variational autoencoder
A variational autoencoder (VAE) is a graphical model for variational inference and consists of an encoder and a decoder network. It belongs to the class of latent variable models, which are based on the assumption that the unknown data distribution p can be described by random vectors z with an appropriate prior in a low-dimensional and unobserved (latent) space. We show an example of a VAE in Fig. 3.
An encoder maps the input data to a representation that is used as input in a decoder. We denote the parameter sets of the encoder and decoder by φ and θ. The encoder is a parameterization of the posterior probability q_φ(z|x) of the latent variable z given the data x, whereas the decoder is a parameterization of the likelihood p_θ(x|z). In variational inference, the main idea is to approximate the posterior q_φ(z|x) by a variational distribution [27]. In the context of VAEs, a common choice is to describe the posterior by a factorized Gaussian encoder,

q_φ(z|x) = N(z; µ, diag(σ²)),

where µ and σ² are the mean and variance vectors of the model. Next, we want to find the parameter set φ* that minimizes the KL divergence between q_φ(z|x) and
the actual posterior p(z|x):

φ* = arg min_φ D_KL(q_φ(z|x) || p(z|x)).

Since the true posterior p(z|x) is unknown, we cannot solve the optimization problem of Eq. (15) directly and instead maximize the evidence lower bound (ELBO) [12],

ELBO(φ, θ; x) = E_{q_φ(z|x)}[log p_θ(x|z)] − D_KL(q_φ(z|x) || p(z)),

where p(z) = N(0, I) is the latent variable prior. The negative ELBO serves as the loss function of a VAE. The first term in Eq. (16) may be interpreted as a reconstruction error, because maximizing it makes the output of the decoder more similar to the input of the encoder. The second term ensures that the latent representation is Gaussian, such that data points with similar features have a similar Gaussian representation.
To summarize, a VAE is based on an encoder network that outputs the parameters (µ, σ) of the latent Gaussian representation q_φ(z|x) of p(z|x). The second neural network in a VAE is a decoder that uses samples of the Gaussian q_φ(z|x) as input to generate new samples according to a distribution p_θ(x|z) (see Fig. 3). We estimate all parameters by maximizing the ELBO using backpropagation [12] and therefore need a deterministic and differentiable map between the output of the encoder and the input of the decoder w.r.t. µ and σ. To calculate the corresponding derivatives of the ELBO, we need to express the random variable z as a differentiable and invertible transformation g of another auxiliary random variable ε (i.e., z = g(µ, σ, ε)). We employ the so-called reparameterization trick [12], which uses

z = g(µ, σ, ε) = µ + σ ⊙ ε,

where ε ∼ N(0, I) and ⊙ is the element-wise product.
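The reparameterization trick amounts to a single line of code. In the sketch below, parameterizing the variance through log σ² is a common practical choice (it keeps σ positive), not something mandated by the text.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); deterministic and differentiable in mu, sigma."""
    eps = rng.standard_normal(mu.shape)      # auxiliary noise, independent of the parameters
    return mu + np.exp(0.5 * log_var) * eps  # sigma = exp(log_var / 2)

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])                   # encoder mean output (placeholder values)
log_var = np.array([0.0, 0.0])               # sigma = 1 for both latent dimensions
z = reparameterize(mu, log_var, rng)         # a latent sample fed into the decoder
```

Because the randomness enters only through `eps`, gradients with respect to `mu` and `log_var` pass through this map during backpropagation.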
To learn the distributions of physical features in the Ising model for different temperatures (i.e., classes), we use a conditional VAE (cVAE) [28] (see Sec. IV B for further details). The advantage of cVAEs over standard VAEs is that it is possible to condition the encoder and decoder on the label of the data. We use l to denote the total number of classes and c ∈ {1, ..., l} is the label of a certain class. The data in a certain class is then distributed according to p(z|c) and the model distributions of the cVAE are given by q_φ(z|x) and p_θ(x|z, c). These modified distributions are used in the ELBO of a cVAE. From a practical point of view, we have to provide extra dimensions in the input layers of the decoder of cVAEs to account for the label of a certain class.

C. Comparison between the two models
In Secs. II A and II B, we outlined the basic ideas behind RBMs and VAEs as generative neural networks that are able to extract and learn features of a dataset. Both models have in common that they create a low-dimensional representation of a given dataset and restore it by minimizing the reconstruction error (see Eqs. (8) and (16)).
One major difference between the two models is that the compression and decompression are performed by the same network in the case of an RBM. For a VAE, however, two networks (encoder and decoder) are used to perform these tasks. A second important difference is that the latent distribution is a product of Bernoulli distributions for RBMs and a product of Gaussians for VAEs. Therefore, the latent space of an RBM is restricted to 2^n states, where n is the number of hidden units, whereas for VAEs it is R^d, where d is the dimension of the latent space. The greater representational power of a VAE may be an advantage, but it could also result in overfitting the data.

III. MONITORING LEARNING
To monitor the learning progress of RBMs and VAEs, we can keep track of the reconstruction error (i.e., the squared Euclidean distance between the original data and its reconstruction) and the binary cross entropy (BCE) [29],

BCE = −Σ_i [x_i log p_i(x) + (1 − x_i) log(1 − p_i(x))],

where x_i ∈ {0, 1} and p_i(x) is the reconstruction probability of bit i. In the following subsections, we describe additional methods that allow us to monitor approximations of the log-likelihood (see Eq. (6)) and KL divergence (see Eq. (8)). Approximations are important since it is computationally too expensive to determine the log-likelihood and KL divergence exactly. In Ref. [17], the partition function of RBMs has been computed directly since the considered system sizes were relatively small.
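Both monitoring quantities are straightforward to compute. In the sketch below, the clipping constant is a numerical-stability choice, not part of the definition.

```python
import numpy as np

def reconstruction_error(x, x_rec):
    """Squared Euclidean distance between the original data and its reconstruction."""
    return np.sum((x - x_rec) ** 2)

def binary_cross_entropy(x, p):
    """BCE = -sum_i [x_i log p_i + (1 - x_i) log(1 - p_i)] for binary data x."""
    eps = 1e-12                         # clip probabilities to avoid log(0)
    p = np.clip(p, eps, 1.0 - eps)
    return -np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))

x = np.array([1.0, 0.0, 1.0])           # a toy binary sample
err = reconstruction_error(x, x)        # 0 for a perfect reconstruction
bce = binary_cross_entropy(x, np.full(3, 0.5))   # maximally uninformative reconstruction
```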

A. Pseudo log-likelihood
One possibility to monitor the learning progress of an RBM is the so-called pseudo log-likelihood [30]. We consider a general approximation of an m-dimensional parametric distribution,

P(x; θ) = Π_{i=1}^{m} p(x_i | x_{−i}; θ),

where x_{−i} is the set of all variables apart from x_i. We now take the logarithm of P(x; θ) and obtain the pseudo log-likelihood

log P(x; θ) = Σ_{i=1}^{m} log p(x_i | x_{−i}; θ).

Based on Eq. (20), we conclude that the pseudo log-likelihood is the sum of the log-probabilities of each x_i conditioned on all other states x_{−i}. The summation over all states x_i can be computationally expensive, so we approximate log P(x; θ) by drawing indices i ∈ {1, 2, ..., m} uniformly at random and approximating P by P̃ such that

log P̃(x; θ) = m log p(x_i | x_{−i}; θ).

For RBMs, the approximated pseudo log-likelihood is

log P̃(x; θ) = m log [ e^{−F(x)} / (e^{−F(x)} + e^{−F(x̃)}) ],

where x̃ corresponds to the vector x but with a flipped i-th component. Equation (22) is a consequence of the identity

p(x_i | x_{−i}; θ) = p(x; θ) / [p(x; θ) + p(x̃; θ)] = e^{−F(x)} / (e^{−F(x)} + e^{−F(x̃)}).
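For an RBM, the stochastic pseudo-log-likelihood can be sketched as follows, using the standard closed form of the clamped free energy for binary hidden units (an assumption consistent with Sec. II A); weights and the single-bit flip are illustrative.

```python
import numpy as np

def free_energy(v, W, b, c):
    """Clamped free energy F(v) = -b.v - sum_j log(1 + exp(c_j + (v W)_j))."""
    return -b @ v - np.sum(np.logaddexp(0.0, c + v @ W))

def pseudo_log_likelihood(v, W, b, c, rng):
    """Stochastic pseudo-log-likelihood: pick one bit uniformly at random,
    flip it, and compare the free energies of the two configurations."""
    m = v.size
    i = rng.integers(m)
    v_flip = v.copy()
    v_flip[i] = 1.0 - v_flip[i]
    # log p(v_i | v_{-i}) = -log(1 + exp(F(v) - F(v_flip))), scaled by m
    return -m * np.logaddexp(0.0, free_energy(v, W, b, c) - free_energy(v_flip, W, b, c))
```

Averaged over many samples and random bit choices, this tracks the (intractable) log-likelihood during training without ever computing Z.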

B. Estimating KL divergence
In most unsupervised learning problems, except for some synthetic problems such as Gaussian mixture models [31], we do not know the distribution underlying a certain dataset. We are only given samples and can therefore not directly determine the KL divergence (see Eq. (8)). Instead, we have to use some alternative methods. Following Ref. [32], we use the nearest-neighbor (NN) estimation of the KL divergence. Given two continuous distributions p and q with N and Ñ i.i.d. m-dimensional samples {x_1, ..., x_N} and {y_1, ..., y_Ñ}, we want to estimate D_KL(p||q). For two consistent estimators [33] p̂ and q̂, the KL divergence can be approximated by

D̂_KL(p||q) = (1/N) Σ_{i=1}^{N} log [ p̂(x_i) / q̂(x_i) ].

We construct the consistent estimators as follows. Let ρ_k(x_i) be the Jaccard distance [34] between x_i and its k-NN in the subset {x_j} for j ≠ i, and ν_k(x_i) the Jaccard distance between x_i and its k-NN in the set {y_l}. We obtain the KL divergence estimate

D̂_KL(p||q) = (m/N) Σ_{i=1}^{N} log [ ν_k(x_i) / ρ_k(x_i) ] + log [ Ñ / (N − 1) ].

In the context of generative models, this means that we can calculate the KL divergence in terms of the Jaccard distances ρ_k(x_i) and ν_k(x_i).
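A sketch of the NN estimator for binary samples; the brute-force neighbor search and the smoothing constant `eps` (which guards against zero distances between duplicate samples) are illustrative choices for small sample sets, not part of the estimator itself.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard distance between two binary vectors (defined as 0 if both are all-zero)."""
    union = np.sum(np.maximum(a, b))
    return 0.0 if union == 0 else 1.0 - np.sum(np.minimum(a, b)) / union

def knn_distance(x, samples, k):
    """Jaccard distance from x to its k-th nearest neighbor in `samples`."""
    return sorted(jaccard(x, s) for s in samples)[k - 1]

def kl_nn_estimate(X, Y, k=1, eps=1e-10):
    """Nearest-neighbor estimate of D(p||q) from binary samples X ~ p and Y ~ q."""
    N, m = X.shape
    total = 0.0
    for i, x in enumerate(X):
        rho = knn_distance(x, np.delete(X, i, axis=0), k)  # k-NN within X without x_i
        nu = knn_distance(x, Y, k)                         # k-NN within Y
        total += np.log((nu + eps) / (rho + eps))
    return m / N * total + np.log(Y.shape[0] / (N - 1))
```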

IV. LEARNING THE ISING MODEL
We now use the generative models of Sec. II to learn the distribution of Ising configurations at different temperatures. This enables us to examine the ability of RBMs and cVAEs to capture the temperature dependence of physical features such as energy, magnetization, and spin-spin correlations. We first focus on the training of RBMs in Sec. IV A, and in Sec. IV B we describe the training of cVAEs. Before focusing on the training, however, we give a brief overview of some relevant properties of the Ising model.
The Ising model is a mathematical model of magnetism and describes the interactions of spins σ_i ∈ {−1, 1} (1 ≤ i ≤ m) according to the Hamiltonian [35]

H({σ}) = −J Σ_{⟨i,j⟩} σ_i σ_j − H Σ_i σ_i,

where Σ_{⟨i,j⟩} denotes the summation over nearest neighbors, J the coupling strength, and H an external magnetic field. We consider the ferromagnetic case with J = 1 and H = 0 on a two-dimensional lattice with 32 × 32 sites. In an infinite two-dimensional lattice, the Ising model exhibits a second-order phase transition at the critical temperature T_c ≈ 2.269 [35,36]. To train RBMs and cVAEs, we generate 20 × 10^4 realizations of spin configurations at temperatures T ∈ {1.5, 2, 2.5, 2.75, 3, 4} using the M(RT)² algorithm [35,37]. In previous studies, RBMs were applied to smaller systems with only about 100 spins to learn the distribution of spin configurations with one RBM per temperature (see Refs. [14,17] for further details).
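A minimal M(RT)² (Metropolis) sampler for this Hamiltonian with J = 1, H = 0, and periodic boundaries can be sketched as follows; the lattice size matches the text, but the sweep count is illustrative and far below what production sampling would use.

```python
import numpy as np

def metropolis_sweep(spins, T, rng):
    """One M(RT)^2 sweep: attempt L*L single-spin flips at temperature T (J = 1, H = 0)."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # sum of the four nearest neighbors with periodic boundary conditions
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn          # energy change caused by flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1                # accept the flip
    return spins

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(32, 32))   # random initial configuration
for _ in range(100):                         # equilibration sweeps (illustrative)
    metropolis_sweep(spins, T=1.5, rng=rng)
```

At T = 1.5, well below T_c, repeated sweeps drive the lattice toward large ordered domains with energy per spin approaching −2.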
We also consider the training of one RBM per temperature and compare the results to the samples generated with an RBM and a cVAE that were trained for all temperatures. If RBMs and cVAEs are able to learn the distribution of Ising configurations {σ} at a certain temperature, the model-generated samples should have the same (spontaneous) magnetization, energy, and spin-spin correlations as the M(RT)² samples. We compute the spontaneous magnetization according to

M(T) = (1/m) ⟨|Σ_{i=1}^{m} σ_i|⟩.

Furthermore, we determine the spin-spin correlations between two spins σ_i and σ_j at positions r_i and r_j,

G(i, j; T) = ⟨σ_i σ_j⟩ − ⟨σ_i⟩⟨σ_j⟩.

We use translational invariance and rewrite Eq. (28) as

G(r, T) = ⟨σ_i σ_j⟩ − ⟨σ_i⟩⟨σ_j⟩,

where r = |r_i − r_j| is the distance between spins σ_i and σ_j. The correlation function G(r, T) quantifies the influence of one spin on another at distance r.
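The observables M(T), E(T), and G(r, T) can be estimated from spin configurations as follows; restricting G(r, T) to displacements along the two lattice axes and estimating from a single configuration are simplifications for illustration.

```python
import numpy as np

def magnetization(spins):
    """Absolute magnetization per spin, |sum_i sigma_i| / m."""
    return np.abs(spins.sum()) / spins.size

def energy_per_spin(spins):
    """Nearest-neighbor Ising energy per spin (J = 1, periodic boundaries)."""
    return -np.sum(spins * (np.roll(spins, 1, axis=0)
                            + np.roll(spins, 1, axis=1))) / spins.size

def correlation(spins, r):
    """G(r) = <sigma_i sigma_j> - <sigma_i><sigma_j>, averaged over shifts by r along both axes."""
    mean = spins.mean()
    corr = 0.5 * (np.mean(spins * np.roll(spins, r, axis=0))
                  + np.mean(spins * np.roll(spins, r, axis=1)))
    return corr - mean ** 2

ground_state = np.ones((32, 32))
m_val = magnetization(ground_state)     # fully ordered state: M = 1
e_val = energy_per_spin(ground_state)   # two satisfied bonds counted per spin: E = -2
```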

A. Training RBMs
We consider all 20 × 10^4 samples for T ∈ {1.5, 2.0, 2.5, 2.75, 3.0, 4.0} and first train one RBM per temperature using the Adam optimizer [38] with a 5-step PCD approach (see Sec. II A). For all machines, we use the same network structure and training parameters and set the number of hidden units to n = 900. The number of visible units is equal to the number of spins (i.e., m = 32² = 1024). We initialize each Markov chain with binary states {0, 1} drawn uniformly at random, train over 200 epochs, and repeat this procedure 10 times. We set the learning rate to η = 10^−4 and consider minibatches of size 128. In addition, we encourage sparsity in the hidden layer by applying L1 regularization with a regularization coefficient of λ = 10^−4. That is, we consider an additional penalty λ‖W‖₁ in the log-likelihood and use

log L(x, θ, λ) = log L(x, θ) − λ‖W‖₁

instead of Eq. (6). We initialize all weights in W according to a Gaussian distribution with the Glorot initializer [39] and the biases b and c with uniformly distributed numbers in the interval [0, 0.1]. In Fig. 4, we show the evolution of magnetization, energy, and correlations during the training of an RBM for Ising samples at T = 2.75. We observe that the mean magnetization is captured well at the beginning of the training process, whereas energy and spin correlations require training over several epochs to match the training data. During training, the initially small variations in magnetization become larger.
In addition to using one RBM per temperature, we also train one RBM for all temperatures.We use the same samples and training parameters as before.However, we add additional visible units to encode the temperature information of each training sample.For the training of this RBM, we start from a random initial configuration, train over 400 epochs, and repeat this procedure 10 times.
To monitor the training, we tested the considered models every 20 epochs on 10% of the data by tracking the reconstruction error, binary cross entropy, pseudo log-likelihood, and the KL divergence and its inverse (see Sec. III). We compute the reconstruction error and binary cross entropy based on a single Gibbs sampling step. For the KL divergence, we generated 12 × 10^3 samples by performing 10^3 sampling steps and keeping the last 10^2 samples. We repeated this procedure 20 times for each temperature and averaged over minibatches of 256 samples to obtain better estimates.

B. Training cVAEs
We train one cVAE for all temperatures and consider an encoder that consists of three convolutional layers with 32, 64, and 64 kernels (or filters) and rectified linear unit (ReLU) activation functions. Each convolutional layer is followed by a maxpool layer to downsample the images. In a maxpool layer, only the maximum values of subregions of an initial representation are kept. To regularize the cVAE, we use batch normalization [40] and a dropout rate of 0.5 [41]. The last layer is a fully connected flat layer with 400 units (200 mean and 200 variance units) to represent the 200-dimensional Gaussian latent variable z.
For the decoder, we use an input layer that consists of 200 units to represent the latent variable z and concatenate it with the additional temperature labels. We upsample their dimension with a fully connected layer of 2048 units that is reshaped to be fed into three deconvolutional layers. The numbers of filters in these deconvolutional layers are 64, 64, and 32, and each deconvolutional layer is followed by an upsampling layer to increase the dimension of the images. The last layer of the decoder is a deconvolutional layer with one single filter to produce a single sample as output. We train the complete architecture over 10^3 epochs using the Adam optimizer [38] and a learning rate of η = 10^−4. During training, we keep track of the ELBO (see Eq. (16)) and the KL divergence and its inverse (see Sec. III) for 10% of the data.
We also consider non-convolutional VAE architectures in App. A. However, in the absence of convolutional layers, we found that some physical features are not well-captured anymore (see App. A for further details).

C. Training classifier
In the case of single RBMs and cVAEs that we train for all temperatures, we have to use a classifier to determine the temperature of the generated samples. We consider a classifier network that consists of two convolutional layers with 32 and 64 filters, respectively, and four fully connected layers of 256, 128, 64, and 32 units with ReLU activation functions and stride-2 convolutions. Similar to Eq. (30), we regularize all layers using an L2 regularizer with a regularization coefficient of 0.01. The last layer is a fully connected softmax layer that outputs the probability of a sample belonging to each temperature class, and we determine the sample temperature by identifying the class with the highest probability. We trained the model for 100 epochs with the Adam optimizer [38] and a learning rate of 10^−4. We monitor both the test loss (i.e., categorical cross entropy) and the accuracy. With this model, we achieve a test accuracy of 89% for the considered six temperature classes.
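The final classification step, mapping softmax probabilities to a temperature class, can be sketched as follows; the logits are hypothetical classifier outputs, not values from the trained network.

```python
import numpy as np

TEMPERATURES = [1.5, 2.0, 2.5, 2.75, 3.0, 4.0]   # the six temperature classes

def softmax(logits):
    z = logits - logits.max()        # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predicted_temperature(logits):
    """Assign a sample to the temperature class with the highest softmax probability."""
    return TEMPERATURES[int(np.argmax(softmax(logits)))]

logits = np.array([0.1, 0.2, 3.0, 0.4, 0.1, 0.0])   # hypothetical output of the last layer
T_pred = predicted_temperature(logits)
```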

D. Generating Ising samples
After training both generative neural networks with 20 × 10^4 realizations of Ising configurations at different temperatures, we can now study the ability of RBMs and cVAEs to capture physical features of the Ising model. To generate samples from an RBM, we start 200 Markov chains from random initial configurations whose mean corresponds to the mean of spins in the data at the desired temperature. This specific initialization, together with the encoding of corresponding temperature labels, helps sampling from single RBMs that were trained for all temperatures. We perform 10^3 sampling steps, keep the last 10^2 samples, and repeat this procedure 2 × 10^2 times for each temperature. The total number of samples is therefore 20 × 10^4. For the cVAE, we sample from the decoder using 20 × 10^3 random normal 200-dimensional vectors for each temperature. We concatenate these vectors with the respective temperature labels and use them as input of the decoder. For the single RBM and cVAE that we trained for all temperatures, we use the classifier (see Sec. IV C) to identify the temperature of the newly generated patterns. We show the corresponding distribution of samples over different temperature classes in Fig. 6. For RBMs and cVAEs, the relative frequency of samples with temperature T > 2.5 is larger than for lower temperatures. In particular, the results of Fig. 6 suggest that the considered cVAE has difficulties generating samples for T ≈ T_c ≈ 2.269. This is not the case for the RBM samples in Fig. 6.
Next, we determine the magnetization, energy, and correlation functions of the generated samples at different temperatures (see Fig. 5). In the top panel of Fig. 5, we observe that RBMs that were trained for single temperatures are able to capture the temperature dependence of magnetization, energy, and spin correlations. In the case of single RBMs and convolutional cVAEs that were trained for all temperatures, we also find that the physical features are captured well. Interestingly, for cVAEs with a non-convolutional architecture, we find that spin correlation effects are not properly learned anymore. This results in larger deviations of the energy and correlation function from the original data (see App. A for further details). In Fig. 7, we show some snapshots of Ising configurations in the original data and compare them to spin configurations that we generated with a single RBM and convolutional cVAE. To better visualize the differences between original data and model-generated Ising samples, we show two-dimensional scatter plots of energy and magnetization in App. B.

V. CONCLUSIONS AND DISCUSSION
In comparison with other generative neural networks, the training of RBMs is challenging and computationally expensive [42]. In addition, conventional RBMs can only handle binary data, and generalized architectures have to be used to describe continuous input variables [21]. For these reasons, and because of the good performance of convolutional models, RBM-based representation learning is being replaced more and more by models like VAEs and generative adversarial networks (GANs) [43,44]. However, performance assessments of RBMs and VAEs are mainly based on a few common test datasets (e.g., MNIST) that are frequently used in the deep learning community [10]. This approach can only provide limited insights into the representational properties of neural network models.
To better understand the representational characteristics of RBMs and VAEs, we studied the ability of these models to learn the distribution of physical features of Ising configurations at different temperatures. In this way, we were able to quantitatively compare the distributions learned by RBMs and VAEs in terms of features of an exactly solvable physical system. We find that physical features are well-captured by the considered RBM and VAE models. However, samples generated by the employed VAE are less evenly distributed across temperatures than is the case for samples that we generated with an RBM. In particular, close to the critical point of the Ising model, the considered VAEs had difficulties generating corresponding samples. Furthermore, we needed to use a convolutional architecture for VAEs to capture spin correlations and energy. Therefore, in the context of physical feature extraction, our results suggest that RBMs are still able to compete with the more recently developed VAEs. In future studies, it will be useful to examine the representational characteristics of RBMs, VAEs, and other generative neural networks for other physical systems.

Figure 2. Block Gibbs sampling in an RBM. We show an example of an RBM with a visible layer that consists of six visible units and a hidden layer that consists of four hidden units. Because of the bipartite network structure of RBMs, units within one layer can be grouped together and updated in parallel (block Gibbs sampling). Initially, visible units (green) are determined by the dataset. Hidden (red) and visible (blue) units are then updated in an alternating manner.

Figure 3. Variational autoencoder. A VAE is composed of two neural networks: the encoder (yellow) and the decoder (green). In this example, each network consists of an input and an output layer (multiple layers are also possible). The dotted black arrow between the two networks represents the reparameterization trick (see Eq. (17)). Each unit in the input layer of the decoder requires a corresponding mean µ and variance σ as input.

Figure 4. Evolution of physical features during training. We show the evolution of the magnetization M(T), energy E(T), and correlation function G(r, T) for the training of a single RBM at T = 2.75. Dashed lines indicate the mean of the corresponding M(RT)² samples, and different colors in the plot of G(r, T) represent different radii r.

Figure 5. Physical features in RBM and convolutional VAE Ising samples. We use RBMs and convolutional cVAEs to generate 20 × 10^4 samples of Ising configurations with 32 × 32 spins for temperatures T ∈ {1.5, 2, 2.5, 2.75, 3, 4}. We show the magnetization M(T), energy E(T), and correlation function G(r, T) for neural network and corresponding M(RT)² samples. In the top panel, we separately trained one RBM per temperature, whereas we used a single RBM for all temperatures in the middle panel. In the bottom panel, we show the behavior of M(T), E(T), and G(r, T) for samples that are generated with a convolutional cVAE that was trained for all temperatures. Error bars are smaller than the markers.

Figure 6. Relative frequencies of samples at different temperatures. We show the relative frequencies of samples that are obtained by starting trained RBMs (top) and convolutional cVAEs (bottom) from random initial configurations. We generated 20 × 10^4 samples for each temperature.