Machine Learning of Calabi-Yau Volumes

We employ machine learning techniques to investigate the volume minimum of Sasaki-Einstein base manifolds of non-compact toric Calabi-Yau 3-folds. We find that the minimum volume can be approximated via a second order multiple linear regression on standard topological quantities obtained from the corresponding toric diagram. The approximation improves further after invoking a convolutional neural network with the full toric diagram of the Calabi-Yau 3-folds as the input. We are thereby able to circumvent any minimization procedure that was previously necessary and find an explicit mapping between the minimum volume and the topological quantities of the toric diagram. Under the AdS/CFT correspondence, the minimum volumes of Sasaki-Einstein manifolds correspond to central charges of a class of 4d N=1 superconformal field theories. We therefore find empirical evidence for a function that gives values of central charges without the usual extremization procedure.


I. INTRODUCTION
In recent years machine learning has become a cornerstone for many fields of science and it has been adopted more and more as a valuable toolbox. Machine learning has attracted much interest due to significant theoretical progress and due to increased availability of large amounts of data, computing power (GPUs) and easy to use software implementations of standard machine learning techniques.
Despite these developments, applications of machine learning techniques to mathematical physics have been limited to our knowledge. One of the reasons for this is that machine learning aims to empirically approximate the underlying probability density function of a given dataset. Making use of machine learning to identify hidden structures in datasets, which teach us about new phenomena in string theory and mathematics, has not been systematically considered before.
This work aims to change the status quo and to provide evidence that machine learning can be used to discover hidden structures in large classes of gauge theories that are studied in theoretical physics as well as large classes of geometries that are studied in mathematics. Importantly, we illustrate that machine learning does not just provide an approximation of known functional relationships between physically and mathematically significant quantities, but also leads to discoveries of new functional relationships.
In particular, we concentrate on a class of 4d N = 1 supersymmetric gauge theories that live on the worldvolume of a stack of D3-branes probing toric Calabi-Yau 3-folds, characterized by convex lattice polygons known as toric diagrams [1][2][3][4]. These theories are expected to flow at low energies to a superconformal fixed point.
From a machine learning perspective, this work studies the minimum volumes of Sasaki-Einstein 5-manifolds. These are the base manifolds of the probed toric Calabi-Yau 3-folds. The minimum volume is of particular interest because under the AdS/CFT correspondence, it is expected to be related to the maximized a-function that gives the central charge of the 4d N = 1 superconformal field theory [5][6][7][8][9].
Using a large dataset of toric Calabi-Yau 3-folds, our aim is to train a machine learning model in such a way that it approximates a functional relationship between topological quantities of the toric Calabi-Yau 3-fold and the minimum volume of the Sasaki-Einstein base manifold. Such a functional relation would be of great use because it would circumvent the standard volume minimization procedure and highlight a direct relationship between topological quantities of the toric Calabi-Yau geometries and the central charges of the 4d superconformal field theories.
We ask whether for a given dataset consisting of minimal volumes V min and topological quantities T of the corresponding Calabi-Yau 3-folds, we can approximate from the data a mapping F such that Here, we use machine learning to model F . Mainly, we will consider three different machine learning models. These are a modification of usual linear regression, a convolutional neural network (CNN) and also a combination of both. In detail, a CNN model is a feed-forward neural network which includes additional convolutional layers.
We refer the reader to [10] for a basic introduction on CNN models and [11] for a comprehensive reference list.

II. BACKGROUND
We concentrate on non-compact Calabi-Yau 3-folds X that are realized as affine cones over a complex base X. In particular, we focus on a special subclass of X where the base is a toric variety X(∆), which is defined by a convex lattice polygon ∆ known as the toric diagram. X can also be thought of as the real cone over a compact, smooth Sasaki-Einstein 5-manifold Y , whose metrics have been studied extensively for various classes of toric Calabi-Yau 3-folds. The Kähler metric of X has the form When we talk of Calabi-Yau volumes, we actually refer to the volume function of the Sasaki-Einstein base Y , which takes the form where dµ is the Riemannian measure on the cone X . ω is the Kähler form and is given by where η is a global one-form on Y . Normalized under the volume of S 5 , we denote the volume function of Y as is an algebraic number and is expressed in terms of Reeb vector components b i=1,...,3 , where b 3 = 3 [12,13]. For a given projective variety X, realized as an affine variety in C k , the Hilbert series is the generating function for the dimension of the graded pieces of the coordinate ring C[x 1 , . . . , x k ]/ f i , where f i are the defining polynomials of X. The Hilbert series of X takes the form of a rational function with the expansion, where the ith graded piece X i can be thought of as the number of algebraically independent degree i polynomials on the variety X, with t keeping track of the degree i. When X is a toric variety defined by a convex lattice polygon ∆, the Hilbert series for X(∆) and the corresponding Calabi-Yau cone X can be obtained from the ideal triangulation of the toric diagram ∆, is the corresponding dual web-diagram with normal vectors to each boundary edges of the triangulation. Notice that we have 3-vectors, given that the original toric diagram is on a plane at height 1.
where the index i = 1, . . . , r runs over the unit triangles in the ideal triangulation and j = 1, 2, 3 runs over the boundary edges of each such triangle [12,14]. u i,j is a 3-dimensional outer normal to the edge j of the associated unit triangle i, where t ui,j = 3 a t ui,j (a) a . Note that we are dealing with 3-vectors because the 2-dimensional toric diagram is on a plane at height 1. Using (II.6), the Hilbert series for C 3 /Z 3 , whose toric diagram is shown in Figure 1, can be obtained as follows The volume function can be derived directly from the Hilbert series of X following the limit The leading order in µ picked up by the above limit from the expansion of the Hilbert series was shown in [12,13] to be directly related to the volume of the Sasaki-Einstein base Y = X | r=1 . For the C 3 /Z 3 example, the volume function takes the form , (II.9) where b 3 = 3.

III. VOLUME MINIMIZATION AND THE ADS/CFT CORRESPONDENCE
The worldvolume theory on a stack of D3-branes probing Calabi-Yau 3-folds X is a 4d N = 1 supersymmetric gauge theory. It is expected that these theories flow at low energies to a superconformal fixed point. The superconformal R-charges of the theory are determined by a procedure known as a-maximization [7][8][9], which involves the maximization of the a-charge, a-maximization gives the value of the central charge at the conformal fixed point, which is by the AdS/CFT correspondence related to the volume minimum of the corresponding Sasaki-Einstein 5-manifold [5,6] under where the R-charge R can be expressed in terms of Reeb vector components b i . In other words, computing the minimum volume, is equivalent under (III.11) to computing the maximized value of a(R; Y ), which is the central charge of the 4d N = 1 superconformal field theory.

IV. DATA
Our aim is to train a neural network to compute the volume minimum directly as a function of toric data, circumventing the minimization procedure that has been so far necessary. The available input data for the machine learning models for a given toric Calabi-Yau 3-fold takes the following form where y = 1/V min is the target inverse minimum volume, and f i are the three features with I being the number of internal lattice points, E being the number of perimeter points and V being the number of extremal corner points of the convex lattice polygon representing the toric diagram. Note that 2f 2 −4 is the Euler number of the corresponding toric variety [15]. In addition, we include the toric diagram itself as a square matrix D, consisting of 0, 1 entries, where an We generate a class of toric Calabi-Yau's whose toric diagrams originate from the toric diagram of the orbifold of the conifold of the form C/Z 5 × Z 5 , which is a lattice square with side-length 5. By consecutively cutting corners of this toric diagram, we generate 187,389 distinct toric diagrams. However, this set of toric diagrams exhibits a remaining GL(2, Z) redundancy and hence certain toric diagrams from this set can be related to the same toric Calabi-Yau 3-fold. We therefore remove the GL(2, Z) redundancy and further reduce the number of toric diagrams down to 15,151, which now establishes a set of distinct toric Calabi-Yau 3-folds. Using the integer rounded centroid of the convex lattice polygons, we re-center the toric diagrams. All the 15,151 re-centered toric diagrams then fit into a 7x7 lattice square, which we further embed into a 9x9 lattice square. Accordingly, D for our dataset is a 9x9 integer matrix with entries 0, 1 . In Figure 2, we illustrate the distribution of the extremal vertices of all the 15,151 toric diagrams we use for our analysis.
Using Hilbert series we compute the volume function V (b i ; Y ) for our dataset and minimize them to obtain V min . Given the entire dataset with the minimized volumes, we identify 4 cases where the value of y = 1/V min is much larger than the remaining dataset. In order to keep a non-distorted dataset, we remove these 4 cases ending up with a dataset of size 15,147. We also note that the minimum volumes are algebraic numbers, which can be irrational, and that we round the actual values for the machine learning model to 4 decimal points. This gives us 797 distinct values for the minimum volume under the chosen numerical precision, corresponding to the 15,147 distinct toric Calabi-Yau 3-folds. Finally, for each toric diagram in our set of Calabi-Yau 3-folds, we identify the features f i that lead us to 15,147 input data of the form shown in (IV. 13). An example is illustrated in Figure 3.
As neural networks easily overfit, meaning that they tend to memorize the specific training set instead of learning the underlying hidden structure, we have to be very careful in data preparation and usage in order not to fool ourselves. We here follow the common approach to split the data into an independent train (75%) and test (25%) set, where the machine learning models are only trained on the train set. We do not use an additional verification set, as we will not perform extensive hyperparameter optimization. It is important to note that we have constructed our dataset of 15,147 toric Calabi-Yau 3-folds in such a way that a split will give GL(2, Z) independent train and test sets.

V. MULTIPLE LINEAR REGRESSION
Let us first see how well we can model the data of the three features given in (IV.14) via a simple multiple linear regression, i.e., where ω i denotes the ith weight, and the dataset is given by (y, f ), with f being the features for target y. The weight ω 0 is usually referred to as the bias and can be viewed as the weight of an additional feature fixed to 1. Furthermore, we improve the modeling non-linearly by taking order 2 combinations of the k f = 3 original features, which yields Though one can solve for ω i exactly due to the convexity of the optimization problem, we prefer here to solve iteratively via stochastic gradient descent using the Python package Keras [16] (with Theano [17] backend) and the Nadam optimizer in default settings running for 5000 epochs (with batch size 1000). Note that we will use the same software stacks and settings in the following section. The solution for the training set obtained via gradient descent reads (up to four digits) One should note that the features f (n) are not unique in our dataset. In fact, we have only 645 unique feature combinations for all the toric diagrams in our test dataset. Hence, we expect that the solution Ansatz (V.16) built on the three features models the statistical expectation E(y) (mean). After categorizing the test dataset into 645 categories C i , we take the expectation E(y Ci ) of the datasets in each category. Furthermore, we calculate the largest and smallest y Ci in each category and order the categories according to their E(y Ci ). Then we plot the minimum and maximum of each category as well as F (f Ci ) against E(y Ci ), leading to the plot in Figure 5. Note that the diagonal of the figure corresponds to E(y C ). We observe that the prediction of F (f C ) (red curve) indeed seems to roughly approximate the mean of the y values in each of the categories.
How well does this simple Ansatz actually model the true y, or rather the minimum volume 1/y of interest? We run the test set through the predictor F and calculate the percentage errors (in ×100%) .
We observe a maximum error | max | ∼ 0.132 and where σ(| |) denotes the standard deviation of the distribution of the absolute value of the percentage errors. We also averaged the values over three independent runs. Hence, the expected prediction error of the minimal volume is around 2.2%. This means that the linear regression Ansatz already yields a surprisingly good approximator to the minimal volume. Here we should make a remark regarding the order of feature combinations that was taken in (V.17). Taking order 2 combinations gives a significant improvement in reducing the error in comparison to taking the plain three features, as the extended linear regression model seems to be able to learn better the more extreme volumes at the tails of the distribution of the minimum volumes. Including in addition order 3 combinations does not seem to yield any further improvement but rather seems to lead to a worse result.
In general, in order to improve this method further, we need to introduce additional features which are able to distinguish between the individual members of a class C i . However, instead of hand-crafting new features, we try in the following section the more modern approach to let the approximator F learn the appropriate features itself from the raw data.

VI. WIDE AND CNN
The raw data in its purest form is the toric diagram itself, hence it behooves us to ask if we can learn additional information directly from the toric diagram, in order to minimize further the prediction error. Since we treat the toric diagram as a 9 × 9 matrix viewed as an image, the canonical computer science approach is to invoke a CNN -the main tool for image recognition.
We couple the linear regression, as described in the previous section, with a CNN as follows. The output of the linear regression is added via a single ReLu unit (rectified linear, max(x, 0)) to the outputs of the CNN, i.e., where m i denotes the ith output of the CNN and ω f are the weights of the final layer. We use the ReLu unit for the final output because the minimum volume has to be positive. Note that this kind of setup is also known as a wide and deep model. The CNN is setup as follows. For the input layer we take a 2d convolutional layer consisting of 32 filters (size 3 × 3) and linear activation. The filters are convolved against the input and produce a 2d activation map of the filter. Hence, the layer learns spatially localized features of the inputs. For an illustration of the convolution layers see Figure 9. This is followed by two relatively small fully connected layers (we use sizes 12 and 4) with tanh activation. Hence we have 4 outputs m i in the CNN part. The precise network architecture is not of utmost relevance, as the results appear to be relatively stable against modifications in the number of layers, units in each layer, etc. However, generally smaller networks seem to be preferred with tanh activation functions in the dense layers. Figure  6 illustrates the combined setup.
The complete setup is trained on the train set as in the previous section via stochastic gradient descent minimizing the mean squared error.
Using the trained network, the prediction of the volume minima for the independent test set exhibits the following errors averaged over three independent test runs, Hence, the expected prediction error is below 1%. The maximum observed error reads | max | ∼ 0.20. The distribution of errors for one test run is plotted in Figure 7. We conclude that adding the CNN yields a significant improvement in predictive power. Note that the few larger errors visible in the plot are due to the tails of the minimum volume distribution, which the model is not able to predict extremely well due to the lack of training data available at the extreme values.
Finally, let us consider the case of using just the CNN alone, without being coupled to a linear regression branch. The used CNN is identical to the one above.
We obtain on the test set the individual errors for one test run are plotted in Figure  8.
In general, the machine learning models have greater difficulty in learning the tails of the minimum volume distribution. This might be due to lack of data in this regime (the data distribution restricted to our frame is not uniform). We observe that the combined setup of both linear regression and CNN performs better. This is because linear regression seems to stabilize the prediction of the tail sections of the minimum volume distribution due to its knowledge about the properties of the feature vector classes, which we discussed in the previous section.

VII. SUMMARY AND OUTLOOK
In this work, we have demonstrated that machine learning techniques, in particular neural networks, can be a useful addition to the toolbox of researchers tackling formal questions in mathematics and physics. A necessary condition for using machine learning techniques is that at least some aspect of the question can be translated to a data science problem.
This work studied whether the minimum volume of Sasaki-Einstein base manifolds for toric Calabi-Yau 3folds can be directly computed from topological quantities originating from toric geometry, replacing the usual minimization procedure that is necessary to identify the minimum volume. This question has aspects of a data science problem, i.e., out of known topological data and the corresponding minimal volumes, can we (or rather the machine) learn a mapping between these quantities (that is, find an approximate functional relation)?
The answer seems to be affirmative. Even taking for the machine learning model just a linear combination of order 2 combinations of numbers for different characteristic vertices in the toric diagrams of the Calabi-Yau 3folds, yields already a good universal approximation to the minimal volume of the corresponding Sasaki-Einstein manifolds. In addition, a CNN model, which learns new kinds of features of the toric diagram, further improves predictive power for the volume minimum.
It is surprising to see that the simple setups we are considering are able to predict the minimum volumes relatively well, effectively showing that the minimum vol-ume is encoded in the toric diagrams of the Calabi-Yau 3-folds. This is an indication that the procedure of volume minimization can be avoided. In fact, we show that the volume minimum can explicitly be computed from the toric data, by using an underlying functional relation, which we have approximated in this work.
With this analysis, we have shown a working example of how a machine learning model can identify functional relationships between mathematically and physically interesting quantities in cases where such functional relationships were not known before.
Note that it would be interesting to refine our analysis by increasing the dataset of toric Calabi-Yau's in such a way that the minimum volume distribution is more uniform. Furthermore, averaging over more test runs as well as increasing the rounding precision for irrational values for minimum volumes would be important improvements that we leave for future work.
We believe that there are other suitable problems which can be approached from a data science perspective, similar to what we have done in this work for the minimum volume of toric Calabi-Yau 3-folds. Approaching problems in this way might yield some novel insights, as well as hints to hidden and unexpected relations between physically and mathematically relevant quantities that have not been observed before.
For example, large datasets of both physical and mathematical significance exist in the context of 4d N = 1 theories related to toric Calabi-Yau 3-folds [1][2][3][4], as well as to a recently discovered new class of 2d (0, 2) theories related to toric Calabi-Yau 4-folds [18]. Furthermore, interesting rich datasets exist in relation to so called complete intersection Calabi-Yaus (CICYs) [19] characterized by configuration matrices that can be taken as inputs for machine learning models. Finally, large datasets exist in relation to hyperbolic 3-manifolds related to knots [20], which may exhibit hidden structures that could be discovered using again machine learning techniques. In a future work [21], we hope to shed light on these interesting problems.