Casimir effect with machine learning

Vacuum fluctuations of quantum fields between physical objects depend on the shapes, positions, and internal composition of the latter. For objects of arbitrary shapes, even made from idealized materials, the calculation of the associated zero-point (Casimir) energy is an analytically intractable challenge. We propose a new numerical approach to this problem based on machine-learning techniques and illustrate the effectiveness of the method in a (2+1)-dimensional scalar field theory. The Casimir energy is first calculated numerically using a Monte Carlo algorithm for a set of Dirichlet boundaries of various shapes. Then, a neural network is trained to compute this energy given the Dirichlet domain, treating the latter as black-and-white pixelated images. We show that after the learning phase, the neural network is able to quickly predict the Casimir energy for new boundaries of general shapes with reasonable accuracy.

The presence of physical bodies in a quantum vacuum affects the spectrum of zero-point fluctuations of quantum fields and leads to the appearance of forces acting on the bodies. This phenomenon, known as "the Casimir effect", was first predicted in 1948 by Hendrik Casimir, who showed that two strictly parallel neutral metallic plates should attract each other due to quantum fluctuations of the electromagnetic field [1]. The Casimir phenomenon generalizes the van der Waals interactions between neutral bodies [2] and plays an essential role in microelectromechanical and microfluidic systems at submillimeter scales, where the zero-point forces induced by the quantum fluctuations of electromagnetic fields become significant [3][4][5].
The geometrical shapes and material composition of physical bodies affect the Casimir forces significantly. Accurate analytical calculations are feasible for a limited set of relatively simple geometries, where the spectrum of vacuum fluctuations is precisely known. Approximate proximity-force calculations [6] can access perturbations around known configurations, including near-planar geometries. In the general case, the Casimir forces and the associated Casimir energies are computed with the help of various numerical and semi-analytical techniques [7], which include methods of scattering theory [8,9], factorization [10] and discretization [11] approximations, worldline approaches [12], and methods of lattice field theories [13,14].
In this paper, we propose to tackle the complicated problem of calculating the Casimir energy in general geometries using a machine-learning (ML) approach. ML is a collection of powerful programming tools that allow a computer to find how to perform a task without being explicitly programmed (see [20,21] for reviews aimed at physicists). In recent years, ML has revolutionized many fields of engineering and science thanks to several breakthroughs, in particular in the design of neural networks. While neural networks may be slow to train, their predictions are usually very fast. Neural networks are increasingly used in the investigation of complex physical systems that involve a large number of degrees of freedom. A non-exhaustive list of relevant examples includes open quantum systems with high-dimensional Hilbert spaces [23,24], topological phases in the context of topological band insulators [25] and field theories [26,27], as well as the phase structure of many-body, strongly correlated and field systems in general [28][29][30][31][32][33][34][35][36][37][38][39].
We will use the so-called supervised learning procedure, which, in a very general sense, consists of establishing a map from some inputs to some outputs by training a neural network on a broad set of known examples. In our case, the inputs are the boundaries imposed on quantum fields, and the outputs are the Casimir energies of the quantum fields in the space with these boundaries. Technically, we illustrate the effectiveness of the method in a field theory with the simplest, Dirichlet, boundary conditions. We describe the Dirichlet boundary geometries as black-and-white images and then employ standard neural-network techniques used in computer vision [49] in order to "recognize" the correct Casimir energy for a particular shape of the boundaries.
One of the simplest and, at the same time, practically relevant realizations of the Casimir effect appears in photodynamics, the theory of a single Abelian gauge field $a_\mu$ described by the Lagrangian
$$\mathcal{L} = -\frac{1}{4} f_{\mu\nu} f^{\mu\nu}, \qquad f_{\mu\nu} = \partial_\mu a_\nu - \partial_\nu a_\mu. \qquad (1)$$
Photodynamics respects the U(1) gauge symmetry, $a_\mu(x) \to a_\mu(x) + \partial_\mu \omega(x)$, and possesses d − 1 physical degrees of freedom in (d + 1) spacetime dimensions.
The Lagrangian (1) describes a very simple system of a single non-interacting vector field. However, the nontriviality of the Casimir effect comes from the boundary conditions, namely, from the nontrivial dependence of the (regularized) energy of the quantum fluctuations on the shape of the boundaries of a physical object immersed in the photon vacuum described by Eq. (1). It is this shape dependence that makes the problem difficult.
We address the problem of the shape dependence of the Casimir energy using a machine-learning approach. Since we are concerned with a proof-of-principle result, it is sufficient to work in two spatial dimensions, which is the lowest spatial dimension appropriate for our purposes. In two spatial dimensions, the boundaries are, generally, one-dimensional curves and isolated points, which are easy to visualize and subsequently treat. In one spatial dimension, the Casimir effect is too trivial, as the Casimir energy depends only on the positions of the points and on the global geometry of the spacetime manifold. In three and higher dimensions, on the contrary, the energy is a function of the more complicated shapes of (hyper)surfaces.
In two spatial dimensions, the photon has one physical degree of freedom. Restricting ourselves to an idealized case, one could consider an object made of a perfect electric conductor. At its boundary, the normal component of the magnetic field and the tangential component(s) of the electric field vanish. In two spatial dimensions, this boundary condition is expressed via a single equation:
$$\varepsilon^{\mu\alpha\beta}\, n_\mu(x)\, f_{\alpha\beta}(x) = 0, \qquad x \in S, \qquad (2)$$
where $n_\mu(x)$ is a vector normal to the boundary at the point x of the (piecewise) one-dimensional boundary S, and $f_{\alpha\beta}$ is the field-strength tensor of the photon field given in Eq. (1). In the geometry of two parallel static straight wires separated by the distance R, the vacuum fluctuations of the Abelian gauge field lead to an attractive potential (energy per unit length) between wires made of an ideal metal [43]:
$$V(R) = -\frac{\zeta(3)}{16\pi R^2}, \qquad (3)$$
where $\zeta(x)$ is the Riemann zeta function. The problem may be simplified even further by considering the model of a free real-valued scalar field $\phi = \phi(x)$ with the Lagrangian
$$\mathcal{L} = \frac{1}{2}\, \partial_\mu \phi\, \partial^\mu \phi. \qquad (4)$$
Like photodynamics in two spatial dimensions, this model has one degree of freedom. Instead of (2), the boundary may be set by the simpler Dirichlet boundary condition:
$$\phi(x) = 0, \qquad x \in S. \qquad (5)$$
For two parallel Dirichlet-type wires, we naturally recover expression (3) for the Casimir energy.
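Eq. (3) can be checked numerically. The Python sketch below evaluates the textbook Casimir energy per unit area of a massless scalar field between two parallel Dirichlet boundaries separated by R in d spatial dimensions. For d = 2 the "area" is a length. The formula is
$$E/A = -\Gamma\big(\tfrac{d+1}{2}\big)\,\zeta(d+1) / \big[(4\pi)^{(d+1)/2} R^d\big].$$
We quote this general-d expression as standard background rather than deriving it here. For d = 2 it reduces to $-\zeta(3)/(16\pi R^2)$. For d = 1 and d = 3 it reproduces the familiar $-\pi/(24R)$ and $-\pi^2/(1440 R^3)$. The helper names and the simple zeta evaluator are illustrative choices, not part of the original work.

```python
import math


def riemann_zeta(s: float, n_terms: int = 1000) -> float:
    """Riemann zeta(s) for real s > 1: truncated sum plus an Euler-Maclaurin
    estimate of the tail (very accurate for s >= 2)."""
    if s <= 1.0:
        raise ValueError("this simple evaluator requires s > 1")
    partial = sum(n ** -s for n in range(1, n_terms))  # n = 1 .. N-1
    N = float(n_terms)
    # Euler-Maclaurin estimate of sum_{n >= N} n^{-s}
    tail = N ** (1.0 - s) / (s - 1.0) + 0.5 * N ** -s + s * N ** (-s - 1.0) / 12.0
    return partial + tail


def dirichlet_plates_energy(R: float, d: int) -> float:
    """Casimir energy per unit area (per unit length for d = 2) of a massless
    scalar field between two parallel Dirichlet boundaries a distance R apart
    in d spatial dimensions (natural units)."""
    if R <= 0:
        raise ValueError("separation must be positive")
    k = (d + 1) / 2.0
    return -math.gamma(k) * riemann_zeta(d + 1.0) / ((4.0 * math.pi) ** k * R ** d)


if __name__ == "__main__":
    R = 10.0
    # d = 2: both numbers should coincide with Eq. (3)
    print(dirichlet_plates_energy(R, 2), -riemann_zeta(3.0) / (16 * math.pi * R ** 2))
```

The attraction weakens as $R^{-2}$ for wires in the plane. Because the scalar and photon problems share one degree of freedom in 2+1 dimensions, the same number applies to both.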
In two spatial dimensions, the boundaries may be represented by lines of arbitrary shapes. Thus, any configuration of boundaries may be treated as a pixelated black-and-white image, in which the white pixels correspond to the free, unoccupied space, while the black pixels encode the positions of the boundaries. The scalar field fluctuates freely in the white regions and vanishes at the black pixels, where the Dirichlet condition (5) is imposed.
We consider pixel-thin physical objects in order to avoid the trivial case of thicker objects, in whose interior the Casimir energy is, by definition, zero. Thus, only the boundaries matter. In addition, we consider static objects, for which particle creation is absent and the Casimir energy is a time-independent quantity.
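The pixelization just described can be sketched as follows. The paper does not specify how its random curves were generated. The shape model below, $r(\theta) = r_0[1 + \sum_k a_k \cos(k\theta + \varphi_k)]$, is therefore a hypothetical stand-in.

One property is worth enforcing: the black pixels should form a 4-connected (nearest-neighbour) closed chain. With a nearest-neighbour lattice action, a curve whose pixels touch only diagonally would leave lattice links connecting the interior to the exterior. The illustrative helper `reachable` checks the separation by a flood fill over white pixels.

```python
import math
import random


def _walk(mask, x0, y0, x1, y1):
    """Mark a 4-connected staircase of pixels from (x0, y0) to (x1, y1)."""
    x, y = x0, y0
    mask[y][x] = 1
    while x != x1:
        x += 1 if x1 > x else -1
        mask[y][x] = 1
    while y != y1:
        y += 1 if y1 > y else -1
        mask[y][x] = 1


def deformed_circle_mask(L, radius, amplitudes, phases):
    """L x L image (list of rows): 1 = black Dirichlet pixel, 0 = white free pixel.

    Hypothetical shape model: r(theta) = radius * (1 + sum_k a_k cos(k*theta + phi_k)),
    k = 1, 2, ..., centred in the image. Consecutive samples are joined by
    staircase steps, so the curve is closed and 4-connected.
    """
    c = L / 2.0
    n_steps = max(64, int(16 * math.pi * radius))
    points = []
    for i in range(n_steps):
        theta = 2.0 * math.pi * i / n_steps
        r = radius * (1.0 + sum(a * math.cos(k * theta + p)
                                for k, (a, p) in enumerate(zip(amplitudes, phases), start=1)))
        x = int(round(c + r * math.cos(theta)))
        y = int(round(c + r * math.sin(theta)))
        if not (0 <= x < L and 0 <= y < L):
            raise ValueError("curve does not fit into the image")
        points.append((x, y))
    mask = [[0] * L for _ in range(L)]
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):  # closes the loop
        _walk(mask, x0, y0, x1, y1)
    return mask


def reachable(mask, start, target):
    """True if target can be reached from start through white pixels using
    nearest-neighbour (4-connected) steps, i.e. through lattice links."""
    L = len(mask)
    stack, seen = [start], {start}
    while stack:
        x, y = stack.pop()
        if (x, y) == target:
            return True
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < L and 0 <= ny < L and (nx, ny) not in seen and mask[ny][nx] == 0:
                seen.add((nx, ny))
                stack.append((nx, ny))
    return False


if __name__ == "__main__":
    rng = random.Random(1)
    amps = [rng.uniform(-0.1, 0.1) for _ in range(5)]
    phs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(5)]
    img = deformed_circle_mask(64, 20.0, amps, phs)
    print(sum(map(sum, img)), "black pixels; interior separated:",
          not reachable(img, (0, 0), (32, 32)))
```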
To address this problem, we use first-principles lattice simulations. We numerically calculate the Casimir energy of wires of various shapes using the first-principles methods of lattice gauge theory developed earlier in Refs. [14,40,41]. The discretized version of the scalar field theory (4) is given by the partition function
$$\mathcal{Z} = \int \prod_x d\phi_x \, e^{-S[\phi]}, \qquad (6)$$
where the integration goes over the real-valued scalar field $\phi_x \in \mathbb{R}$ defined on the sites $x \equiv (x_1, x_2, x_3)$ of the Euclidean cubic lattice $L^3$ with periodic boundary conditions in all three directions. We take $0 \le x_\mu \le L - 1$, where $\mu = 1, 2, 3$ labels the directions. The two coordinates $x_1$ and $x_2$ correspond to the spatial dimensions, while the third coordinate, $x_3$, represents the Wick-rotated imaginary time. The temporal and spatial dimensions of the lattice have the same length, so that the simulations are carried out at zero temperature.
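In the lattice action of the (d+1)-dimensional model (4), the derivatives are represented by finite differences:
$$S[\phi] = \frac{1}{2} \sum_x \sum_{\mu=1}^{d+1} \left(\phi_{x+\hat\mu} - \phi_x\right)^2, \qquad (7)$$
where $\hat\mu$ is the unit lattice vector in the direction µ and the lattice spacing is set to unity.
The lattice generalization of the Dirichlet boundary condition (5) is straightforward: $\phi_x = 0$ at all lattice sites x that belong to the boundary S.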
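Below is a minimal pure-Python sketch of the lattice ingredients. It assumes the standard nearest-neighbour form $S[\phi] = \frac12 \sum_x \sum_\mu (\phi_{x+\hat\mu} - \phi_x)^2$ in lattice units, with periodic boundary conditions.

A static boundary S is imposed by setting $\phi_x = 0$ on every site whose spatial coordinates $(x_1, x_2)$ lie on the curve, for all values of the imaginary time $x_3$. These sites make up the world-surface of S.

All function names are illustrative. A production code on $255^3$ lattices would use vectorized or GPU arrays instead of Python lists.

```python
import itertools


def build_lattice(shape):
    """Sites of a periodic lattice and, for each site, the flat indices of x + mu_hat."""
    sites = list(itertools.product(*(range(n) for n in shape)))
    index = {s: i for i, s in enumerate(sites)}
    forward = []
    for s in sites:
        row = []
        for mu, n in enumerate(shape):
            t = list(s)
            t[mu] = (t[mu] + 1) % n  # periodic boundary conditions
            row.append(index[tuple(t)])
        forward.append(row)
    return sites, forward


def lattice_action(phi, forward):
    """S[phi] = 1/2 sum_x sum_mu (phi_{x+mu} - phi_x)^2 (assumed form, lattice units)."""
    return 0.5 * sum((phi[j] - phi[i]) ** 2
                     for i, row in enumerate(forward) for j in row)


def boundary_world_surface(sites, curve_pixels):
    """Flat indices of sites whose spatial part (x1, x2) lies on the static curve S,
    for every imaginary time x3."""
    return {i for i, s in enumerate(sites) if (s[0], s[1]) in curve_pixels}


def impose_dirichlet(phi, boundary):
    """Lattice Dirichlet condition: phi_x = 0 for x in S."""
    return [0.0 if i in boundary else v for i, v in enumerate(phi)]


if __name__ == "__main__":
    sites, forward = build_lattice((8, 8, 8))  # toy L = 8 lattice
    wire = {(x1, 4) for x1 in range(8)}        # straight wire along x1 at x2 = 4
    S = boundary_world_surface(sites, wire)
    print(len(S))  # 8 pixels x 8 time slices = 64 boundary sites
```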
In the continuum model, the energy of the vacuum fluctuations of the scalar field is related to the local expectation value of its energy density,
$$T^{00}(x) = \frac{1}{2}\left[(\partial_0 \phi)^2 + (\partial_1 \phi)^2 + (\partial_2 \phi)^2\right]. \qquad (8)$$
After a Wick rotation to Euclidean space-time with the time coordinate $x_3 = it$, the energy density (8) transforms to
$$T^{00}_E(x) = \frac{1}{2}\left[(\partial_1 \phi)^2 + (\partial_2 \phi)^2 - (\partial_3 \phi)^2\right]. \qquad (9)$$
On the lattice, we use a symmetrized version of the discretized expression (9). The regularized energy density is formally given by
$$\mathcal{E}_S(x) = \langle T^{00}_E(x) \rangle_S - \langle T^{00}_E(x) \rangle_0, \qquad (11)$$
where the subscripts "0" and "S" indicate that the expectation value is taken, respectively, in the absence and in the presence of the boundaries S of the physical objects. Due to this subtraction, the ultraviolet divergences cancel in Eq. (11), so that $\mathcal{E}_S(x)$ provides us with a local finite quantity, the Casimir energy density. It equals the change in the energy density of the vacuum fluctuations due to the presence of the boundaries. The surfaces S are the world-surfaces of the static boundaries, which do not evolve with the imaginary time $x_3$.
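The measurement step can be sketched as follows. The paper's exact symmetrized discretization of Eq. (9) is not reproduced here. As a hypothetical choice, the sketch averages each squared derivative over the forward and backward lattice differences.

The regularized density of Eq. (11) is then the difference of two Monte Carlo averages, one with the boundary and one without. Assuming the two runs are independent, their statistical errors add in quadrature. The naive standard error used here ignores autocorrelations, which a real analysis would have to account for (for example, by binning).

```python
import math
import statistics


def shifted(shape, idx, mu, step):
    """Flat row-major index of x + step * mu_hat on a periodic lattice."""
    coords, rem = [], idx
    for n in reversed(shape):
        coords.append(rem % n)
        rem //= n
    coords.reverse()
    coords[mu] = (coords[mu] + step) % shape[mu]
    flat = 0
    for c, n in zip(coords, shape):
        flat = flat * n + c
    return flat


def energy_density(phi, shape, idx):
    """Hypothetical symmetrized lattice version of Eq. (9) at site idx:
    1/2 [ (d1 phi)^2 + (d2 phi)^2 - (d3 phi)^2 ], where each squared derivative
    is the average of the squared forward and backward differences."""
    def sq_deriv(mu):
        fwd = phi[shifted(shape, idx, mu, +1)] - phi[idx]
        bwd = phi[idx] - phi[shifted(shape, idx, mu, -1)]
        return 0.5 * (fwd * fwd + bwd * bwd)
    # mu = 0, 1 are the spatial directions; mu = 2 is the imaginary time x3
    return 0.5 * (sq_deriv(0) + sq_deriv(1) - sq_deriv(2))


def mean_and_error(values):
    """Sample mean and naive standard error (autocorrelations ignored)."""
    m = statistics.fmean(values)
    err = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return m, err


def casimir_energy_density(samples_with_s, samples_free):
    """Eq. (11): <T>_S - <T>_0 from two independent runs; errors added in quadrature."""
    m_s, e_s = mean_and_error(samples_with_s)
    m_0, e_0 = mean_and_error(samples_free)
    return m_s - m_0, math.hypot(e_s, e_0)
```

Summing $\mathcal{E}_S(x)$ over the spatial sites of one time slice would then give a total energy for the configuration.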
To diversify our efforts, we also considered a technically similar and equally difficult problem: the calculation of the mean total action $\langle S \rangle$ in the two-dimensional Euclidean model given, formally, by Eq. (7) with d = 1. Although this quantity has no straightforward physical interpretation, its calculation is as difficult as the calculation of the Casimir energy in the (2+1)-dimensional model. Below, we apply the very same ML technique to this two-dimensional model. To shorten notation, we refer to these models below as 3d and 2d, respectively.
In our numerical simulations, we use the methods successfully adopted for studies of the Casimir forces in Abelian gauge theories in Refs. [14,40,41]. To calculate the Casimir energy in the (2+1)-dimensional model, we discretize each geometry of the boundaries on a $255^3$ lattice. We then generate $2 \times 10^5$ scalar field configurations using a Hybrid Monte Carlo algorithm, which combines standard Monte Carlo methods [44] with the molecular dynamics approach. The latter incorporates a second-order minimum-norm integrator [45]. We discard the first $10^5$ configurations to ensure thermalization and use the subsequent $10^5$ configurations for the statistical analysis. We employ the same techniques to calculate the mean action in the two-dimensional model, using $10^6$ configurations on $256^2$ lattices.
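To make the update scheme concrete, here is a toy Hybrid Monte Carlo sketch in pure Python. It uses a Gaussian lattice action, assumed to be $S[\phi] = \frac12 \sum_x \sum_\mu (\phi_{x+\hat\mu} - \phi_x)^2$, with Dirichlet sites held fixed at zero.

The molecular-dynamics part uses Omelyan's second-order minimum-norm integrator. It is written in the symmetric momentum-position-momentum-position-momentum form with $\lambda \approx 0.1932$. We assume this is the scheme meant by Ref. [45]. The step size, trajectory length and lattice shape are illustrative and unrelated to the production runs.

```python
import math
import random

LAMBDA_2MN = 0.1931833275037836  # Omelyan's minimum-norm parameter


def periodic_forward_neighbours(shape):
    """Flat (row-major) index of x + mu_hat for every site x and direction mu."""
    n_sites = math.prod(shape)
    strides = [math.prod(shape[mu + 1:]) for mu in range(len(shape))]
    table = []
    for i in range(n_sites):
        row = []
        for mu, n in enumerate(shape):
            c = (i // strides[mu]) % n
            row.append(i + ((c + 1) % n - c) * strides[mu])
        table.append(row)
    return table


def action(phi, nbr):
    """Assumed lattice action S = 1/2 sum_x sum_mu (phi_{x+mu} - phi_x)^2."""
    return 0.5 * sum((phi[j] - phi[i]) ** 2 for i, row in enumerate(nbr) for j in row)


def grad_action(phi, nbr, fixed):
    """dS/dphi_x; set to zero on Dirichlet sites, which are not dynamical."""
    g = [0.0] * len(phi)
    for i, row in enumerate(nbr):
        for j in row:
            d = phi[j] - phi[i]
            g[i] -= d
            g[j] += d
    for i in fixed:
        g[i] = 0.0
    return g


def omelyan_2mn(phi, mom, nbr, fixed, eps, n_steps):
    """Second-order minimum-norm integrator: P(lam) Q(1/2) P(1-2lam) Q(1/2) P(lam)."""
    phi, mom = list(phi), list(mom)

    def kick(c):  # momentum update with a fraction c of the step
        g = grad_action(phi, nbr, fixed)
        for k in range(len(mom)):
            mom[k] -= c * eps * g[k]

    def drift(c):  # field update with a fraction c of the step
        for k in range(len(phi)):
            phi[k] += c * eps * mom[k]

    for _ in range(n_steps):
        kick(LAMBDA_2MN)
        drift(0.5)
        kick(1.0 - 2.0 * LAMBDA_2MN)
        drift(0.5)
        kick(LAMBDA_2MN)
    return phi, mom


def hmc(shape, fixed, n_traj, eps=0.1, n_steps=10, seed=0):
    """Toy HMC: returns the action after every trajectory and the acceptance rate."""
    rng = random.Random(seed)
    nbr = periodic_forward_neighbours(shape)
    phi = [0.0] * len(nbr)  # cold start; fixed sites stay at zero
    actions, accepted = [], 0
    for _ in range(n_traj):
        mom = [0.0 if i in fixed else rng.gauss(0.0, 1.0) for i in range(len(phi))]
        h_old = action(phi, nbr) + 0.5 * sum(p * p for p in mom)
        new_phi, new_mom = omelyan_2mn(phi, mom, nbr, fixed, eps, n_steps)
        h_new = action(new_phi, nbr) + 0.5 * sum(p * p for p in new_mom)
        if rng.random() < math.exp(min(0.0, h_old - h_new)):  # Metropolis accept/reject
            phi, accepted = new_phi, accepted + 1
        actions.append(action(phi, nbr))
    return actions, accepted / n_traj


if __name__ == "__main__":
    # 8 x 8 toy lattice of the 2d model with a straight Dirichlet line at x2 = 0
    acts, acc = hmc((8, 8), fixed={8 * x1 for x1 in range(8)}, n_traj=500)
    print(f"acceptance {acc:.2f}, <S> = {sum(acts[100:]) / len(acts[100:]):.2f}")
```

The integrator is symmetric and hence time-reversible, which the Metropolis step needs for detailed balance. Burn-in trajectories, like the first $10^5$ configurations in the production runs, are simply discarded.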
The neural network has an architecture that is standard in the context of image processing (Fig. 1). It consists of four 2d convolutional layers with a kernel of size (3, 3) and with 32, 64, 128 and 256 filters, respectively. Each convolutional layer is followed by (1) a batch-normalization layer with momentum 0.9, (2) a leaky ReLU activation layer with α = 0.3, and (3) a max-pooling layer of size (4, 4).
The last pooling operation is global: it collapses the spatial dimensions to a single number per filter and is followed by a dropout layer with probability 0.5. This architecture allows the input lattice to be of any size. Finally, a dense layer with a single unit and no activation outputs the Casimir energy. An L2 regularization is applied to all weights. Gradient descent is performed with the Adam algorithm, with the performance measured by the root-mean-square error, using a batch size of 32 and early stopping (the maximum number of epochs is fixed to 200; in practice, training stops after around 150 epochs). Neural networks work best when all variables have similar scales. Therefore, the output (energy) is normalized by subtracting the mean and scaling to unit variance, and batch normalization is used between the intermediate layers. In total, the neural network has circa 390k parameters. All these ingredients are standard [20,49] and aim at making the learning faster and preventing overfitting and underfitting (i.e., improving generalization). The code is written using Keras, an open-source neural-network library written in Python [48].
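As a consistency check of the quoted parameter count, the sketch below tallies the parameters layer by layer. It assumes a single input channel (black-and-white images) and convolution layers with biases. It also assumes batch-normalization layers that carry four numbers per channel: two trainable scale and shift parameters plus the non-trainable moving mean and variance, which is how Keras reports them.

Under these assumptions the total is 390,017, i.e., circa 390k. The count does not depend on the lattice size L, which is why the same network can accept inputs of any size.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights k_h * k_w * c_in * c_out plus one bias per filter."""
    kh, kw = kernel
    return kh * kw * c_in * c_out + c_out


def batchnorm_params(channels):
    """gamma, beta (trainable) + moving mean, moving variance (non-trainable)."""
    return 4 * channels


def dense_params(n_in, n_out):
    """Weight matrix plus biases."""
    return n_in * n_out + n_out


def count_params(filters=(32, 64, 128, 256), kernel=(3, 3), in_channels=1):
    total, c_in = 0, in_channels
    for c_out in filters:
        total += conv2d_params(kernel, c_in, c_out) + batchnorm_params(c_out)
        c_in = c_out  # leaky ReLU, pooling and dropout carry no parameters
    # global pooling leaves one feature per filter, fed to a single-unit dense layer
    total += dense_params(c_in, 1)
    return total


if __name__ == "__main__":
    print(count_params())  # 390017
```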
We randomly generated a few thousand thin boundaries. These included closed non-self-intersecting lines (representing deformed circles) and, separately, pairs of quasi-parallel nonintersecting lines (symbolizing corrugated plates). To sample different shapes and size scales, we generated the curves in a few independent runs, so that the overall distribution of the curve dimensions is not Gaussian. The numbers of samples in the different datasets are as follows:
• 2d, (256, 256): 3000 deformed circles, 3000 lines;
• 3d, (255, 255): 2000 deformed circles, 2000 lines;
• 2d, (512, 512): 5000 deformed circles.
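A pair of corrugated lines can be generated in the same spirit. The profile model $x(y) = x_0 + \sum_k A_k \sin(2\pi k y/L + \varphi_k)$ is a hypothetical choice, not the authors' generator. Integer wave numbers k make each line close onto itself across the periodic boundary of the lattice, so the lines behave like infinitely long plates.

Horizontal runs between consecutive rows keep each line 4-connected. The generator rejects pairs whose pixels overlap or are nearest neighbours.

```python
import math
import random


def corrugated_lines_mask(L, x_left, x_right, modes, rng):
    """L x L image (list of rows, 1 = Dirichlet pixel) with two quasi-parallel lines.

    Hypothetical profile model: x(y) = x0 + sum_k A_k sin(2 pi k y / L + phi_k),
    integer k, A_k drawn uniformly from [-a_k, a_k]; modes = [(k, a_k), ...].
    """
    def profile(x0):
        terms = [(k, rng.uniform(-a, a), rng.uniform(0.0, 2.0 * math.pi)) for k, a in modes]
        return [int(round(x0 + sum(A * math.sin(2.0 * math.pi * k * y / L + p)
                                   for k, A, p in terms)))
                for y in range(L)]

    def pixels(xs):
        out = set()
        for y in range(L):
            a, b = sorted((xs[y], xs[(y + 1) % L]))
            out.update((x % L, y) for x in range(a, b + 1))  # keeps the line 4-connected
        return out

    left, right = pixels(profile(x_left)), pixels(profile(x_right))
    # reject if the lines share a pixel or touch through a nearest-neighbour link
    near_left = {((x + dx) % L, (y + dy) % L) for x, y in left
                 for dx, dy in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))}
    if near_left & right:
        raise ValueError("lines touch or cross; reduce the amplitudes")
    mask = [[0] * L for _ in range(L)]
    for x, y in left | right:
        mask[y][x] = 1
    return mask


if __name__ == "__main__":
    img = corrugated_lines_mask(64, 20, 44, [(1, 3.0), (3, 2.0)], random.Random(0))
    print(sum(map(sum, img)), "black pixels")
```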
The neural network is trained for each dataset separately. In each case, the dataset is split into three sets: 80% for training, 10% for validation (to tune the hyperparameters of the network) and 10% for testing. In 3d, the training takes circa 5 min for 800 samples (running on a GeForce GTX 1080 GPU), while predicting takes circa 5 ms for 100 samples. For comparison, the Monte Carlo calculation takes 3.1 hours for a single sample on a Tesla K40 GPU.
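The 80/10/10 split can be sketched as a seeded shuffle followed by slicing. The function name and the fixed seed are illustrative. For the 2d dataset with 3000 deformed circles, this yields 2400 training, 300 validation and 300 test samples. The last number matches the 300 test samples mentioned in the Supplementary Materials.

```python
import random


def split_dataset(samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle reproducibly and split into training / validation / test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(round(fractions[0] * len(items)))
    n_val = int(round(fractions[1] * len(items)))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


if __name__ == "__main__":
    train, val, test = split_dataset(range(3000))
    print(len(train), len(val), len(test))  # 2400 300 300
```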
The inset histograms in Fig. 2 characterize the statistical features of the predictive power of the ML algorithm. The histograms show the number of deformed circles within a given range of Casimir energies in 3d (of the mean action in 2d). They compare the values obtained with the Monte Carlo calculations ("true") with those predicted by the neural network ("pred."). The errors are summarized in Table I. For the majority of samples, the relative errors are small and the neural network reproduces the MC result well. As can be expected, the largest errors are found for very small curves, for which the image resolution is not sufficient for the neural network. The learning curves represent the evolution of the root-mean-square error (loss) as a function of the number of samples used for training the neural network. The validation data correspond to all the data not used for training. For large training sets, the flattening of both curves indicates that there are enough samples for training the network. The near absence of a gap between the curves shows that there is no overfitting, and the overall low values of the losses signal the absence of underfitting. Together, this shows that the architecture of the network is well adapted to the task. We also demonstrate the success of the method in Table II for a set of particular examples, visualized and labeled in the insets of Fig. 2. Interestingly, in most cases the neural network gives a prediction very close to the mean actual value, well within the errors on both the Monte Carlo and the machine-learning sides. This fact most probably indicates a (cautious) overestimation of the errors provided by the algorithms on both sides.
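The error statistics can be reproduced from paired MC and ML values with a few lines of standard-library Python. The relative error follows the definition in the Supplementary Materials, (ML − MC)/MC.

Which summary rows Table I contains besides the third quartile, min and max is not recoverable here. The mean and median in the sketch are therefore our additions. We also assume that the quartiles refer to the magnitude of the relative error.

```python
import statistics


def relative_errors(mc_values, ml_values):
    """(ML - MC) / MC for each sample."""
    return [(ml - mc) / mc for mc, ml in zip(mc_values, ml_values)]


def error_summary(errors):
    """Summary of the absolute relative errors: mean, median, third quartile
    ("75%": 75% of the errors lie below it), min and max."""
    abs_err = sorted(abs(e) for e in errors)
    _, median, q3 = statistics.quantiles(abs_err, n=4, method="inclusive")
    return {"mean": statistics.fmean(abs_err), "median": median, "75%": q3,
            "min": abs_err[0], "max": abs_err[-1]}
```

For example, configuration E in Table III (MC value −14.44, ML value −8.34) has a relative error of (−8.34 + 14.44)/(−14.44) ≈ −0.42.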
We obtained very similar results for the learning curves, the statistical distribution and the magnitude of errors for the set of quasi-parallel lines. Typical examples are visualized in the right panel of Fig. 1, and the corresponding errors are presented in Table I.
The predictive power of the ML algorithm depends on the size of the curve in dimensionless units (pixels). It seems that for both the coarser ($256^2$) and the finer ($512^2$) lattices, there is a common curve size (in pixels) below which the neural network does not work well. Examples of these worst configurations are visualized in Fig. 3, with the data shown in Table III of the Supplementary Materials. This observation is consistent with the expected property that, for a fixed physical size of the curve, a finer discretization gives better results.
In this article, we have demonstrated that trained neural networks provide a quick and accurate tool for predicting the zero-point energies of physical (though, in our exploratory approach, idealized) bodies. The method is versatile: we successfully applied it to two physical setups (in 2d and 3d) and to different types of boundaries (deformed circles and corrugated lines). Machine-learning techniques may open the door to designing geometries with requested characteristics of the vacuum forces.
The authors are grateful to A. N. Chernodub (Grammarly) for useful comments.The numerical simulations were performed at the computing cluster Vostok-1 of Far Eastern Federal University.The work was supported by a grant of the Russian Foundation for Basic Research No. 18-02-40121 mega.H.E. was supported by a Carl Friedrich von Siemens Research Fellowship of the Alexander von Humboldt Foundation during most of this project.

Supplementary materials
For the overwhelming majority of the studied boundary geometries, the trained neural network gives very good results. However, there is a small subset of configurations for which our method does not work well. In the first part of this supplementary material, we provide some examples where the neural network gives the worst results in terms of relative errors.

In Table III, we compare the first-principles Monte Carlo (MC) results with the corresponding Machine Learning (ML) predictions. The table lists the Casimir energy $E_S$ for configurations A, B and C of the 3d model and the mean action $\langle S \rangle$ for configurations D, E and F of the 2d model. These worst configurations are visualized in Fig. 3 in yellow/red colors. It turns out that they correspond to relatively small loops, where the effects of coarse-graining are large. For comparison, we also visualize one of the good configurations, which was already shown in the upper plot of Fig. 2.

For completeness, we also show in Fig. 4 the distributions of the MC and ML values and of the ML relative errors (ML minus MC value, divided by the MC value). The bad examples described above are responsible for the tails in the error distributions for the deformed circles. Note, however, that there are very few such instances with high errors. For example, in the 2d case with L = 256, out of a total of 300 samples, there is only one sample with an error of circa 0.85 and one sample with an error of about 0. […] These are small curves which have a small value of the total Casimir energy. As a consequence, the absolute error with respect to the MC result is small. This explains why the distributions of the MC and ML values agree very well (notice that the range of the values of the Casimir energies is quite large).
We also find that the relative errors are much smaller for the corrugated lines. On the other hand, the range of Casimir energies for the lines is narrower than for the deformed circles, while the energy values are relatively large. Therefore, even small relative errors can be visible in the value distributions, as is clearly seen for the 3d case of the corrugated lines in Fig. 4.

FIG. 1. (left) The neural network used to predict the Casimir energy ES for the static boundary S placed at the spatial L × L lattice cross-section of the 3d model.The size of the inputs of each layer is indicated on the left.The same network is employed for the mean action in the 2d model.(right) The examples of the quasi-parallel, corrugated lines used for training and prediction.

FIG. 2. Learning curves for (top) the Casimir energy $E_S$ in the 3d scalar model and (bottom) the mean action $\langle S \rangle$ in the 2d scalar model, for the set of deformed circles on $255^3$ and $256^2$ lattices, respectively. The inset histograms compare, statistically, the true and predicted distributions of the Casimir energies $E_S$ and actions $\langle S \rangle$ for the 3d and 2d sets, respectively. Several examples of the deformed circles are shown as well (described in Table II).
TABLE II. The Casimir energy ($O = E_S$) in the 3d model and the mean action ($O = \langle S \rangle$) in the 2d model obtained with the first-principles Monte Carlo (MC) calculation and the Machine Learning (ML) techniques, together with the appropriate absolute and relative errors. The numbers N label the deformed circles shown in Fig. 2.

TABLE III. Worst predictions (in terms of relative error) of the ML for the Casimir energy ($O = E_S$) in the 3d model (configurations A, B and C) and the mean action ($O = \langle S \rangle$) in the 2d model (configurations D, E and F), compared with the first-principles MC results. The labels N correspond to the deformed circles shown in Fig. 3.

N    O (MC)    error (MC)    O (ML)    error (ML)
D    …71       0.42          -12.53    5.82
E    -14.44    0.42          -8.34     6.10
F    -41.15    0.43          -28.38    12.78

FIG. 3. The examples A, B, ..., F of the deformed circles for which the neural network makes the worst predictions (given in Table III). For comparison, we plot, keeping the correct scale, configuration No. 1 from the good examples for the 3d model shown in the upper plot of Fig. 2.

TABLE I. Relative errors for 3d (for the Casimir energy $E_S$) and 2d (for the mean action $\langle S \rangle$) compared with the MC result, evaluated for the deformed circles and the quasi-parallel lines. The row "75%" gives the third quartile (75% of the errors lie below this value); "min" and "max" are the minimum and maximum errors.