Solving Differential Equations with Neural Networks: Applied to the Calculation of Cosmological Phase Transitions

Starting from the observation that artificial neural networks are uniquely suited to solving optimisation problems, and that most physics problems can be cast as an optimisation task, we introduce a novel way of finding a numerical solution to wide classes of differential equations. We find our approach to be very flexible and stable without relying on trial solutions, and applicable to ordinary, partial and coupled differential equations. We apply our method to the calculation of tunnelling profiles for cosmological phase transitions, a problem of relevance for baryogenesis and stochastic gravitational wave spectra. Comparing our solutions with publicly available codes which use numerical methods optimised for the calculation of tunnelling profiles, we find our approach to provide at least as accurate results as these dedicated differential equation solvers, and for some parameter choices even more accurate and reliable solutions. We point out that this approach of using artificial neural networks to solve equations is viable for any problem that can be cast into the form $\mathcal{F}(\vec{x})=0$, and is thus applicable to various other problems in perturbative and non-perturbative quantum field theory.


INTRODUCTION
A neural network is an algorithm designed to perform an optimisation procedure, where the loss function provides a measure of the performance of the optimisation. Thus, if a physics problem can be cast into the form F(x) = 0, then its solution can be calculated by minimising the loss function of a neural network. While this approach is applicable to any function F, we apply this observation to the solution of differential equations and to the non-perturbative calculation of tunnelling rates of electroweak phase transitions.
Solving differential equations is a profound problem, relevant for all areas of theoretical physics. For large classes of differential equations, analytic solutions cannot be found. Thus, numerical or approximative methods are needed to solve them. Standard methods to solve differential equations numerically include the Runge-Kutta method, linear multistep methods, finite-element or finite-volume methods, and spectral methods [1]. We instead propose a novel approach to solving differential equations using artificial neural networks.
In recent years, machine-learning algorithms have become increasingly popular in extracting correlations in high-dimensional parameter spaces. Even for a small number of dimensions, e.g. n_dim ≥ 3, it becomes very difficult to visualise data such that a human can extract correlations to a high degree of accuracy. Machine-learning algorithms, and in particular neural networks, prove to be faster, more precise and allow a parametric improvement of the precision in how well the region of interest is interpolated. As a result, various neural network architectures have been designed, e.g. convolutional neural networks, recurrent neural networks, deep neural networks, etc., to perform increasingly complex tasks.
In particle physics such tasks include the classification of signal versus background events based on event selection cuts [2][3][4][5], or the classification of complex objects such as jets, according to the image their radiation imprints in the detector [6][7][8][9][10][11][12][13]. In other applications neural networks are used to regress between data points [14][15][16][17][18], or are trained on a well-known sample to identify outliers or anomalies [19,20]. However, in all the aforementioned applications, which can be characterised as classification and regression, the neural network is applied to an output sample in order to extract information on the parameters that determine the input. In particle physics that would mean analysing the radiation profile as recorded by a particle detector to learn the parameters of the underlying model, e.g. the Standard Model. Input and output are connected through quantum field theory, i.e. a non-trivial set of differential and integral equations.
We propose to use these powerful artificial neural network algorithms in a different way, namely to directly find solutions to differential equations. We then apply these methods to calculate the solution of the non-perturbative quantum-field-theoretical description of tunnelling processes for electroweak phase transitions. The fast and reliable calculation of tunnelling rates for electroweak phase transitions, within and in extensions of the Standard Model, is important for assessing whether the model allows for a strong first-order phase transition during the evolution of the early Universe. This could explain baryogenesis [21,22] in such a model as the source of the matter-antimatter asymmetry in the Universe, and further lead to a stochastic gravitational wave signal which could potentially be measured at future gravitational wave experiments [23,24], e.g. eLISA [25].
The Universal Approximation Theorem [26,27] allows us to expect a neural network to perform well in solving complicated mathematical expressions. It states that an artificial neural network containing a single hidden layer can approximate any arbitrarily complex function to arbitrary accuracy, given enough neurons. We make use of this property by proposing a neural network model where the output of the network alone solves the differential equation, subject to its boundary conditions. In contrast to previous approaches [28][29][30][31][32][33], where the neural network is part of a full trial solution which is fixed to satisfy the boundary conditions, our approach includes the boundary conditions as additional terms in the loss function. The derivatives of the network output with respect to its inputs are calculated and passed to the loss function, and the network is optimised via backpropagation to regress to the solution of the differential equation. The network then gives a fully differentiable function which can be evaluated at any point within the training domain and, in some cases, extrapolated to further points (although we do not explore the extrapolation performance here).
We will begin by describing the method in detail and showcasing how it can be used to solve differential equations of varying complexity, before applying it to the calculation of cosmological phase transitions.

Design of the Network and Optimisation
We consider an artificial feedforward neural network (NN) with n inputs, m outputs and a single hidden layer with k units. The outputs of the network, N_m(x, {w, b}), can be written as in Eq. (1), where the activation function g : R^k → R^k is applied element-wise to each unit, and h and f denote the hidden and final layers, respectively. We use a single neural network with m outputs to predict the solutions to m coupled differential equations; for the case of one differential equation, we use m = 1.
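To make the structure explicit, the forward pass of such a single-hidden-layer network can be sketched in a few lines of NumPy; the layer sizes and random weights below are illustrative placeholders, not the trained values used in our examples:

```python
import numpy as np

def forward(x, params, g=np.tanh):
    """Outputs N_m(x, {w, b}) of a feedforward net with one hidden layer.

    x      : the n inputs, shape (n,)
    params : hidden weights W_h (k, n) and biases b_h (k,),
             final weights W_f (m, k) and biases b_f (m,)
    g      : activation applied element-wise to the k hidden units
    """
    hidden = g(params["W_h"] @ x + params["b_h"])   # h : R^n -> R^k
    return params["W_f"] @ hidden + params["b_f"]   # f : R^k -> R^m

rng = np.random.default_rng(0)
n, k, m = 1, 10, 2   # e.g. one input and two outputs for two coupled equations
params = {
    "W_h": rng.normal(size=(k, n)), "b_h": rng.normal(size=k),
    "W_f": rng.normal(size=(m, k)), "b_f": rng.normal(size=m),
}
out = forward(np.array([0.5]), params)   # shape (m,): one output per equation
```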
A set of m coupled j-th order differential equations can be expressed in the general form of Eq. (2), with boundary or initial conditions imposed on the solutions φ_m(x). Writing the differential equations in such a way allows us to easily convert the problem of finding a solution into an optimisation one. An approximate solution φ̂_m(x) is one which approximately minimises the square of the left-hand side of Eq. (2), and thus the analogy can be made to the loss function of a neural network.
In previous approaches [28][29][30][31], φ̂_m(x) is a trial solution composed of two parts: one which satisfies the boundary conditions, and one which is a function of the output of a neural network and vanishes at the boundaries. However, this requires one to choose a special form of the trial solution which is dependent on the boundary conditions. Furthermore, for some configurations of boundary conditions, finding such a trial solution is a very complex task, e.g. in the case of phase transitions.
Instead, we define the loss as in Eq. (3), where the second term represents the sum of the squares of the boundary conditions, defined at the boundaries x_b.¹ These can be Dirichlet or Neumann conditions, or they can be initial conditions if defined at the initial part of the domain.
The problem is then to minimise L({w, b}) by optimising the weights and biases in the network, for a given choice of network setup. To calculate the loss, it is necessary to compute the derivatives of the network output with respect to its input. Since each part of the network, including the activation functions, is differentiable, the derivatives can be obtained analytically; Ref. [31] outlines how to calculate them. The optimisation can then proceed via backpropagation by further calculating the derivatives of the loss itself with respect to the network parameters. We use the Keras framework [34] with a TensorFlow [35] backend to implement the network and perform the optimisation of the loss function.
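As an illustration of how such a loss is assembled, the following sketch evaluates it for a toy first-order equation, dφ/dx + φ = 0 with φ(0) = 1 (our own illustrative choice), with an arbitrary candidate function standing in for the network output. The input derivative is approximated numerically here for brevity; for the actual network it is computed analytically, as described above.

```python
import numpy as np

def ode_loss(phi, xs, eps=1e-6):
    """Loss for phi'(x) + phi(x) = 0 with phi(0) = 1.

    First term: mean squared differential residual F over the training points.
    Second term: squared violation of the boundary condition, added directly
    to the loss instead of being built into a trial solution.
    """
    dphi = (phi(xs + eps) - phi(xs - eps)) / (2 * eps)  # numerical stand-in
    residual = dphi + phi(xs)
    boundary = (phi(0.0) - 1.0) ** 2
    return np.mean(residual ** 2) + boundary

xs = np.linspace(0.0, 2.0, 100)
print(ode_loss(np.exp, xs) > ode_loss(lambda x: np.exp(-x), xs))
# True: the exact solution e^{-x} gives the smaller loss
```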
As with any neural network, the choice of hyperparameters will have an effect on the performance. For our setup, the important parameters are the number of hidden layers, the number of units in each hidden layer, the number of training points x^(i) (corresponding to the number of anchors in the discretisation of the domain of the differential equation), the activation function in each hidden layer, the optimisation algorithm, the learning rate, and the number of epochs the network is trained for. Furthermore, a choice must be made for the size of the domain that contains the points that the network is trained on, but this will usually be determined by the problem being solved. In all the examples, we use the Adam optimiser [36] with learning rate reduction on plateau, i.e. when the loss plateaus the learning rate is reduced, and an initial learning rate of 0.01. We find that the network is not susceptible to overfitting: the training points are chosen exactly from the domain on which one is trying to find the solution, and are not subject to statistical fluctuations, so finding a solution for which the loss at every training point is zero would not limit the generalisation of the solution to other points within the domain. Therefore, we use a large number of epochs such that the training loss becomes very small; for all examples we use a conservative number of 5 × 10^4 epochs. Furthermore, we use the entire set of training points in each batch so that the boundary conditions in the loss are included for each update of the network parameters. We also find that, in general, a single hidden layer with a small number of units (O(10)) is sufficient to obtain very accurate solutions.

¹ Here, p represents the order of the derivative for which the boundary condition is defined, and K is a function on the boundary. For example, for the condition dφ(0)/dx = 1 the second term would be (dφ̂(0)/dx − 1)².
In order to assess and improve the stability and performance in certain cases, there are some additional technical methods which we employ beyond the basic setup. Firstly, the differentiability of the network solution allows us to calculate the differential contribution, F, to the loss across the entire training domain. This shows the degree of accuracy to which each part of the network solution satisfies the differential equation, and can be used for assessing the performance in cases where the analytic solution is not known. Secondly, for coupled differential equations with initial conditions, we find the stability of the solution can be improved by iteratively training on increasing domain sizes. Finally, for the calculation of phase transitions, we employ a two-step training where initially the boundaries are chosen to be the true and false vacua, before the correct boundary conditions are used in the second training. This prevents the network from finding the trivial solution where the field is always in the false vacuum.

Ordinary Differential Equation Examples
To show how well the method can solve ordinary differential equations (ODEs), we apply it to both a first and a second order ODE, which have known analytic solutions.

FIG. 1: The upper panel shows the NN predicted solutions to Eq. (4) and Eq. (5), with boundary conditions as outlined in the text. The middle panel shows the numerical difference between the analytic solution and the NN predicted solution for both cases. The lower panel shows the differential contribution F to the loss across the entire domain.
The equations we study are Eq. (4), with the boundary condition φ(0) = 1 in the domain x ∈ [0, 2], and Eq. (5), with boundary conditions φ(0) = 0 and a condition on its first derivative at x = 0. As a simple neural network structure, we choose a single hidden layer of 10 units with sigmoid activation functions, and we discretise the domain into 100 training examples. It is then just a case of passing the differential equations and boundary conditions to the loss function, as described in Eq. (3), and proceeding with the optimisation. Fig. 1 shows the results of the neural network output, compared to the analytic solutions of Eq. (4) and Eq. (5). The middle panel of Fig. 1 shows the absolute numerical difference between the numerical and analytic solutions. This difference can be reduced further by increasing the number of epochs, the number of anchors in the discretisation of the domain, or the number of units in the hidden layer. Thus, the neural network provides handles to consistently improve the numerical accuracy one aims to achieve.
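The full procedure can be imitated in a compact, self-contained sketch. Here we solve the toy equation dφ/dx + φ = 0 with φ(0) = 1 (our own illustrative choice, not Eq. (4) itself), and replace the Keras/Adam setup by a SciPy BFGS optimisation over the weights of a 10-unit sigmoid network, with the input derivative taken by finite differences rather than analytically:

```python
import numpy as np
from scipy.optimize import minimize

K = 10  # hidden units, as in the text

def net(x, p):
    # single hidden layer of K sigmoid units, one output unit
    w1, b1, w2, b2 = p[:K], p[K:2*K], p[2*K:3*K], p[3*K]
    return 1.0 / (1.0 + np.exp(-(np.outer(x, w1) + b1))) @ w2 + b2

def dnet(x, p, eps=1e-4):
    # derivative of the output w.r.t. the input; a full implementation
    # would compute this analytically, as in Ref. [31]
    return (net(x + eps, p) - net(x - eps, p)) / (2 * eps)

xs = np.linspace(0.0, 2.0, 100)  # 100 anchors discretising the domain

def loss(p):
    residual = dnet(xs, p) + net(xs, p)        # F for phi' + phi = 0
    bc = net(np.array([0.0]), p)[0] - 1.0      # boundary term phi(0) = 1
    return np.mean(residual ** 2) + bc ** 2

rng = np.random.default_rng(42)
best, best_loss = None, np.inf
for _ in range(3):  # a few random restarts for robustness
    res = minimize(loss, rng.normal(scale=0.5, size=3 * K + 1), method="BFGS")
    if res.fun < best_loss:
        best, best_loss = res.x, res.fun
err = np.max(np.abs(net(xs, best) - np.exp(-xs)))
```

The remaining deviation from the analytic solution e^{-x} shrinks as more anchors, more units or more optimisation steps are used, mirroring the handles discussed above.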
The lower panel of Fig. 1 shows the differential contribution to the loss function, i.e. how much each training example contributes to the loss. As we will describe in the next section, if the solution is not analytically known, this provides a measure to assess whether the found solution is the correct one, or if a numerical instability led the network to settle in a local minimum of the loss.

Coupled Differential Equation Example
When discussing the calculation of cosmological phase transitions, we will study the solution of coupled non-linear differential equations, for which no closed analytic form is known. Here, we will first show that such solutions can be obtained with our approach, for a case where an analytic solution is known. We consider the coupled system of Eq. (6), with the boundary conditions of Eq. (7). If the boundary conditions are set on one end of the domain, as here at x = 0, it requires an increasingly elaborate network to maintain numerical stability for the solution over a large domain, e.g. where x ≫ 1. This is due to small numerical instabilities during backpropagation, caused by the complexity of the loss hypersurface. If such a numerical instability leads the network to choose a path that is in close proximity to the true solution, the NN can settle on a local minimum with a small value for the loss function. To solve this problem, we propose to incrementally extend the domain on which a solution should be found, by partitioning the training examples and increasing the number of partitions the NN is trained on in each step. If the weights the NN has learned in the previous step are retained before training for the next step, i.e. the network only has to learn the function on the part of the domain that was incrementally added, we find that one can achieve numerical stability for an arbitrarily large domain.
We show this mechanism in Fig. 2, where we have partitioned the full domain containing 100 training examples into three regions, each of size 1. The network structure again consists of a single hidden layer of 10 units with sigmoid activation functions, now with two units in the final layer, since there are two coupled equations. The upper panel shows the solutions for φ_1 and φ_2 for each iterative step. While the first iteration only allows a solution to be found on a smaller domain, here from 0 to 1, the subsequent steps, and in particular the third step, allow an accurate solution to be found over the entire domain. Again, the differential contribution F proves to be a good indicator of whether the calculated solution satisfies the differential equation over the entire domain, see the lower panel of Fig. 2.
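The iterative domain extension can be sketched in the same self-contained style, again for the toy equation dφ/dx + φ = 0 with φ(0) = 1 (our own illustrative choice, not the coupled system studied here): the optimised weights from each stage seed the optimisation on the enlarged domain.

```python
import numpy as np
from scipy.optimize import minimize

K = 10  # hidden units

def net(x, p):
    # single hidden layer of K sigmoid units, one output unit
    w1, b1, w2, b2 = p[:K], p[K:2*K], p[2*K:3*K], p[3*K]
    return 1.0 / (1.0 + np.exp(-(np.outer(x, w1) + b1))) @ w2 + b2

def make_loss(xs, eps=1e-4):
    # loss for phi' + phi = 0 with phi(0) = 1 on the anchors xs
    def loss(p):
        dphi = (net(xs + eps, p) - net(xs - eps, p)) / (2 * eps)
        bc = net(np.array([0.0]), p)[0] - 1.0
        return np.mean((dphi + net(xs, p)) ** 2) + bc ** 2
    return loss

full = np.linspace(0.0, 3.0, 90)
rng = np.random.default_rng(0)
p = rng.normal(scale=0.5, size=3 * K + 1)
for stage in (full[:30], full[:60], full):              # three partitions of size 1
    p = minimize(make_loss(stage), p, method="BFGS").x  # warm start each stage
err = np.max(np.abs(net(full, p) - np.exp(-full)))
```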

Partial Differential Equation Example
While we will not study partial differential equations (PDEs) in the later physics examples of calculating phase transitions, we showcase here the flexibility of our NN method. With the same network architecture as used for the solution of the ordinary differential equations (except for an extra input unit for each additional variable), we can apply our approach to the solution of partial differential equations. The precise solution of such equations is a widespread problem in physics, e.g. in mechanics, thermodynamics and quantum field theory. As an example, we choose the second order partial differential equation of Eq. (8), with boundary conditions as given in Eq. (9), for which an exact analytic solution is known. In Fig. 3 we show the difference between the numerical solution as predicted by the NN and the analytic solution over the domain (x, y) ∈ [0, 1] × [0, 1]. The 100 training examples were chosen from an evenly-spaced 10 × 10 grid. As the value of φ(x, y) is of O(1) for most of the domain, the relative and absolute accuracies are similar, so we only show the absolute accuracy. Across the entire domain, we find a numerical solution with very good absolute and relative accuracy for this second order partial differential equation. Moreover, with a deeper NN, e.g. a second layer with 10 tanh units, we find that the accuracy improves by a further order of magnitude; deeper and wider networks result in even better accuracies.
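To illustrate how a two-input differential contribution F enters the loss, the sketch below evaluates the residual of Laplace's equation for a known harmonic function (our own choice, not Eq. (8)); finite differences stand in for the network's analytic derivatives:

```python
import numpy as np

def laplacian(phi, x, y, eps=1e-4):
    # 5-point stencil for d^2/dx^2 + d^2/dy^2, standing in for analytic derivatives
    return (phi(x + eps, y) + phi(x - eps, y) + phi(x, y + eps) + phi(x, y - eps)
            - 4.0 * phi(x, y)) / eps ** 2

# phi(x, y) = x^2 - y^2 is harmonic, so its differential contribution F vanishes
xg, yg = np.meshgrid(np.linspace(0.0, 1.0, 10), np.linspace(0.0, 1.0, 10))
F = laplacian(lambda x, y: x ** 2 - y ** 2, xg, yg)  # ~0 up to floating-point error
```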

CALCULATION OF PHASE TRANSITIONS DURING THE EARLY UNIVERSE
Electroweak baryogenesis is a candidate for solving the baryon asymmetry puzzle, i.e. the observed abundance of matter over antimatter in the Universe [22,37]. The need for a dynamical generation of the baryon asymmetry is dictated by inflation: the entropy production occurring in the reheating phase of inflation would have washed out any asymmetry already present at the beginning of the Universe's evolution [38]. A model of baryogenesis was proposed by A. Sakharov in 1967 [21], and its requirements must now be accommodated by any fundamental theory capable of addressing the baryon asymmetry problem. These are commonly translated into three necessary conditions: (i) baryon number violation; (ii) C- and CP-violation; (iii) loss of thermal equilibrium. While the first condition can be satisfied in the Standard Model (SM), the second and third conditions require it to be extended [39][40][41]. Departure from thermal equilibrium can be obtained during a strong first-order phase transition, which is usually accompanied by a sudden change of symmetry [42]. Within the SM, this could have occurred during electroweak symmetry breaking, when the Universe had a temperature of T ∼ 100 GeV [43,44]. In order to assess whether this might have been the case, it is crucial to discuss the conditions for scalar field phase transitions at finite temperature.
Quantum fluctuations allow the transition between two vacua of the potential V(φ).² When these are not degenerate, the configuration which corresponds to a local minimum, the false vacuum φ_F, becomes unstable under barrier penetration, and can decay into the true vacuum φ_T of the potential. The tunnelling process converts a homogeneous region of false vacuum into one of approximate true vacuum, a bubble. Far from this region the false vacuum persists undisturbed [45]. The Euclidean action for this process, S_4(φ), is given in Eq. (10). The description of the tunnelling action at finite temperatures follows from the equivalence between the quantum statistics of bosons (fermions) at T ≠ 0 and Euclidean quantum field theory, periodic (anti-periodic) in the Euclidean time τ with period T^−1. In the calculation of S_4(φ), the integration over τ is replaced by multiplication by T^−1 [46], leaving the three-dimensional Euclidean action S_3(φ) of Eq. (11), with S_4(φ) = T^−1 S_3(φ). Suggested by the symmetry of the physical problem, we assume φ(x) to be invariant under three-dimensional Euclidean rotations (see Ref. [47] for a rigorous demonstration in the case of a single scalar field), and define ρ = |x|. The bubble configuration φ_b(ρ) is the solution to the Euler-Lagrange equation of motion, Eq. (12), where the gradient of the potential is taken with respect to the field φ, and the boundary conditions are given in Eq. (13). The solution thus minimises the action. The probability per unit time and unit volume for the metastable vacuum to decay is given in Eq. (14). It is maximised by the bounce, where S_3(φ_F) is the action evaluated at the stationary configuration φ_F. A complete expression for the factor A in Eq. (14) would require complex computations of differential operator determinants, for which we refer the reader to Ref. [38]. An estimate can be obtained from dimensional analysis, which gives A ∼ T^4 [48]. Dedicated methods for calculating the nucleation rate, by finding a solution for the bubble profile φ to the non-linear coupled differential equations of Eq. (12), exist and have been implemented in publicly available codes, e.g. CosmoTransitions [49] and BubbleProfiler [50]. For the single-field case, both CosmoTransitions and BubbleProfiler use variants of the overshooting and undershooting method. In the multiple-field case, BubbleProfiler applies the Newton-Kantorovich method [51], as described in [52]. CosmoTransitions instead uses a method that splits the equation of motion into components parallel and perpendicular to a test path through field space; the path is then varied until a configuration is found that simultaneously solves both directions of the equations of motion. A further code to calculate tunnelling rates is given in Ref. [53]. An approach using neural networks to directly learn bounce actions from potentials was described in Ref. [54]. Recently, a novel approximative approach for single [55] and multiple fields [56] was proposed. Older numerical approaches to calculating bubble profiles and tunnelling rates include [57][58][59].

Phase Transition with a Single Scalar Field
As a first application of our method to the computation of cosmological phase transitions, we consider the case of a single scalar field. Eq. (12) then has a straightforward classical analogy: it describes the motion of a particle with coordinate φ(ρ), subject to the inverted potential −V(φ) and to a peculiar-looking damping force which decreases with time. The problem reduces to finding the initial position φ_0, in the vicinity of φ_T, such that the particle stops at φ_F as ρ → ∞.

FIG. 4: The potential of Eq. (18), with λ = a = 1, for the thick-wall (blue solid) and the thin-wall (red dashed) cases. For the latter, the position of the global minimum is also marked by the black dot for clarity.
Existence of a solution was proven in Ref. [45]. Starting too close or too far from φ_T would both lead to missing the final configuration φ_F, due to overshooting and undershooting respectively. Continuity of φ(ρ) thus implies that there must exist an intermediate initial position φ_0 which solves the boundary conditions in Eq. (13). The solution presents two limiting profiles, determined by the ratio of ∆ ≡ V(φ_F) − V(φ_T) to the height of the potential barrier V(φ_bar). If this ratio is of order one, which corresponds to the thick-wall case, the particle will overshoot unless its initial energy is similar to V(φ_F). Conversely, if this ratio is small, corresponding to the thin-wall case, in order to avoid undershooting the particle must wait close to φ_T until the time ρ ≈ R, when the damping force has become negligible. The value of R can be determined exactly in the thin-wall limit [45] using Eq. (16), where the surface tension σ is given by Eq. (17). We test our method on the potential of Eq. (18) [50], and set λ = a = 1. Two distinct and non-degenerate minima exist for 0 < ε < 0.3, with the upper bound representing the thick-wall limit, and smaller values of ε representing progressively thinner cases. A plot of the potential is shown in Fig. 4, for the values ε = 0.01 and ε = 0.3, which we consider as our thin-wall and thick-wall cases respectively.

FIG. 5: The upper panel shows the bubble profile for the thick-wall potential (ε = 0.3) in Eq. (18) for one scalar field, as obtained by our NN method, BubbleProfiler and CosmoTransitions. The middle panel displays the numerical difference between the NN predicted solution and the solutions from the other two codes. The lower panel shows the differential contribution F to the loss.
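The overshooting and undershooting procedure described above is easy to sketch directly. The quartic potential and its parameters below are our own illustrative choices (not Eq. (18) itself): a small tilt makes the two minima non-degenerate, and the method bisects on the release point φ_0 between the true vacuum and the barrier.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Tilted double well: phi ~ -1 is the true vacuum phi_T, phi ~ +1 the false
# vacuum phi_F; eps controls how far the minima are from degeneracy.
eps = 0.2
dV = lambda phi: phi * (phi ** 2 - 1.0) / 2.0 + eps / 2.0
phi_T = brentq(dV, -1.5, -0.5)
phi_F = brentq(dV, 0.5, 1.5)

def shoot(phi0, rho_max=200.0):
    """Integrate phi'' + (2/rho) phi' = dV/dphi with phi(0) = phi0, phi'(0) = 0."""
    rhs = lambda rho, y: [y[1], dV(y[0]) - 2.0 * y[1] / rho]
    cross = lambda rho, y: y[0] - phi_F      # rolled past the false vacuum: overshoot
    cross.terminal, cross.direction = True, 1
    turn = lambda rho, y: y[1]               # velocity reverses: undershoot
    turn.terminal, turn.direction = True, -1
    return solve_ivp(rhs, (1e-4, rho_max), [phi0, 0.0],
                     events=[cross, turn], rtol=1e-10, atol=1e-12)

# Bisect on the release point: starting too close to phi_T overshoots,
# starting too close to the barrier undershoots; the bounce lies in between.
lo, hi = phi_T + 1e-12, 0.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    overshoot = len(shoot(mid).t_events[0]) > 0
    lo, hi = (mid, hi) if overshoot else (lo, mid)

profile = shoot(hi)   # converged profile creeps up to phi_F before turning back
```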
For the boundary conditions in Eq. (13), it is clearly not possible to implement an infinite domain for the training of a neural network, and the divergence in the second term of Eq. (12) prevents the equation from being evaluated at ρ = 0. Therefore, a training domain ρ ∈ [ρ_min, ρ_max] must be chosen. Since the solution approaches the boundaries exponentially, it can be safely assumed that the numerical uncertainties induced by these choices can be neglected, provided that ρ_min is sufficiently small and ρ_max is sufficiently large. To help in choosing this domain, the identification of ε in Eq. (18) with ∆ in Eq. (16) can be made, and the bubble radius R calculated. We then use ρ_max = 5R for the thick-wall case, and ρ_max = 2R for the thin-wall case (since Eq. (16) underestimates the true radius for thick-wall cases). Furthermore, we use ρ_min = 0.01 for both cases. Although these choices may seem arbitrary, we find that the solution converges provided that the transition point is contained well inside the domain, and the result remains stable even if larger domains are used. The boundary conditions are then imposed at ρ_min and ρ_max accordingly.

FIG. 6: The upper panel shows the bubble profile for the thin-wall potential (ε = 0.01) in Eq. (18) for one scalar field, as obtained by our NN method, BubbleProfiler and CosmoTransitions. The middle panel displays the numerical difference between the NN predicted solution and the solutions from the other two codes. The lower panel shows the differential contribution F to the loss. The dashed vertical line shows the analytic location of the bubble radius.
Our NN method can then be applied to find the bubble profile by solving the Euler-Lagrange equation in Eq. (12). We discretise the domain into 500 training points, and choose a network with a single hidden layer. For the thick-wall case, we use 10 hidden units with a sigmoid activation function, as in the earlier examples, but for the thin-wall case, we find a single tanh unit is sufficient to achieve very good performance, since the solution itself closely resembles a tanh function. To prevent the network from finding the trivial solution where the field remains in the false vacuum forever, we first train the network with the boundary condition at ρ_min modified to φ(ρ_min) = φ_T, so that the network finds a solution in the vicinity of the correct solution, since the starting point is close to the true vacuum, before training again with the correct boundary conditions. We use this two-step training for all phase transition calculations.
Our results for the thick-wall and thin-wall cases are shown in Figs. 5 and 6 respectively, together with the CosmoTransitions and BubbleProfiler solutions.
While all three methods agree very well for the thick-wall case, CosmoTransitions deviates from BubbleProfiler and the NN approach in the thin-wall case. The dashed vertical line indicates where the bubble radius should be according to Eq. (16). Both BubbleProfiler and the NN find a solution that matches the analytic calculation for the bubble radius. CosmoTransitions instead finds a solution with a smaller bubble radius, and therefore a smaller action and a larger tunnelling rate.
For thin-wall cases, numerical stability is difficult to achieve. It is possible for an approximate solution to be found which transitions at a much earlier ρ than it should, since a translated solution also approximately solves the differential equation [53]. For our method, F can be monitored during the course of the training. During the early stages of the training, where the solution does not yet have the correct transition point, F will be sharply distributed in the region of the incorrect transition. As the training proceeds and the solution converges, this distribution will flatten out until an accurate solution is found across the entire domain.
We noted that we achieved very good stability for our thin-wall case using a single tanh function. We also explored the idea of using an adaptive distribution of training examples, such that more examples are placed close to the region where the transition of the NN solution happens, with this distribution modified over the course of the training. A larger contribution to the loss in this region is then amplified by having more training examples there, which can speed up learning. We found the results can be improved by using this procedure, and this is an idea which could be investigated further in future work.
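A minimal sketch of such an adaptive redistribution of anchors, with a stand-in residual profile playing the role of the actual F of a partially trained network:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)        # initial uniform anchors
resid = np.exp(-(x - 7.0) ** 2)        # stand-in for |F|, peaked at the transition
w = resid + 1e-3                       # small floor keeps some coverage everywhere
w /= w.sum()
new_x = rng.choice(x, size=200, p=w)   # resampled anchors concentrate near x = 7
```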

Phase Transition with Two Scalar Fields
To investigate how well the NN approach can solve the differential equation of Eq. (12) for multiple fields, we consider a potential for two scalar fields [49], which has a local minimum at φ_1 = φ_2 = 0 and a global minimum near φ_1 ≈ φ_2 ≈ 1. We focus again on the thick- and thin-wall cases, setting δ = 0.4 for the former and δ = 0.02 for the latter. For the thick-wall potential, we solve the coupled equations in Eq. (12) with the corresponding boundary conditions. The resulting profiles, and the escape path in field space, are shown in Figs. 7 and 8 respectively. Once more, BubbleProfiler and the NN predictions agree very well, both for the one-dimensional profiles for φ_1 and φ_2, and for the path in the (φ_1, φ_2) plane. CosmoTransitions shows a slightly different shape for the solutions of φ_1(ρ), and even more so for φ_2(ρ), resulting in a slightly modified escape path in Fig. 8. We note that our NN solution has found initial positions for the fields which agree with those from BubbleProfiler. In thick-wall cases, these can differ significantly from the true vacuum φ_T, so these values have not been used as an input to the training; the network has converged to them itself.
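To make the vacuum structure concrete, a two-field potential with these qualitative features can be checked numerically. The functional form and the value of δ below are a hypothetical stand-in with a local minimum at the origin and a global minimum near (1, 1), not necessarily the potential of Ref. [49]:

```python
import numpy as np
from scipy.optimize import minimize

delta = 0.4  # thick-wall-like choice

def V(phi):
    phi1, phi2 = phi
    # the bracket is positive at the origin (local minimum) and negative
    # near (1, 1), where the global minimum sits
    return (phi1 ** 2 + phi2 ** 2) * (1.8 * (phi1 - 1.0) ** 2
                                      + 0.2 * (phi2 - 1.0) ** 2 - delta)

glob = minimize(V, x0=[0.9, 0.9]).x   # descend into the global minimum's basin
print(glob, V(glob))
```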
For the thin-wall potential, we find that the performance can be significantly improved if a deeper network is used. BubbleProfiler instead does not find a solution at all, while the NN agrees very well with the path found by CosmoTransitions. Since there is not a solution from all three codes, we do not show the plot here.

Singlet-Scalar Extended Standard Model with Finite Temperature Contributions

As a final example, we study a scenario of phenomenological interest, namely the extension of the SM Higgs sector by a single scalar field.³ Despite its simplicity, the singlet-scalar extended Standard Model (SSM) could potentially provide solutions to puzzles such as the existence of dark matter [62][63][64] and electroweak baryogenesis [65][66][67], where a crucial requirement is a strong electroweak phase transition, as discussed previously. The tree-level potential is given in Eq. (22), where h denotes the Higgs field and s the additional Z_2-symmetric scalar field.⁴ It is possible to consider a scenario in which the potential barrier separating the electroweak-symmetric and the broken phase is generated already at tree level [68]. In this scenario, to study the evolution of the parameters with T, it is enough to include only the high-temperature expansion of the one-loop thermal potential, which results in thermal corrections to the mass parameters [69], where g and g′ are the SU(2)_L and U(1)_Y gauge couplings respectively, and h_t is the top Yukawa coupling.
We then consider Eq. (12) with the potential of Eq. (23). At high temperatures the thermal contribution in Eq. (23) dominates and the global minimum is the Z_2- and electroweak-symmetric configuration (h = 0, s = 0). The behaviour as T decreases is determined by the choice of parameters. These are constrained to the parameter region in which the potential develops a strong tree-level barrier at the critical temperature T_C [68]. In particular, at T > T_C, after Z_2-symmetry breaking, s acquires a non-zero vacuum expectation value, s = w, along the h = 0 direction. This configuration constitutes the global minimum of the potential. At T = T_C a second degenerate minimum appears at the electroweak symmetry breaking phase, h = v, and at the restored Z_2-symmetric vacuum, s = 0. Finally, at T < T_C the electroweak minimum (v, 0) represents the only energetically favourable configuration. The nucleation temperature T_N at which the phase transition from φ_F = (0, w) to φ_T = (v, 0) occurs is found from the requirement S_3(T_N)/T_N ≈ 140 [39,70].
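The nucleation criterion can be illustrated with a toy action curve. The parametrisation below is invented, chosen only so that S_3(T)/T crosses 140 near the temperatures considered in this section; it is not the SSM action:

```python
import numpy as np
from scipy.optimize import brentq

T_C = 110.0  # critical temperature in GeV

# Toy action curve: S3 diverges as T -> T_C and falls as the Universe cools.
S3 = lambda T: 8.0e3 * (T / T_C) ** 2 / (1.0 - (T / T_C) ** 2)

# Nucleation occurs where S3(T_N) / T_N ~ 140
T_N = brentq(lambda T: S3(T) / T - 140.0, 40.0, 109.0)
```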
As an example parameter configuration, we consider $T_C = 110$ GeV, $\lambda_m = 1.5$ and $\lambda_s = 0.65$, as used in Ref. [50], and a temperature of $T = 85$ GeV, which is the nucleation temperature found by BubbleProfiler. We thus solve the equations in Eq. (12) with the corresponding boundary conditions. We use a neural network with 10 units in a single hidden layer with a sigmoid activation function, on a training domain of $\rho \in [0.01, 50]$ with 500 training points. To avoid large numerical values in the loss function, we scale all mass parameters in the potential by the electroweak symmetry-breaking vacuum expectation value at zero temperature, $v_{EW}$. Our result, along with the comparison to CosmoTransitions and BubbleProfiler, is shown in Fig. 9. We find very good agreement between all three methods for the bubble profiles $h(\rho)$ and $s(\rho)$, and the small values of F across the domain show that good convergence has been achieved.
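The rescaling of mass parameters can be made explicit. A sketch with a hypothetical quartic potential (illustrative Higgs-like numbers, not the SSM potential of Eq. (26)): dividing fields by $v_{EW}$ and the potential by $v_{EW}^4$ turns loss contributions of order $10^8$ GeV$^4$ into numbers of order one:

```python
v_EW = 246.22            # zero-temperature electroweak vev in GeV
mu2 = -(88.4) ** 2       # illustrative mass parameter, GeV^2
lam = 0.129              # illustrative quartic coupling

def V(h):
    """Toy dimensionful potential in GeV^4."""
    return 0.5 * mu2 * h**2 + 0.25 * lam * h**4

def V_scaled(x):
    """Same potential in units of v_EW: x = h / v_EW, V / v_EW^4."""
    return V(x * v_EW) / v_EW**4

# at the vev the dimensionful potential is O(1e8) GeV^4, while the
# rescaled one is O(1e-2) -- far friendlier for a squared-residual loss
```

Since the rescaling is a constant factor, it changes the numerical conditioning of the loss without changing the location of its minimum.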

CONCLUSIONS
By building on the capabilities of artificial neural networks in solving optimisation problems, we have proposed a novel way of finding solutions to differential equations.
Our method extends existing approaches in several respects: (i) we avoid trial functions by including the boundary conditions directly in the loss function; (ii) the differential contribution F to the loss is an excellent indicator of whether a good solution to F = 0 has been found over the entire domain; (iii) in regions of numerical stability we propose to increase the domain iteratively, to find stable solutions over arbitrarily large domains; (iv) for solutions that vary quickly over a small part of the domain, we find it can be numerically beneficial to self-adaptively distribute more anchor points in such regions.
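Point (iv) can be sketched as follows; this is our own illustration of the idea with a toy residual, not the exact scheme used for the results above. New anchors are drawn from a distribution proportional to the current residual magnitude, so regions where |F| is large receive more training points:

```python
import numpy as np

rng = np.random.default_rng(1)

def redistribute_anchors(x, residual, n_new):
    """Sample new training points preferentially where |F| is large."""
    p = np.abs(residual)
    p = p / p.sum()                          # residual-weighted distribution
    centers = rng.choice(x, size=n_new, p=p)
    jitter = (x.max() - x.min()) / x.size    # spread within one grid cell
    return np.clip(centers + rng.uniform(-jitter, jitter, n_new),
                   x.min(), x.max())

x = np.linspace(0.0, 1.0, 101)
residual = np.exp(-((x - 0.8) / 0.05) ** 2)  # toy residual peaked at 0.8
anchors = redistribute_anchors(x, residual, 1000)
```

After resampling, the network is trained further on the new anchors, and the procedure can be repeated until the residual is uniformly small.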
We applied this approach to finding fully differentiable solutions to ordinary, coupled and partial differential equations for which analytic solutions are known. Various network architectures have been studied, and even relatively small networks showed very good performance.
To show how this method can be applied to a task of direct phenomenological interest, we used it to calculate the tunnelling profiles of electroweak phase transitions and compared them to those obtained by CosmoTransitions and BubbleProfiler. We find an optimised neural network to be very flexible, finding solutions for all the examples tested with an accuracy that is competitive with the dedicated programs for calculating bubble profiles. However, further work, e.g. developing a more robust approach to choosing the domain size for phase transitions, would be required to turn this method into a fully automated tool.
As this method can be straightforwardly extended beyond the solution of differential equations, we envision it to be applicable to a wide range of problems in perturbative and non-perturbative quantum field theory.

FIG. 1: The upper panel shows the solutions to the first- and second-order ODEs of Eq. (4) and Eq. (5), with boundary conditions as outlined in the text. The middle panel shows the numerical difference between the analytic solution and the NN-predicted solution for both cases. The lower panel shows the differential contribution F to the loss across the entire domain.

FIG. 2: The upper panel shows the solutions for the functions φ1 and φ2 of the coupled differential equation of Eq. (6). The middle panel displays the numerical difference between the analytic solution and the NN-predicted solution for φ1. The lower panel shows the differential contribution F to the loss across the entire domain, from the equation for φ1. The three NN curves in each panel correspond to the first, second and third iteration steps in the training of the network, with iterative increase of the training domain, as described in the text.

FIG. 4: Plot of the potential in Eq. (18), with λ = α = 1, for the thick-wall (blue solid) and the thin-wall (red dashed) cases. For the latter, the position of the global minimum is also marked by the black dot for clarity.

FIG. 5: The upper panel shows the bubble profile for the thick-wall potential (ε = 0.3) in Eq. (18) for one scalar field, as obtained by our NN method, BubbleProfiler and CosmoTransitions. The middle panel displays the numerical difference between the NN-predicted solution and the solutions from the other two codes. The lower panel shows the differential contribution F to the loss.

FIG. 7: The upper panel shows the bubble profiles for the thick-wall potential in Eq. (20) for two scalar fields, as obtained by our NN method, BubbleProfiler and CosmoTransitions. The middle panel displays the numerical difference between the NN-predicted solutions and the solutions from the other two codes. The lower panel shows the differential contribution F to the loss from φ1 and φ2.

FIG. 8: Calculated solutions for the tunnelling path from the NN method, BubbleProfiler and CosmoTransitions. The paths range from the local minimum to the exit point of the tunnelling barrier. Also shown are the contours of the potential, where the global minimum is denoted by the black dot and the local minimum by the black cross.

FIG. 9: The upper panel shows the bubble profiles for the singlet-scalar extended Standard Model potential in Eq. (26), as obtained by our NN method, BubbleProfiler and CosmoTransitions. The middle panel displays the numerical difference between the NN-predicted solutions and the solutions from the other two codes. The lower panel shows the differential contribution F to the loss from h and s.
Instead, we identify the trial solution with the network output, $\hat{\phi}_m(\vec{x}) \equiv N_m(\vec{x}, \{w, b\})$, and include the boundary conditions as extra terms in the loss function. If the domain is discretised into a finite number of training points $\vec{x}_i$, then approximations to the solutions, $\hat{\phi}_m(\vec{x})$, can be obtained by finding the set of weights and biases, $\{w, b\}$, such that the neural network loss function is minimised on the training points. For $i_{\rm max}$ training examples, the full loss function that we use is,
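A minimal sketch of this construction, in our own toy setup rather than the paper's implementation: a one-hidden-layer sigmoid network whose x-derivative is available in closed form, trained so that the squared ODE residual plus a squared boundary-condition term is minimised on the training points. Here we solve the illustrative problem φ'(x) = −φ(x), φ(0) = 1, and use a crude numerical gradient in place of backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)
H = 10                                    # hidden units
params = rng.normal(scale=0.5, size=3 * H)
x = np.linspace(0.0, 2.0, 50)             # training points (anchors)

def net(xs, p):
    """N(x) = sum_j v_j * sigmoid(w_j x + b_j), and dN/dx."""
    w, b, v = p.reshape(3, H)
    s = 1.0 / (1.0 + np.exp(-(np.outer(xs, w) + b)))
    return s @ v, (s * (1.0 - s)) @ (v * w)

def loss(p):
    phi, dphi = net(x, p)
    residual = dphi + phi                 # F[phi] = phi' + phi should vanish
    phi0, _ = net(np.array([0.0]), p)
    return np.mean(residual ** 2) + (phi0[0] - 1.0) ** 2  # + boundary term

def grad(p, eps=1e-6):
    """Central-difference gradient -- a stand-in for autodiff."""
    g = np.empty_like(p)
    for i in range(p.size):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (loss(p + d) - loss(p - d)) / (2.0 * eps)
    return g

vel = np.zeros_like(params)               # momentum gradient descent
for _ in range(3000):
    vel = 0.9 * vel - 0.02 * grad(params)
    params += vel
```

Because the boundary condition enters only as an additional loss term, no trial function encoding it is needed, and the same structure carries over to coupled and partial differential equations by summing the residuals of all equations.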