Convolutional neural networks for large-scale dynamical modeling of itinerant magnets

Complex spin textures in itinerant electron magnets hold promise for next-generation memory and information technology. The long-ranged and often frustrated electron-mediated spin interactions in these materials give rise to intriguing localized spin structures such as skyrmions. Yet, simulations of magnetization dynamics for such itinerant magnets are computationally difficult due to the need for repeated solutions of the electronic structure problem. We present a convolutional neural network (CNN) model to accurately and efficiently predict the electron-induced magnetic torques acting on local spins. Importantly, as the convolutional operations with a fixed kernel (receptive field) size naturally take advantage of the locality principle for many-electron systems, the CNN offers a scalable machine learning approach to spin dynamics. We apply our approach to enable large-scale dynamical simulations of skyrmion phases in itinerant spin systems. By incorporating the CNN model into Landau-Lifshitz-Gilbert dynamics, our simulations successfully reproduce the relaxation process of the skyrmion phase and stabilize a skyrmion lattice in larger systems. The CNN model also allows us to compute the effective receptive fields, thus providing a systematic and unbiased method for determining the locality of the original electron models.


I. INTRODUCTION
Itinerant frustrated magnets with electron-mediated spin-spin interactions often exhibit complex non-collinear or non-coplanar spin textures. Of particular interest are particle-like objects such as magnetic vortices and skyrmions, which are not only of fundamental interest in magnetism but also have important technological implications in the emerging field of spintronics [1][2][3][4][5][6][7]. These nanometer-sized localized spin textures are characterized by nontrivial topological invariants and are rather stable objects with long lifetimes. In itinerant electron magnets, skyrmions can be moved, created, and annihilated not only by magnetic fields but also by electrical currents, thanks to electron-spin interactions. The presence of such complex textures could also give rise to intriguing electronic and transport properties, such as the topological Hall effect and topological Nernst effect [7][8][9][10], due to the nontrivial Berry phase acquired by electrons when traversing closed loops of non-coplanar spins [11].
Dynamical modeling of complex textures in itinerant spin systems, however, is a computationally challenging task. While magnetic moments in most metallic skyrmion materials can be well approximated as classical spin vectors, the local effective magnetic fields, analogous to forces in molecular dynamics, originate from exchange interactions with itinerant electrons and must be computed quantum mechanically. Dynamics simulations of such itinerant magnets thus require solving an electronic structure problem associated with the instantaneous spin configuration at every time step. Repeated quantum calculations would be prohibitively expensive for large-scale simulations. Consequently, empirical classical spin Hamiltonians, from which the local fields can be explicitly calculated, are often employed in large-scale dynamical simulations of skyrmion magnets [12,13]. Yet, such classical spin models often cannot capture the intricate long-range spin-spin interactions mediated by electrons.
The computational complexity of the above quantum approaches to spin dynamics is similar to that of the so-called quantum or ab initio molecular dynamics (MD) methods. Contrary to classical MD methods that are based on empirical force fields, the atomic forces in quantum MD are computed by integrating out electrons on-the-fly as the atomic trajectories are generated [14]. Various many-body methods, notably density functional theory, have been used for the force calculation of quantum MD. However, the computational cost of repeated electronic structure solutions significantly restricts the accessible scales of atomic simulations. To overcome this computational difficulty, machine learning (ML) methods have been exploited to develop force-field models by accurately emulating the time-consuming many-body calculations, thus enabling large-scale MD simulations with the desired quantum accuracy.
Crucial to the remarkable scalability of ML-based force-field models is the divide-and-conquer approach proposed in the pioneering works of Behler and Parrinello [15], and Bartók et al. [16]. In this approach, the total energy of the system is partitioned into local contributions E = Σ_i ϵ_i, where ϵ_i is called the atomic energy and only depends on the local environment of the i-th atom [15,16]. The atomic forces are then obtained from the derivatives of the predicted energy, F_i = −∂E/∂r_i, where r_i is the atomic position vector. Crucially, the complicated dependence of the atomic energy ϵ_i on its local neighborhood is approximated by the ML model, which is trained on the condition that both the predicted individual forces F_i as well as the total energy E agree with the quantum calculations [15][16][17][18][19][20][21][22][23][24][25][26]. It is worth noting that physically the principle of locality, or the so-called nearsightedness of electronic matter, lies at the heart of this approach [27,28].
The tremendous success of ML methods in quantum MD simulations has spurred similar approaches to multi-scale dynamical modeling of other functional electronic systems in condensed matter physics [29][30][31][32][33][34][35]. In particular, the Behler-Parrinello (BP) ML scheme [15,16] was generalized to build effective magnetic energy or torque-field models with the accuracy of quantum calculations for itinerant electron magnets [33,34,36,37]. Notably, large-scale dynamical simulations enabled by such ML models uncovered intriguing phase separation dynamics that results from the nontrivial interplay between electrons and local spins. While the conventional BP scheme can only represent conservative forces, a generalized potential theory for the Landau-Lifshitz equation allows one to extend the BP scheme to describe non-conserved spin torques that are crucial to the dynamical modeling of out-of-equilibrium itinerant spin systems [35].
In this paper, we present an ML torque model for itinerant magnets based on convolutional neural networks (CNNs). A CNN is a class of neural network characterized by its local connectivity, implemented via finite-sized convolution kernels. Importantly, the convolution operation with a finite-sized kernel naturally incorporates the locality principle into the ML structure, thus offering an efficient implementation of the ML torque model that can be straightforwardly scaled to larger systems. Our CNN model is designed to directly predict the vector torque field at every site, without the need to introduce local energies as in the BP scheme. Data augmentation techniques are employed to incorporate the spin-rotational symmetry and the lattice symmetry into the CNN spin-torque model. We demonstrate our approach on an itinerant spin model which exhibits a skyrmion crystal phase at an intermediate magnetic field. We show that dynamical simulations with magnetic torques computed from the trained CNN model faithfully reproduce the relaxation process of the itinerant spin system. Moreover, the CNN model, while trained on datasets from small systems, is capable of stabilizing a skyrmion lattice on larger systems, thus demonstrating the transferability and scalability of our ML approach.
The rest of the paper is organized as follows. In Section II, we discuss the methods for simulating the spin dynamics of itinerant electron magnets. A triangular-lattice s-d model, a well-studied itinerant spin system, is used as a concrete example to highlight the complexity of the dynamical simulations. We also briefly review BP-type ML approaches, where a neural network is trained to approximate a local energy function. Section III presents the CNN structure used for the prediction of spin torques. Details of the data augmentation for incorporating symmetries, and of how the ML model can be scaled to larger systems, are also discussed. Using the s-d model as an example, a benchmark of the CNN models and simulation results based on the trained models are presented in Section IV. We also ascertain the scalability and symmetry of the proposed CNN method, as well as its compliance with the locality principle. Finally, we summarize our work and discuss future directions in Sec. V.

II. MAGNETIZATION DYNAMICS OF THE ITINERANT MAGNETS
The magnetization dynamics in spin systems is governed by the Landau-Lifshitz-Gilbert (LLG) equation [38]

dS_i/dt = T_i + α S_i × dS_i/dt + τ_i(t),    (1)

where T_i is the magnetic torque defined as

T_i = −γ S_i × H_i.    (2)

Here γ is the gyromagnetic ratio, H_i is an effective exchange field acting on spin-i, α is the damping coefficient, and τ_i(t) = S_i × η_i(t) is a fluctuating torque generated by a random local field η_i of zero mean. The stochastic field η_i is assumed to be a Gaussian random variable with the variance determined from α and temperature T through the fluctuation-dissipation theorem.
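To make the update concrete, here is a minimal sketch of one stochastic LLG time step. It is written in the Landau-Lifshitz form (equivalent to the Gilbert form up to a rescaling of γ), uses a plain Euler step with renormalization rather than the semi-implicit integrator employed later in the paper, and assumes a common discretization of the fluctuation-dissipation variance:

```python
import numpy as np

def llg_step(S, H, dt, gamma=1.0, alpha=0.05, kT=0.0, rng=None):
    """One explicit step of the stochastic LLG equation in Landau-Lifshitz
    form (equivalent to the Gilbert form up to a rescaling of gamma).
    S : (N, 3) array of unit spins; H : (N, 3) effective exchange fields."""
    rng = np.random.default_rng() if rng is None else rng
    # precession torque T_i = -gamma S_i x H_i and the damping term
    torque = -gamma * np.cross(S, H)
    damping = -alpha * gamma * np.cross(S, np.cross(S, H))
    update = torque + damping
    if kT > 0:
        # fluctuating torque tau_i = S_i x eta_i; the variance of eta follows
        # a standard fluctuation-dissipation discretization (an assumption)
        eta = rng.normal(0.0, np.sqrt(2 * alpha * kT / (gamma * dt)), S.shape)
        update = update + np.cross(S, eta)
    S_new = S + dt * update
    # renormalize so that |S_i| = 1 is preserved at every step
    return S_new / np.linalg.norm(S_new, axis=1, keepdims=True)
```

Note that a field component parallel to the spin produces no torque, so it leaves the configuration unchanged, a fact exploited later when the training targets are constructed.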
The LLG simulations are widely used to study dynamical phenomena in a wide range of magnetic systems, including spin waves in unusual magnetic phases and dynamical behaviors of skyrmions and other spin-textures.
For adiabatic spin dynamics, the local exchange field is given by the derivative of the system energy E = E({S_i}):

H_i = −∂E/∂S_i.    (3)

For magnetic insulators, interactions between spins are often short-ranged. The resultant magnetic energy has the form of bilinear interactions between a few nearest-neighbor spins on the lattice, e.g.

E = Σ_⟨ij⟩ [ J_ij S_i · S_j + D_ij · (S_i × S_j) ],

where J_ij denotes the isotropic Heisenberg exchange interaction and D_ij represents the anisotropic exchange, also known as the Dzyaloshinskii-Moriya interaction [12,13]. The exchange field of such models is explicitly given by

H_i = −Σ_j ( J_ij S_j − D_ij × S_j ),

where the summation is restricted to a few nearest neighbors, and can be very efficiently computed for large-scale LLG simulations.
On the other hand, the exchange fields in a metallic magnet come from interactions between local spins and itinerant electrons. Here we consider spin dynamics in the adiabatic approximation, which is analogous to the Born-Oppenheimer approximation in quantum molecular dynamics [14]. In the adiabatic limit, electron relaxation is assumed to be much faster than the time scale of the local magnetic moments. As a result, the magnetic energy E in Eq. (3) can be obtained by freezing the spin configuration and integrating out the electrons. The resultant spin-dependent energy function, E = E({S_i}), can be viewed as a potential energy surface (PES) in the high-dimensional spin space, similar to the PES in Born-Oppenheimer MD simulations. In practice, the calculation of this magnetic PES requires solving the electronic structure problem that depends on the instantaneous spin structure {S_i(t)}.
For concreteness, here we consider a generic single-band s-d model for such itinerant magnets. The s-d model describes the interaction between itinerant s-band electrons and magnetic moments S_i of localized d-electrons. Its Hamiltonian reads

H = Σ_{ij,α} t_ij c†_{iα} c_{jα} − J Σ_i S_i · c†_{iα} σ_{αβ} c_{iβ},    (4)

where c†_{iα}/c_{iα} are creation/annihilation operators of an electron with spin α = ↑, ↓ at site i, t_ij is the electron hopping constant between a pair of sites (i, j), J denotes the strength of the local Hund's rule coupling between the electron spin and the magnetic moment S_i of localized d-electrons, and σ = (σ^x, σ^y, σ^z) is the vector of Pauli matrices. For most skyrmion magnets, these local magnetic moments can be well approximated as classical spins of fixed length |S_i| = S. For small Hund's coupling J ≪ t_ij, the effective energy of spins can be obtained by integrating out electrons via a second-order perturbation calculation, giving rise to a long-ranged oscillatory interaction, similar to the so-called Ruderman-Kittel-Kasuya-Yosida (RKKY) interaction [39][40][41]. However, for intermediate and large Hund's coupling, the effective energy to be used for the force calculation in Eq. (3) has to be obtained by integrating out electrons on the fly:

E({S_i}) = Tr[ρ H],    (5)

where ρ ∝ exp(−H/k_B T) is the density matrix of the equilibrium electron liquid within the adiabatic approximation. The calculation of the density matrix, in the absence of electron-electron interaction, amounts to solving a disordered tight-binding Hamiltonian for a given spin configuration. The standard method for solving tight-binding models is based on exact diagonalization (ED), whose complexity scales cubically with the system size. As a result, for large-scale LLG simulations of the s-d model, repeated ED calculations of the electron density matrix can be overwhelmingly time-consuming.
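To illustrate the cost of the exact approach, the following sketch builds the single-particle s-d Hamiltonian and computes the zero-temperature density matrix and energy by exact diagonalization. It uses a periodic chain with nearest-neighbor hopping only, purely as a stand-in for the triangular lattice of the paper; the O(N³) `eigh` call is the step that makes repeated evaluation expensive.

```python
import numpy as np

PAULI = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]])  # sigma^x, sigma^y, sigma^z

def sd_hamiltonian(spins, t=1.0, J=1.0):
    """Single-particle s-d Hamiltonian on a periodic chain (an illustrative
    stand-in for the triangular lattice).  spins: (N, 3) classical unit
    vectors; basis ordering is (site, spin), so the matrix is 2N x 2N."""
    N = len(spins)
    H = np.zeros((2 * N, 2 * N), dtype=complex)
    for i in range(N):
        j = (i + 1) % N
        # hopping between neighboring sites (diagonal in spin)
        H[2*i:2*i+2, 2*j:2*j+2] += -t * np.eye(2)
        H[2*j:2*j+2, 2*i:2*i+2] += -t * np.eye(2)
        # Hund's coupling -J S_i . sigma on each site
        H[2*i:2*i+2, 2*i:2*i+2] += -J * np.einsum('a,aij->ij', spins[i], PAULI)
    return H

def density_matrix_and_energy(H, mu=0.0):
    """Zero-temperature density matrix rho = sum_{E_n < mu} |n><n| and the
    adiabatic energy of the occupied states, via exact diagonalization."""
    E, U = np.linalg.eigh(H)
    occ = (E < mu).astype(float)
    rho = (U * occ) @ U.conj().T
    return rho, float(np.sum(E * occ))
```

At every LLG time step this diagonalization would have to be repeated for the instantaneous {S_i}, which is precisely the bottleneck the ML model is designed to remove.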
As discussed in Sec. I, the BP scheme has been generalized to develop ML-based models for the effective spin energy E({S_i}) of itinerant magnets [33][34][35][36][37]. In this approach, the total energy is partitioned into local contributions

E = Σ_i ϵ_i,    (6)

where the energy ϵ_i = ε(C_i) is associated with the i-th lattice site and is assumed to depend only on the spin configuration C_i = {S_j | ∥r_j − r_i∥ < r_c} in its neighborhood. This local energy function ε(C_i) can be viewed as the building block of the magnetic PES. Importantly, the complicated dependence of the PES on the neighborhood spins is to be approximated by fully connected neural networks [33][34][35]. To preserve the SO(3) spin rotation symmetry, the inner products between spin pairs b_jk = S_j · S_k and the scalar products between spin triplets χ_jkl = S_j · S_k × S_l within the neighborhood are used as building blocks to construct feature variables that are input to the neural network. Finally, the exchange fields H_i acting on spins are obtained by applying automatic differentiation to the ML energy model.

III. CNN SPIN TORQUE MODEL
The BP-type schemes described above essentially provide an energy-based ML model for force-field calculations. A crucial step is the partitioning of the total energy into local contributions ϵ_i, which cannot be directly computed from the electronic structure methods that are used to generate the training dataset. As a result, the loss function L cannot be directly determined from the predicted energies ϵ_i. Instead, it is constructed from, e.g., the mean square error (MSE) of the "forces", or in our case the spin torque fields, and only implicitly depends on the predicted energy through automatic differentiation. However, the uncertainties due to the introduction of such intermediate local energies often complicate the training of BP-type models. While one advantage of the BP-type scheme is the explicit inclusion of the physical constraint of conservative forces, such energy-based ML approaches are, by the same token, restricted to representing only conservative forces. In this Section, we present an alternative ML approach that directly predicts the vector torques without going through an intermediate energy.

A. Convolutional Neural Networks
The fact that spins in metallic magnets are defined on well-known lattices suggests that spin configurations can be treated as generalized "images," which can then be processed using powerful image-processing techniques developed in recent years, such as CNNs. Below, we present a CNN model for the direct prediction of the torques T_i that drive the spin dynamics. As illustrated in Figure 1, the proposed network takes the spin configuration {S_i} on the lattice as input and returns the torques {T_i} as output. The model is comprised of multiple convolution layers f_m with associated activation (nonlinearity) layers σ_m to model the complex nonlinear relationship between {S_i} and {T_i} as a composition of such layers:

f_CNN = f_L ∘ σ_{L−1} ∘ f_{L−1} ∘ ⋯ ∘ σ_1 ∘ f_1,

where L is the number of layers, or the depth, of the CNN model.
Given an input vector field V ∈ C^∞(R^2, R^d), each convolution layer f_m maps the vector field onto an output vector field W ∈ C^∞(R^2, R^k) by convolving a kernel tensor field h_m(X) := h(X; θ_m), with trainable parameters θ_m, via the convolution operation:

W(r) = Σ_q h_m(q − r) V(q).    (7)

Each vector element of the vector field W then undergoes the activation function σ_m : R → R to produce the output vector field A ∈ C^∞(R^2, R^k), called the activation map.
A variety of activation functions can be used in CNNs.
In the current work, we use the rectified linear unit, or ReLU [42], as the activation function:

σ_m(x) = max(0, x),    (8)

for m = 1, . . ., L − 1. Note that the final layer f_L has no activation function associated with it, or technically, σ_L is the identity map. Typically in CNNs, the support of a kernel supp(h_m), i.e., the region where h_m has nonzero values (also known as the receptive field of the kernel), is limited to a small region (e.g., 5 × 5 lattice sites) such that the activation response W(r), and thereby A(r), at position r is limited to the patterns of V only within the close proximity of r. It is worth noting that the physical justification for employing such finite-size kernels is the principle of locality: viz., local physical quantities, such as the local spin torque T_i = T(r_i), are predominantly determined by the local environment of site-i:

T_i = T(C_i),

where C_i is the magnetic environment in the vicinity of site-i, and the vector function T(·) is to be modeled by the CNN. The range of the neighborhood C_i is determined by the sizes of the kernels and the number of convolution layers.
Note that Eqs. (7) & (8) imply that the output activation A(r) at position r will be of a large, positive magnitude only if the input vector field V(r) is closely correlated to the (shifted) kernel h_m(q − r). Therefore, the goal of training the CNN is to find the unknown kernel parameters θ_{m=1,...,L} that determine the function shapes of {h_m}, in order to produce adequate activation values such that the final output of the model f_CNN({S_i}) can reasonably approximate the ground-truth spin torques {T_i} in the training data.
Meanwhile, the composition of convolution layers enables hierarchical modeling of the spin-torque relationship. That is, while an individual kernel limited to a small region may only represent rather simplistic patterns (e.g., small blobs), the composition of such kernels across layers, alongside the nonlinear ReLU activation, can produce fairly complex, nonlinear patterns. Furthermore, the composition of convolution layers (σ_n ∘ f_n) ∘ (σ_m ∘ f_m) in effect produces a larger receptive field, equal to the Minkowski sum of the receptive fields of the individual layers, supp(h_n) ⊕ supp(h_m).
Therefore, stacked convolution layers produce a natural hierarchy, in which earlier layers represent local, primitive patterns in a small proximity while later (deeper) layers represent more global, sophisticated patterns over a relatively larger periphery.
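The growth of the overall receptive field under composition is easy to tabulate: for stride-1 convolutions, each layer's k × k kernel enlarges the receptive field side by k − 1, which is exactly the Minkowski-sum rule. A tiny helper (illustrative only; the kernel sizes below are hypothetical) makes this explicit:

```python
def effective_receptive_field(kernel_sizes):
    """Side length of the overall receptive field of a stack of stride-1
    convolution layers: the Minkowski sum of the kernel supports grows the
    side by (k - 1) for each k x k kernel in the stack."""
    side = 1
    for k in kernel_sizes:
        side += k - 1
    return side
```

Two stacked 3 × 3 layers thus see a 5 × 5 neighborhood, and deeper stacks widen the neighborhood linearly with depth.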
Finally, a purely convolutional CNN, without any conventional fully connected (dense) layers, can restrict the overall receptive field size of the entire model, supp(h_L ⊗ ⋯ ⊗ h_1), to a predetermined lattice size, presenting a distinct advantage of built-in locality. Further, with a purely convolutional design, since the sizes of kernels in a CNN are fixed for a given itinerant model, the successfully trained CNN model can then be used in much larger lattice systems without the need to rebuild or retrain a new neural network. The CNN structure thus provides a natural approach to implementing scalable ML models based on the locality principle.

B. Model Architecture
Figure 1 shows a schematic diagram of our CNN architecture. The input to the CNN is the spin vector field {S_i}, transformed ("flattened") from the triangular lattice to a square grid, as most convolution operations expect a square input. We employed four ResNet blocks, inspired by He et al. [43], as our backbone, which processes the input into an activation map of 512 features. These features then undergo two additional convolution layers, which in the end produce the torques as the output of the network. In our model, these torque vector outputs are obtained in a normalized range of values with a mean magnitude of 1. Such normalized outputs are then scaled by the mean magnitude of the torque vectors in the training dataset. Finally, the predicted torques on the square grid are "unflattened" onto the original triangular lattice.
Compared to fully connected (dense) layers, in which each neuron aggregates values across the entire domain into a single scalar value via a weighted sum, convolution layers preserve the spatial structure of the input domain. With the proper boundary condition (e.g., 'padding' in machine learning jargon), the output lattice is guaranteed to have the same size and resolution as the input lattice. Therefore, a CNN comprised of purely convolution layers, without any fully connected layers, can be scaled to a lattice of practically any size, as long as the lattice elements locally have the same geometric and topological structures.
Meanwhile, the architecture of the ResNet block in the backbone is described in Figure 2. Similar to the original ResNet, the input goes through two separate pathways. One pathway (right path in Figure 2) is comprised of two convolution layers with the ReLU activation [42], stacked on top of each other, in order to develop a feature vector characterizing patterns of the input vector field at each local neighborhood on the pixel grid. The other pathway (left path in Figure 2) can be either the skip connection, in which the input values are directly copied without any transformation, or a 1×1 convolution, in which the input feature vector at each grid location undergoes a dimensionality reduction. The outputs from the two pathways are then added together to produce the overall output of the ResNet block. Here, note that we do not employ the batch normalization technique, which is used in the original ResNet model to avoid the vanishing gradient problem. Empirically, we found that batch normalization overly regularized the network, causing severe underfitting of the spin torques and overall deteriorating the prediction performance. Moreover, since the input spin vectors are already normalized to have a length of 1, it is not necessary to employ batch normalization.

C. Training
Our training and testing sets together consist of 60 independent spin dynamics simulations performed on a 48×48 triangular lattice. The following parameters are used for the s-d Hamiltonian Eq. (4): The nearest-neighbor hopping was set to t_1 = 1, which also provides the reference unit for energy. A third-neighbor hopping t_3 = −0.85 is included in order to stabilize a triple-Q magnetic order that underlies the skyrmion lattice (SkL) phase [44]. The electron-spin coupling constant is set at J = 1. An electron chemical potential µ = −3.5 was used, and an external magnetic field H_ext = 0.005 was included to explicitly break the time-reversal symmetry and induce the SkL [44]. As discussed in Section II, the exchange fields H_i acting on spins are obtained by solving the electron Hamiltonian. Specifically, using Eq. (3) and the s-d Hamiltonian Eq. (4), the exchange fields are given by

H_i = J Σ_{αβ} σ_{αβ} ρ_{iβ,iα},

where ρ_{iα,jβ} := ⟨c†_{jβ} c_{iα}⟩ is the electron correlation function, or single-electron density matrix. The kernel polynomial method (KPM) [45,46] was used to compute the electron density matrix for generating the training dataset. The KPM is more efficient than exact diagonalization, yet is considered numerically exact when a large number of Chebyshev polynomials and random vectors are used.
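Once the density matrix is available, reading off the exchange fields is a cheap local operation. A sketch (assuming a (site, spin) basis ordering; the overall sign and convention-dependent factors such as the 1/2 from the electron spin are absorbed into J here):

```python
import numpy as np

PAULI = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]])  # sigma^x, sigma^y, sigma^z

def exchange_fields(rho, J=1.0):
    """H_i proportional to J <s_i>: trace the Pauli matrices against the
    on-site 2x2 block of the single-electron density matrix
    rho_{ia,jb} = <c^dag_{jb} c_{ia}>.  Overall sign and factor-of-1/2
    conventions are absorbed into J (an assumption of this sketch)."""
    N = rho.shape[0] // 2
    H = np.zeros((N, 3))
    for i in range(N):
        block = rho[2*i:2*i+2, 2*i:2*i+2]
        for a in range(3):
            H[i, a] = J * np.real(np.trace(PAULI[a] @ block))
    return H
```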
The time-scale of the precession dynamics of the LLG equation (1) is given by t_0 = (γJS)^{-1}, where γ is the gyromagnetic ratio, J is the electron-spin coupling, and S is the length of the localized magnetic moments. The damping term introduces another time-scale, t_damping = t_0/α, which characterizes the rate of energy dissipation, where α is a dimensionless coefficient. In the following, the simulation time is measured in terms of t_0, and a damping coefficient α = 0.05 is used.
The initial conditions of the simulations are divided into two types. The first is a perturbed SkL, where a periodic array of skyrmions is baked into the initial condition but random noise is added to the spins. The other type is random initialization, where the spins are generated completely at random. For each type of initial condition, a total of 30 simulations were generated, each comprising 5,000 time steps. For a given initial condition, a semi-implicit second-order scheme [47], which preserves the spin length, was employed to integrate the LLG equation (1) with a time-step ∆t = 0.1.
The spins and their corresponding exchange fields at all lattice sites were collected every 10 steps in the simulation. We focused on learning the electron-induced exchange field, so the external constant field H_ext = 0.005 in the z direction was removed. The field H_i was then decomposed into components parallel and perpendicular to the spin, and only the perpendicular component, which is equivalent to the torque T_i, was kept: the parallel component has no effect on the evolution of the spin configuration, and it is around two orders of magnitude larger than the perpendicular component. The perpendicular fields were then normalized to have a mean magnitude of 1 over the entire dataset. Then 70% of the entire dataset was used for training, while the rest was set aside for validation. The split of the dataset is stratified so that the training and testing sets have the same proportion of the two types of simulations.
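The parallel/perpendicular decomposition used above is a one-line projection. A minimal sketch, assuming unit-length spins:

```python
import numpy as np

def perpendicular_component(H, S):
    """Remove the component of the field along the spin; only the remainder,
    equivalent to the torque direction, affects the LLG evolution.
    Assumes |S_i| = 1, so the parallel part is (H . S) S."""
    parallel = np.sum(H * S, axis=-1, keepdims=True) * S
    return H - parallel
```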
The triangular-lattice s-d Hamiltonian in Eq. (4) is invariant under two independent symmetry groups: the SO(3)/SU(2) rotation of spins and the D_6 point group of the triangular lattice. Here the rotation symmetry refers to a global rotation of the local magnetic moments, S_i → R · S_i (treated as classical vectors), and a simultaneous unitary transformation of the electron spinor, ĉ_iα → Û_αβ ĉ_iβ, where R is an orthogonal 3 × 3 matrix and Û = Û(R) is the corresponding 2 × 2 unitary rotation operator. The ML model, corresponding to an effective force-field model obtained by integrating out electrons, needs to preserve the SO(3) rotation symmetry of the spins, which means that under a uniform rotation R of all spins in the neighborhood, the ML-predicted spin torques should undergo the same rotation, T_i → R · T_i. On the other hand, under a symmetry operation g of the D_6 point group centered at some lattice point, both spins and torques transform as S_i → S_j and T_i → T_j, where the lattice points are related by r_j = R(g) · r_i, and R(g) denotes the 3 × 3 matrix corresponding to g.
To incorporate both symmetries into the CNN model, we introduced data augmentation during the training phase. Specifically, for each input spin configuration and the corresponding torque field, a random global SO(3) rotation was applied to the spins {S_i} and a random D_6 symmetry operation was applied to the lattice points. The same symmetry operations, both in spin space and on the real-space lattice, were also applied to the torque fields {T_i}. These additional symmetry-generated input/output configurations were included along with the original ones in the dataset for supervised training. We note that, contrary to previous ML models where the symmetry is explicitly included through descriptors, the symmetry of the itinerant electron Hamiltonian is enforced on the ML vector model statistically in this data-driven approach.
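The spin-space half of this augmentation can be sketched as follows (a QR-based sampler for SO(3); the D_6 lattice operation, omitted here, would permute the site indices of input and output in the same way):

```python
import numpy as np

def random_so3(rng):
    """Random rotation matrix from the QR decomposition of a Gaussian
    matrix, with signs fixed so that det(R) = +1."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))   # fix the column signs of the factorization
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]        # restrict O(3) to SO(3)
    return Q

def augment_pair(spins, torques, rng):
    """Apply one global spin rotation to an input/output pair; the same R
    must rotate both {S_i} and {T_i} to produce a valid training sample."""
    R = random_so3(rng)
    return spins @ R.T, torques @ R.T
```

Rotating both fields by the same R leaves every invariant (spin lengths, S_i · T_i) unchanged, which is what allows the network to learn SO(3) equivariance statistically.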
Since the torque magnitudes in the dataset can differ by up to an order of magnitude, we find that the usual mean absolute error or mean square error loss functions do not perform very well. Instead, we adopted a mean percentage absolute loss:

L = (1/N) Σ_i Σ_{µ=x,y,z} |T_i^µ − T̂_i^µ| / ∥T̂_i∥,

where N is the total number of lattice sites within each batch summed across all lattices, T̂_i is the ground-truth field vector at the i-th lattice site, and T_i = (T_i^x, T_i^y, T_i^z) is the predicted field vector with its three components.
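In NumPy terms, this loss amounts to the following (a sketch; the eps guard against zero-magnitude ground-truth torques is our addition):

```python
import numpy as np

def mean_percentage_absolute_loss(pred, truth, eps=1e-12):
    """Componentwise absolute error normalized by the magnitude of the
    ground-truth torque at each site, then averaged; the normalization keeps
    small- and large-torque sites on an equal footing."""
    norm = np.linalg.norm(truth, axis=-1, keepdims=True) + eps
    return float(np.mean(np.abs(pred - truth) / norm))
```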
An Adam [48] optimizer with an initial learning rate of 10^{-3} was used for the training. The learning rate was later reduced to 10^{-6} when the loss on the testing set plateaued. We did not use any regularization methods, such as dropout or weight decay, and there is no evidence of overfitting when comparing training and testing loss values. The model and its training process are implemented in PyTorch [49], and training was performed on one Nvidia A100 GPU for roughly 72 hours.

IV. RESULTS
Here we present benchmarks of the CNN models, comparing spin torque predictions and small-scale dynamical simulations against exact methods. We further demonstrate the restoration and stability of a skyrmion lattice in large-scale LLG simulations, highlighting the scalability and transferability of our ML approach.

A. Benchmark of Spin Torque Prediction
The spin torques T_i predicted from the trained CNN model are compared against the ground truth in Figure 3, using configurations from the test dataset. Two types of testing data are used for this benchmark: LLG simulations of an initially perturbed SkL state, and LLG simulations starting from random spins. In both cases, the predicted torque components closely follow the ground truth with roughly equal variance across the entire range. Note that the values of the torque components in the random-spin case span a range nearly twice as large as that of the SkL case. As can be expected, the ML model performs better in the case of the SkL simulations, since the spin configurations there correspond to a rather small and special subset of the whole state space. Yet, fairly good agreement was obtained even for the testing dataset with completely random initial spins.
We further examine the magnitudes of the predicted torques versus the ground truth, as well as the angle between the predicted and ground-truth vectors, in Figure 4. Again, an overall satisfactory agreement was obtained, with the majority of the predictions close to, or symmetrically distributed around, the ground-truth values. Note that due to the distortion of the logarithmic scale, the same deviation from the ground truth at large and small magnitudes will look asymmetric and "biased" towards smaller values. Therefore, two red dotted lines with constant deviations of 10^{-2} (outer) and 10^{-3} (inner) have been added in Figure 4(a). Even at large magnitudes, where the error of the ML model is largest, the difference in field vector magnitude is almost guaranteed to be smaller than 10^{-2}. At small magnitudes, the difference in field vector magnitude is most likely smaller than 4 × 10^{-3} and is typically around 10^{-3}. We did not notice any bias in our ML prediction results. The ML-predicted vectors are also very closely aligned with the ground-truth field vectors. As shown in Figure 4(b), most vectors have an angle smaller than 10°, and it is almost impossible to find a predicted vector with an angle of more than 30° from its ground-truth counterpart.

B. Dynamical Benchmark
In addition to accurate prediction of spin torques, another important benchmark is whether the trained ML model can faithfully capture the dynamical evolution of the itinerant spin model. To this end, we integrated the trained CNN model into LLG dynamics simulations and compared the results with LLG simulations based on the KPM [45,46]. We consider simulations of a thermal quench process in which an initially random magnet is suddenly quenched to nearly zero temperature at time t = 0. While our trained CNN model produces fairly accurate spin torques, small prediction errors persist, as discussed in the previous section. Statistically, these prediction errors resemble the stochastic noise τ_i(t) in the LLG equation (1). These site-dependent fluctuating random torques are similar to the thermal forces in Langevin dynamics; both random forces physically arise from thermal fluctuations through coupling to a thermal bath at a fixed temperature. As a result, while the temperature of the ML-LLG simulations was set exactly to zero, a very low yet nonzero temperature T = 0.001 was introduced in the exact LLG dynamics to mimic the prediction error.
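Schematically, each step of the hybrid scheme replaces the exact torque evaluation by the CNN surrogate inside a standard LLG/Langevin update. The sketch below is a simplified stand-in for Eq. (1) (our own helper names, explicit Euler with Gilbert-like damping, unit-spin renormalization, and the optional thermal noise used in the T = 0.001 benchmark runs), not the authors' exact integrator:

```python
import numpy as np

def llg_step(S, torque_fn, dt=0.01, alpha=0.05, temperature=0.0, rng=None):
    """One explicit Euler step of LLG-like dynamics driven by a predicted
    torque field. S has shape (N, 3); torque_fn maps spins to torques
    (e.g., a CNN surrogate or an exact KPM evaluation)."""
    rng = rng or np.random.default_rng()
    T = torque_fn(S)                      # electron-mediated torque
    damping = alpha * np.cross(S, T)      # Gilbert-like damping term
    noise = 0.0
    if temperature > 0:
        # Langevin thermal kick, variance set by the fluctuation-dissipation relation
        noise = np.sqrt(2 * alpha * temperature * dt) * rng.standard_normal(S.shape)
    S_new = S + dt * (T - damping) + noise
    # spins are unit vectors; renormalize after the explicit update
    return S_new / np.linalg.norm(S_new, axis=1, keepdims=True)
```

Setting `temperature=0.001` with the exact torque function mimics, at the statistical level, the small prediction errors of the ML-driven run.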
The model parameters of the s-d Hamiltonian are chosen to stabilize a spontaneous SkL ground state. Importantly, the emergence of a skyrmion crystal breaks not only the spin-rotation symmetry but also the lattice translational symmetry. The period of this spatial modulation, i.e., the lattice constant of the skyrmion lattice, is determined by the underlying electron Fermi surface. Indeed, while an SkL state can be intuitively thought of as a periodic array of particle-like spin textures, SkL phases often result physically from an instability caused by quasi-nesting of the electron Fermi surface, which gives rise to a multiple-Q magnetic order [44,50,51].
In our case, the geometry of the Fermi surface at the chemical potential µ = −3.5 allows significant segments to be connected by the three wave vectors Q_1 = (π/3a, 0) and Q_{2,3} = R_{±2π/3} · Q_1, related to each other by symmetry operations of the D_6 group. Here a is the lattice constant of the underlying triangular lattice. This means that the maximum energy gain through electron-spin coupling is realized by helical spin orders with one of the above three wave vectors. Further analysis shows that the electron energy is lowered even more by the simultaneous ordering of all three wave vectors, giving rise to an emergent triangular lattice of skyrmions.
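The three nesting wave vectors are related by ±120° rotations of Q_1, which is easy to verify numerically (a = 1 here for illustration):

```python
import numpy as np

a = 1.0                                  # triangular-lattice constant
Q1 = np.array([np.pi / (3 * a), 0.0])

def rot(theta):
    """2D rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

Q2 = rot(+2 * np.pi / 3) @ Q1
Q3 = rot(-2 * np.pi / 3) @ Q1
# all three have the same length |Q| = pi/(3a), and they sum to zero,
# as required by the D_6 symmetry of the triangular lattice
```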
The relaxation of the magnet after the thermal quench is dominated by the formation of the triangular SkL. A perfect SkL is distinguished by six Bragg peaks at q = ±Q_1, ±Q_2, and ±Q_3 in momentum space. Yet, since the spin interactions are local in nature, the crystallization of skyrmions is inherently an incoherent process: small crystallites of skyrmions nucleate at random, separated by large domains of disjointed structures. To quantitatively characterize this crystallization process, we compute the time-dependent spin structure factor, defined as the square of the Fourier transform of the spin field, S(q, t) = (1/N) ⟨|Σ_i S_i(t) e^{−i q·r_i}|²⟩, where the bracket ⟨···⟩ indicates averaging over the thermal ensemble as well as over initial conditions. The structure factor is itself the Fourier transform of the real-space spin-spin correlation function and can be directly measured in neutron scattering experiments. The spin structure factors at various times after the quench, obtained from LLG simulations based on both the KPM and the ML model, are shown in Figure 5. Due to the stochastic nature of such simulations, the results are averaged over 30 independent runs. Overall, the results from LLG simulations with the trained CNN model agree very well with those based on the numerically exact KPM.
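On a periodic lattice the structure factor is conveniently evaluated with an FFT over the lattice indices. The sketch below assumes the spins are stored on the L × L index grid of the triangular lattice (mapping grid indices to physical momenta q is a separate bookkeeping step not shown here):

```python
import numpy as np

def structure_factor(S):
    """S(q) = (1/N) |sum_i S_i exp(-i q . r_i)|^2, summed over the three
    spin components, for a spin field S of shape (L, L, 3)."""
    N = S.shape[0] * S.shape[1]
    Sq = np.fft.fft2(S, axes=(0, 1))      # FFT over the two lattice indices
    return np.sum(np.abs(Sq) ** 2, axis=-1) / N
```

Averaging the output of this function over independent runs, as done for Figure 5, yields the ensemble-averaged S(q, t).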
Both simulations show that a ring-like structure quickly emerges in the structure factor after the quench. The radius of the ring is close to the length of the three nesting wave vectors Q_η, indicating the initial formation of skyrmions. As the system relaxes towards equilibrium, the ring-like structure becomes sharper. Moreover, spectral weight starts to accumulate at the six spots corresponding to the ±Q_η wave vectors. Physically, the emergence of the six broad segments corresponds to the growth of skyrmion-lattice domains, and the size of these intermediate skyrmion crystallites can be inferred from the width of the six spots. However, both simulations find that even at a late stage of the equilibration, the structure factor exhibits only six diffusive peaks at the nesting wave vectors, instead of the sharp Bragg peaks expected for a perfect SkL. The broad peaks at late stages of the phase ordering thus indicate an arrested growth of SkL domains in real space. An example of the real-space spin configuration at t = 10^4 after the quench is shown in Fig. 6. The snapshot shows rather small triangular clusters of skyrmions coexisting with stripe-like structures of different orientations. These stripes, or helical spins, correspond to the single-Q magnetic orders, which are metastable states of the s-d model.
This intriguing freezing phenomenon can be partly attributed to the frustrated electron-mediated spin interactions. Another important source is the degeneracy between skyrmions of opposite vorticity, i.e., circulation of the in-plane spins. The two opposite circulations correspond to topological winding numbers w = ±1 for the skyrmions. As discussed above, the spin-rotation symmetry is decoupled from the lattice in the s-d Hamiltonian (4), which provides a minimal model for centrosymmetric itinerant magnets without spin-orbit coupling. As a result, skyrmions with clockwise circulation are energetically degenerate with counter-clockwise ones. This also means that SkL domains of the two opposite circulations nucleate with roughly equal probability after the thermal quench. The subsequent annihilation of skyrmions with opposite vorticity thus prohibits the growth of a large coherent SkL.

C. Scalability and Large-scale Simulation
As discussed in Sec. III B, thanks to the locality property and the fixed-size kernels, the CNN model can be directly scaled to larger lattice systems without retraining, thus enabling large-scale dynamical simulations that are beyond conventional approaches. Here we demonstrate the scalability of the CNN spin-torque model by applying it to LLG simulations of large-scale SkL phases. Specifically, we perform LLG simulations of a perturbed SkL state on a 96 × 96 lattice using a CNN model trained on simulations of a 48 × 48 lattice. As discussed above, the triangular skyrmion lattice, characterized by the three nesting wave vectors, can be viewed as a superposition of three helical spin orders. Explicitly, a perfect SkL can be approximated by the ansatz of Eq. (13) [44,51], where ê_{1,2,3} are three orthogonal unit vectors, Q_{η,i} = Q_η · r_i and Q′_{η,i} = Q_{η,i} + ϕ are the phase factors of the three helical orders, and ϕ, A, and M are fitting parameters. To demonstrate that the ML model can indeed stabilize the SkL, which is the ground state of our chosen s-d Hamiltonian, we initialize the system with a perturbed array of skyrmions, as shown in Figure 7(a). The randomness in the initial state is introduced by allowing randomly generated site-dependent parameters ϕ_i, A_i, and m_i in the SkL ansatz (13). Contrary to the completely random initial spins of the previous dynamical benchmark, this initial state preserves a coherent structure of skyrmion winding numbers. As these topological numbers have to be conserved, the relaxation of the system is free of random annihilation of skyrmions. As shown in Figure 7, our ML-based LLG simulations indeed find that a nearly perfect SkL is restored and stabilized over a long period of simulation time.
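To illustrate the "superposition of three helices" picture, the sketch below builds a triple-Q spin texture on a triangular lattice from three helices plus a uniform moment M along z, normalized to unit spins. The exact functional form of Eq. (13) is not reproduced in this excerpt, so this is an assumed schematic variant for illustration, not the authors' precise ansatz:

```python
import numpy as np

def skl_ansatz(L, A=1.0, M=0.5, phi=0.0):
    """Schematic triple-Q skyrmion-lattice texture of shape (L, L, 3):
    a superposition of three helices with wave vectors Q_eta (related
    by 120-degree rotations), shifted by a uniform z-moment M and
    normalized to unit spins. Illustrative stand-in for Eq. (13)."""
    a = 1.0
    Q1 = np.array([np.pi / (3 * a), 0.0])
    def rot(t):
        c, s = np.cos(t), np.sin(t)
        return np.array([[c, -s], [s, c]])
    Qs = [Q1, rot(2 * np.pi / 3) @ Q1, rot(-2 * np.pi / 3) @ Q1]
    x, y = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    # triangular-lattice site positions r_i from the two integer indices
    r = np.stack([x + 0.5 * y, (np.sqrt(3) / 2) * y], axis=-1)
    S = np.zeros((L, L, 3))
    for Q in Qs:
        phase = r @ Q + phi                              # Q'_eta,i = Q_eta . r_i + phi
        e_xy = np.array([-Q[1], Q[0]]) / np.linalg.norm(Q)  # in-plane direction per helix
        S[..., 0] += A * np.sin(phase) * e_xy[0]
        S[..., 1] += A * np.sin(phase) * e_xy[1]
        S[..., 2] += A * np.cos(phase)
    S[..., 2] -= M
    return S / np.linalg.norm(S, axis=-1, keepdims=True)
```

Making `phi`, `A`, and `M` site-dependent random variables, as described above, produces the perturbed initial state of Figure 7(a).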
We further investigate scalability in the time domain by running our ML-based LLG simulation long past the duration of the training simulations. Figure 8 shows a roughly constant structure factor long beyond the duration of the simulation snapshots used in training. While we notice a sizable dip in the structure factor between t = 15,000 and t = 23,000, it quickly rebounds to its original stable value (S(q, t) ≈ 305). These temporal fluctuations can be ascribed to the prediction errors of the ML model. Yet, as discussed above, such errors play a role similar to the stochastic noise in Langevin-type dynamics simulations. Our results thus demonstrate the robustness of the SkL under small random perturbations. Importantly, this further benchmark highlights the scalability of our ML models not only in the spatial domain (larger lattices) but also in the temporal domain (much longer simulation times).

D. Symmetry Requirements
In order to incorporate the underlying symmetries of a physical system into an ML model, one needs to introduce appropriate biases (prior knowledge) through the statistical learning process. Two major approaches to this end are: i) data augmentation based on the symmetry group of the system; and ii) constructing either symmetry-invariant descriptors (ii-a) or equivariant neural network architectures (ii-b) with respect to the symmetry group. These two types of approaches correspond to introducing observational and inductive biases, respectively, in the context of the physics-informed machine learning literature (see, e.g., [52][53][54][55][56][57]). As discussed in Section III C, the local symmetries of our system, i.e., the spin-space and real-space lattice symmetries (a.k.a. the internal (gauge) and spacetime symmetries [58]), consist of G-valued fields over the underlying lattice, where G = SO(3) × D_6. In the present work, we adopted the data augmentation approach as the means of enforcing the symmetry constraint, for the reasons justified below.
First, we briefly summarize the theoretical justification of how data augmentation during the training phase injects the above-mentioned symmetries into the underlying supervised learning process (see [54,59,60] for details). To avoid cumbersome notation, we denote a pair of a spin configuration and its corresponding torque field (S, T) = ({S_i}, {T_i}) by F. Our training data F_1, ..., F_n consist of independent identically-distributed (i.i.d.) samples from a probability distribution P over the space of all spin-torque fields. It is of fundamental importance that the probability distribution P remains invariant under the action of each local symmetry g ∈ G, where G denotes the space of all local symmetries of the system. Hence, the data augmentation process can be considered as enriching our set of samples from the probability distribution P, which we aim to learn, by adding transformed spin-torque fields g · F, g ∈ G. During the training procedure, at each step t, a minibatch B_t of spin-torque samples (S, T) of size |B_t| is chosen, and a random local symmetry g_{t,b} ∈ G is applied to each spin field S_b and torque field T_b, b ∈ B_t. Then, according to the stochastic gradient descent (SGD) algorithm, the parameters θ of the CNN model f_θ are updated as θ_{t+1} = θ_t − (η/|B_t|) Σ_{b∈B_t} ∇_θ L(f_θ(g_{t,b} · S_b), g_{t,b} · T_b), where L denotes the loss function given by Eq. (11), and η is the learning rate. In other words, the augmented SGD can be considered as minimizing the empirical risk associated with the augmented loss function E_{g∼Q}[L(f_θ(g · S), g · T)], in which one takes an average along the whole orbit of the group action with respect to a probability distribution Q over G. It can be proved that data augmentation based on the underlying symmetry group reduces the variance of general estimators and improves their generalizability [54].
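The spin-space part of this augmentation amounts to applying one random global SO(3) rotation to both members of each (S, T) pair in the minibatch, since torques transform covariantly with spins. A minimal sketch (our own helper names; the lattice D_6 part, which additionally permutes sites, is omitted):

```python
import numpy as np

def random_so3(rng):
    """A random (approximately uniform) rotation matrix, built from the
    QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))   # fix column signs to de-bias the QR factorization
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1          # ensure det = +1, i.e., a proper rotation
    return q

def augment(S, T, rng):
    """Apply the same global spin rotation to a (spin, torque) sample,
    each an array of shape (N, 3) of row vectors."""
    R = random_so3(rng)
    return S @ R.T, T @ R.T
```

Drawing a fresh rotation per minibatch element during training implements the augmented SGD update described above.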
The above theoretical justification can further be validated empirically. Figure 9 shows the typical prediction error (blue), i.e., the difference between the predicted and ground-truth torques, and the equivariance error (orange), defined as f_θ(g · S) − g · f_θ(S). As can be seen in the figure, the equivariance error is smaller than the typical prediction error, indicating that in practice the data augmentation employed preserves the underlying symmetry of the physical system to a satisfactory degree.

E. Locality Principle and Receptive Fields
To attest to the locality principle, we analyze the receptive field of our CNN model in this section. As discussed in Section III A, the receptive field of a convolution layer f_m is defined as the support supp(h_m) of the corresponding convolution kernel h_m, i.e., the region where the function values of h_m are nonzero tensors. The receptive field of the entire CNN model is computed as the Minkowski sum of the receptive fields of the individual convolution layers, RF = supp(h_1) ⊕ ··· ⊕ supp(h_L). For our model, with 10 layers each comprising 5 × 5 convolution kernels of stride 1, the size of the receptive field is calculated to be 41. This implies that, in principle, the spin directions of the 41-neighborhood can influence the prediction of the torque at lattice position i.
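For stride-1 convolutions the Minkowski-sum rule reduces to simple arithmetic: each k × k kernel extends the receptive-field width by k − 1 sites.

```python
def receptive_field(kernel_sizes):
    """Receptive-field width of stacked stride-1 convolution layers:
    RF = 1 + sum over layers of (k - 1)."""
    return 1 + sum(k - 1 for k in kernel_sizes)

# ten 5x5 stride-1 layers, as in the model described above
assert receptive_field([5] * 10) == 41
```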
However, the naïve computation of the receptive field size may be misleading, because the kernel size of a convolution layer only sets the theoretical maximum of the receptive field; the actual region of nonzero values can be much smaller. We therefore used the approach of Luo et al. [61] to compute the effective receptive field, i.e., the region in which function values are practically nonzero. Figure 10 shows the result of such a calculation performed on the trained CNN model. The red hexagonal line separates the region inside the hexagon, where function values are practically nonzero, from the region outside, where they are practically zero. The grayscale values inside the hexagon indicate the different levels of influence of neighboring spins in computing the torque vector. As can be seen from the figure, the weighting factor is largest at the lattice location i, the center of the red hexagon, implying that T_i is predominantly determined by S_i. The 1-neighborhood N_1(i), i.e., the immediate neighbors of lattice location i, also has bright intensity values, implying that the configuration of the spin S_i relative to its neighboring spins S_j, j ∈ N_1(i), also has a significant influence on the output torque T_i. Similarly, the spin directions of the 3-neighborhood have a strong influence on the torque prediction, while a small influence can be detected all the way out to the 6-neighborhood. This result is consistent with previous ML spin-torque models based on symmetry-invariant descriptors [33,34], which showed that the spin dynamics of similar s-d models can be nicely captured by BP-type models based on fully connected NNs with input from a neighborhood of up to r_c ∼ 5 lattice constants. Physically, as discussed above, the finite size of the effective receptive field is due to the local nature of the spin torques. However, the range of locality can only be indirectly determined from exact calculations. In practice, the cutoff radius is treated as an ad hoc parameter in BP-type ML models, or is determined through trial and error. It is thus worth noting that the CNN model offers a systematic and rigorous method to determine this important physical attribute of electronic models.
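The effective receptive field of a trained black-box torque model can be probed directly; Luo et al. use backpropagated gradients, while the finite-difference version below is an illustrative, model-agnostic stand-in (our own function names):

```python
import numpy as np

def effective_receptive_field(model, S, site, eps=1e-4):
    """Estimate |dT_site / dS_j| for every lattice site j by central
    finite differences through a black-box torque model S -> T, where
    both S and T have shape (L, L, 3). The decay of the returned map
    with distance from `site` reveals the effective receptive field."""
    L = S.shape[0]
    erf = np.zeros((L, L))
    for j in np.ndindex(L, L):
        for c in range(3):                 # perturb each spin component
            dS = np.zeros_like(S)
            dS[j + (c,)] = eps
            dT = model(S + dS) - model(S - dS)
            erf[j] += np.sum(np.abs(dT[site])) / (2 * eps)
    return erf
```

Summing the absolute derivatives over all 3 × 3 component pairs, as done here, corresponds to the aggregated ERF map shown in Figure 10.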
FIG. 10. The effective receptive fields (ERFs) of the ML model. Since both the input and output tensors have 3 components at each lattice site, there are in total 9 ERFs, corresponding to the 9 partial derivatives of the 3 output torque components with respect to the 3 input spin components. The sum of the absolute values of these derivatives is presented in this figure, with darker pixels indicating smaller derivative values. The red line roughly traces the nonzero-value region of the ERF.

V. CONCLUSION AND OUTLOOK
In this paper, we presented a CNN model that predicts spin torques directly from the input spin configuration for large-scale LLG dynamics simulations of itinerant magnets. Our CNN model is purely convolutional, without any fully connected (dense) layers, and thus presents the distinct advantage of built-in locality. Central to each CNN layer is the convolution with a kernel or filter, which can be viewed as a Green's function representing finite responses to a local source. As each kernel is characterized by a finite set of trainable parameters, the CNN model can be used for dynamical simulations on larger system sizes without rebuilding or retraining the neural network. We demonstrated our ML approach on a triangular-lattice s-d model which exhibits a skyrmion crystal in its ground state. Using the ML-predicted torques in LLG dynamics simulations, we showed that the trained CNN model successfully reproduces the relaxation of the skyrmion phase of the itinerant spin models. We further demonstrated the scalability and transferability of our approach by showing that large-scale LLG simulations based on our CNN model are able to stabilize a perturbed skyrmion lattice and maintain it for a long period of time.
Contrary to ML force-field models based on the Behler-Parrinello scheme, our CNN model directly predicts torques, which are the spin analogue of atomic forces. In BP-type approaches, ML models, either Gaussian process regression or fully connected neural nets, are built to predict local energies, which cannot be directly compared with exact calculations. The forces are obtained from derivatives of the total energy, which is the sum of all local energies. The introduction of local energy takes advantage of the locality property and also facilitates the incorporation of symmetry into the ML models. Yet, the fact that forces are computed indirectly from derivatives of the energy also restricts BP-type models to the representation of conservative forces and quasi-equilibrium electron systems. Our CNN approach, on the other hand, can describe both conservative and non-conservative spin torques. This capability is particularly important for ML modeling of out-of-equilibrium driven systems, where the electron-mediated torques are non-conservative. A representative example is the spin-transfer torque, which plays an important role in spintronics applications.
For future work, we are looking into ways to enforce the constraints due to symmetry or conservation laws more strictly and rigorously. To this end, the computer vision literature on equivariant CNNs (see, e.g., Geiger and Smidt [62]) may shed light on how to constrain CNN layers to preserve the SO(3) and D_6 symmetries. Moreover, the present work is limited to approximating torques from the spin directions at each time step and does not provide a direct solution to the LLG equation, Eq. (1). In a recent body of literature, however, there have been attempts to solve the governing partial differential equations (PDEs) of physics directly using so-called physics-aware deep neural networks (see, e.g., Nguyen et al. [55]). Using such physics-aware CNN methods, we expect to attain faster and more accurate approximations of the spin dynamics, another meaningful direction of research.
FIG. 1.
Schematic diagram of the CNN-based ML model for spin-torque prediction of itinerant electron magnets. The spin configuration {S_i} on a lattice is first flattened to give three arrays, corresponding to the three components of the spins, which are input to a series of ResNet blocks. Details of the ResNet are presented in Fig. 2. The output of the ResNet blocks is then processed through additional convolution layers. The final output is three arrays which, after deflattening, correspond to the torques {T_i} that drive the spin dynamics.
FIG. 2.
Diagram of a ResNet block. The input goes through two pathways: the skip connection, where no operation is performed if the input and output have the same number of channels, or a 1 × 1 convolution layer otherwise; and the normal connection, where two 5 × 5 convolution-ReLU blocks are stacked on top of each other. The results of the two pathways are then added together to form the output.

FIG. 3.
FIG. 3. Predicted spin-torque components (T_x, T_y, T_z) versus ground-truth components from the testing set. The red dotted diagonal lines indicate perfect prediction. The top row shows the prediction results based on spin configurations obtained from LLG simulations of a perturbed SkL. Results from LLG simulations with random initial states are shown in the bottom row.

FIG. 4.
FIG. 4. (a) Comparison of predicted field-vector magnitude against ground truth. The solid red line indicates prediction equal to ground truth; the outer red dotted line represents a 10^−2 deviation from the ground-truth magnitude, while the inner one represents a deviation of 10^−3. The color denotes the log density. (b) Angular difference between the ground-truth and predicted field vectors.

FIG. 5.
FIG. 5. A comparison of spin structure factors obtained by averaging 30 independent LLG simulations based on the KPM (left) and the ML model (right). The same set of random initial conditions on a 48 × 48 triangular lattice was used in both simulations. The red dashed lines indicate the first Brillouin zone of momentum space.

FIG. 6.
FIG. 6. Snapshot of the spin configuration at the end of an LLG simulation with random initial conditions on a 48 × 48 triangular lattice.

FIG. 7.
FIG. 7. CNN-based LLG simulation on a 96 × 96 lattice showing the restoration of a perturbed SkL. The CNN model was originally trained on a 48 × 48 lattice. The initial spin configuration is given by the SkL ansatz (13) with additional site-dependent random phases and amplitudes of S_z.

FIG. 8.
FIG. 8. The time evolution of the structure factor in a simulation run much longer than the training simulations, using a perturbed skyrmion initial condition. The dashed black line indicates the duration of the training simulation; our ML-based LLG simulation keeps the structure factor constant for more than 5 times that duration.
FIG. 9. Distribution of the equivariance error err_eq := f_CNN({R S_i}) − R f_CNN({S_i}), where R is an arbitrary rotation. The overall prediction error ("predicted torque" minus "ground truth torque") on the test dataset is superimposed for comparison. The equivariance error is sufficiently smaller than the overall prediction error, implying that the model preserves the underlying symmetry of the physical system reasonably well.