Protocol Discovery for the Quantum Control of Majoranas by Differentiable Programming and Natural Evolution Strategies

Quantum control, which refers to the active manipulation of physical systems described by the laws of quantum mechanics, constitutes an essential ingredient for the development of quantum technology. Here we apply Differentiable Programming (DP) and Natural Evolution Strategies (NES) to the optimal transport of Majorana zero modes in superconducting nanowires, a key element to the success of Majorana-based topological quantum computation. We formulate the motion control of Majorana zero modes as an optimization problem for which we propose a new categorization of four different regimes with respect to the critical velocity of the system and the total transport time. In addition to correctly recovering the anticipated smooth protocols in the adiabatic regime, our algorithms uncover efficient but strikingly counter-intuitive motion strategies in the non-adiabatic regime. The emergent picture reveals a simple but high fidelity strategy that makes use of pulse-like jumps at the beginning and the end of the protocol with a period of constant velocity in between the jumps, which we dub the jump-move-jump protocol. We provide a transparent semi-analytical picture, which uses the sudden approximation and a reformulation of the Majorana motion in a moving frame, to illuminate the key characteristics of the jump-move-jump control strategy. We verify that the jump-move-jump protocol remains robust against the presence of interactions or disorder, and corroborate its high efficacy on a realistic proximity coupled nanowire model. Our results demonstrate that machine learning for quantum control can be applied efficiently to quantum many-body dynamical systems with performance levels that make it relevant to the realization of large-scale quantum technology.


I. INTRODUCTION
Through the use of a wide array of promising experimental platforms ranging from superconducting qubits [1] to trapped ions [2][3][4], optical lattices [5] and nitrogenvacancy centers [6,7], scientists are exploring groundbreaking ways to build quantum technology with an eye on deepening our understanding of complex natural systems, improving artificial intelligence, and impacting industry more broadly. While promising, several fundamental and practical difficulties must be overcome for quantum machines to become practical [8]. Quantum control, which studies the manipulation of physical systems whose behaviour is dominated by the laws of quantum mechanics, remains a fundamental ingredient in the quest for practical quantum technology. Thus, the development of principles and strategies for quantum control [9][10][11][12][13] remains an essential task for future quantum technologies [14], e.g., in the preparation of complex quantum states in quantum computing and quantum simulators.
The key to manipulating this topologically encoded quantum information is the development of protocols to transport the Majoranas, as will be necessary for their braiding and measurement. While the simplest approaches proposed to move the Majoranas adiabatically, decoherence processes such as quasiparticle poisoning [31], motivate the need for faster and more efficient control protocols.
In this context, machine learning (ML) offers a powerful and unifying approach to the study, design, benchmarking, and control of quantum systems and devices. Motivated by a series of remarkable technological breakthroughs in research areas as diverse as computer vision [32], natural language processing [33] and genomics [34], physicists have started to explore the potential of ML for fundamental research [35]. In particular, researchers interested in quantum many-body physics have initiated the development of a machine learning perspective on the many-body problem [36] with recent advances such as neural network representation of quantum states [37][38][39][40][41][42][43][44][45][46][47][48], neural network tomography [49][50][51], machine learning classification of phases [38], as well as advances in quantum chemistry [52][53][54] theory [55,56] and the acceleration of Monte Carlo simulations [57][58][59], preparation of quantum states [60,61], quantum feedback [62], among many others [36,63]. ML applications in the context of quantum control [64][65][66][67] are largely based on reinforcement learning (RL) techniques, which are the key ingredient behind major artificial intelligence breakthroughs in game play [68,69]. A RL perspective on quantum control has been developed in the context of quantum state preparation [64], where the authors reexamine the quantum state preparation in terms of a modified version of the Watkins Q-learning algorithm [70]. Likewise, Ref. [65] considers the problem of gate synthesis from a RL viewpoint.
In this work we exploit Differentiable Programming (DP) [71] and Natural Evolutionary Strategies (NES) [72] to find optimal strategies to transport Majorana zero modes. We reformulate the complex task of transporting a Majorana zero mode as a quantum control optimization problem amenable to advanced ML techniques. This can be most easily understood from a reinforcement learning perspective where the movement of the Majoranas can be recast as a "game", see Fig. 1. In this game the player (agent) has to move a Majorana from a position x A to a position x B in a fixed amount of time T . At the end of the game the agent gets rewarded depending on how well the Majorana reached its target state at position x B . The objective of the agent is to learn the best strategy (path of the Majorana) that maximizes the reward.
Our machine learning techniques discover a novel and counter-intuitive approach to transporting Majorana zero modes, here dubbed the jump-move-jump approach and exemplified in Fig. 1, which yields a high fidelity control significantly superior in terms of speed and quality to previous strategies applied to this problem [73][74][75]. The jump-move-jump protocol, which is relevant for transporting Majoranas over a large distance l in a short time T , makes use of pulse-like jumps at the beginning and end of the protocol (see regions I-III of Fig. 1), with a period of nearly constant velocity between the jumps. This is particularly interesting given the previous results for other models showed that bang-bang protocols [74] were the most efficient approach to this form of quantum control.
Using the insight gleaned from our ML-inspired control protocols, we construct the core strategy for these paths and provide a theoretical understanding for the protocol by analyzing the motion of the Majoranas in a moving frame. We find that these protocols simultaneously use the stability of the system at finite velocities together with the fact that small sudden jumps in wall position do not significantly reduce the ground-state fidelity. In contrast, in the limit where there is a significant total time T to move a relatively small distance l, the best approach is to follow smooth protocols that follow an adiabatic path. Our ML technology recovers these protocols without any prior knowledge (see region IV of Fig. 1). In addition, we verify that the jump-move-jump protocol retains its usefulness in the presence of interactions or disorder, and also corroborate its high efficacy using the more experimentally relevant proximity-coupled semiconductor nanowire model.
We have structured this paper as follows: in section II we introduce the setup of the Majorana control problem. In section III we introduce the optimization methods DP and NES for which we provide the optimization results in section IV which we benchmark against a standard simulated annealing method. In section IV we also highlight some of the advantages of the DP and NES methods compared to others. We describe and analyse the physical mechanisms behind the jump-move-jump strategy in section V. Then we provide some numerical results of the robustness of the jump-move-jump strategy with respect to interactions or disorder in section VI before concluding our work.
We also include several appendices where we provide extra details regarding the moving frame in App. A, ML methods in App. B, C and E, extra results for the proximity-coupled semiconductor nanowire in App. F and the robustness of the JMJ strategy with respect to interactions and disorder in App. H. Finally in App. G and App. I we discuss extra details for the analysis of the JMJ strategy.

II. SETUP OF THE MAJORANA CONTROL PROBLEM
To model the movement of the Majorana zero modes in superconducting nano-wires we use a one-dimensional (1D) Kitaev Chain [25] described by the Hamiltonian where c ( †) x are fermionic annihilation (creation) operators, µ(x) the onsite chemical potential, V (x, t) a timedependent external potential, w the hopping amplitude and ∆ the p-wave superconducting gap. This model can be realized effectively in variety of setups by proximity coupling to a conventional s-wave superconductor [15-17, 19, 20, 76-78]. When |µ| ≥ |2w| the gap in the energy spectrum closes inducing a transition to a topological trivial phase [25,79]. For an open chain, a pair of Majorana zero-modes are found to reside on the edges of the wire if the bulk is topologically non-trivial. The existence of such Majorana modes implies the existence of a degenerate space of ground states. The ground state of the system is protected by a robust topological energy gap, which in the continuum limit of the model is given by min[µ c , ∆ c k F ]. As a consequence of this protection, two pairs of Majorana zero-modes can be used to produce a qubit whose information content is protected non-locally [27].
Manipulation of the information encoded in the ground state requires the braiding of the Majorana quasiparticles [18,26,28,29,[80][81][82][83], while staying as much as possible within the degenerate ground-state space. To achieve this, a generic strategy is to try to perform the braiding adiabatically. An adiabatic path generically must have a total time T larger then the inverse gap; in practice the size of the topological gap makes such a strategy prohibitively slow in view of numerous sources of decoherence in proximity coupled setups [31,[84][85][86][87][88][89][90][91].
In light of this, our strategy in what follows (see also [73,75,92]), is to attempt to move Majoranas as quickly and efficiently as possible. To this end, and to have smooth control over the position of the Majorana bound states on the lattice, we encode the external potential profile as where V height is the maximum height of the outer potential walls and f (x) = 1/(1 + exp(x/σ)) is a sigmoid function shifted by the wall positions x L(R) (Fig. 1 a). The Majoranas are localized at the outer edges of the potential profile, which can be seen as domain walls between topological and non-topological phases since V height 2w. The position of these domain walls can be be controlled through x L(R) ; to move the left Majorana we give the agent control over x L (t) as a function of time.
The motivation for this specific form of potential profile comes from experimental proposals for moving the Majorana by the so called piano-key potentials [18,73,93]. In this proposal the position of the domain wall in the wire is controlled by applying gate electrodes. 1 At variance with protocols found in [74], the presence of nonlinearities in the potential profile implies that bang-bang protocols, which are expected for linear control [96,97], may no longer be optimal. As a consequence of this, it is conceivable that the space of possible solutions is expanded in our setting.
The Majorana game starts at t = 0 with the system in the groundstate of Eq. 1 with x L (0) = x A . The aim is to reach the groundstate |Ψ B with x L (T ) = x B located at a distance x B − x A = l spending a total time T . A natural choice to quantify the accuracy of the protocol is the infidelity Here, H(t) is the time-dependent Hamiltonian that describes the Majorana wire setup Eq. 1 during the protocol. Whereas I(T ) = 0 means that we have fully preserved the encoded quantum information, I(T ) = 1 implies that information has been completely lost.
A key timescale related to the control problem is T res = 2π ∆k F , which naturally arises from the response of the system to boundary wall oscillations (see [75] and App. A).
T res corresponds to the time above which one can make changes slowly enough for genuinely super-adiabatic motion [73,98,99]. Super-adiabaticity in this scenario allows the static ground state to be accelerated into the ground state of a moving frame, provided the transition is done slowly enough, i.e., in times large compared to T res . Another important additional physical constraint is the presence of a critical velocity v crit = ∆ [73,92] above which the moving frame Hamiltonian becomes gapless (App. A) and the ground states lose their topological protection.
Based on this we divide up the Majorana control problem into four different regimes (I-IV) (see Fig. 1

c).
• Regime I corresponds to the critical regime in which the Majorana must move on average v avg = l/T above the critical velocity. In this regime the ground state fidelity is expected to rapidly decrease to zero.
• Regime II is the sub-critical regime for which the velocity is on average close to but nonetheless below v crit . This regime is open ended in both time T and length l. The key feature distinguishing this regime from regime IV below is character of the found optimal protocols.
• Regime III is again a sub-critical regime, defined for the times that are short with respect to the resonance time T res but also has low velocity. This region cannot be used to efficiently move Majorana states over long distances, but we expect it to be relevant for braiding protocols based on small relative movements that change the effective couplings between Majoranas.
• Finally regime IV is the adiabatic regime in which we are above T res and we have sufficient time to expect that slow ramp-up/ramp-down protocols [73,75] from earlier studies to be optimal. Ideally one would always like to be in this regime, however a gradual build up of noise and decoherence may make it necessary to get things done more quickly.
A. Relevance to other braiding schemes It is worth mentioning here that the aforementioned trade-off between adiabaticity and the need to perform operations quickly has led to the emergence of other approaches to braiding. These schemes seek to circumvent the need to manipulate/shuttle Majorana's over excessively long distances. This includes local [18,100,101] and non-local [102,103] Majorana coupling in wirenetworks, and the implementation of one-way computation schemes [104,105] via projective charge measurements [30,[106][107][108][109][110].
All of these methods still require some level of adiabatic control, or some effective short-cutting thereof (see e.g. [111,112]) and there can also be some additional downsides. The interaction/coupling schemes for example require precise control of the couplings between neighbouring Majoranas and the loss of some topological protection [87,113]. In implementations of wire measurement-only schemes one must also tune a coupling between wire and a quantum dot [30,108] and accept state manipulation that is inherently probabilistic [110].
The phase diagram that we have outlined above is directly applicable to the local coupling schemes, where the relative Majorana position can be seen as a proxy for the coupling strength. A similar proxy may also hold for measurement-only schemes via the distance between topological boundary and quantum dot, although the connection here is harder to make directly because one should also account for fermion number conservation [30,[106][107][108][109]. That said, the DP and NES methods we apply below do have direct relevance for even more sophisticated numerical treatments of this problem. We refer to our concluding remarks for further discussion of this point.

III. MACHINE LEARNING METHODS
In our study we apply two machine learning strategies, namely Differentiable Programming (DP) and Natural Evolution Strategies (NES), to minimize the infidelity Eq. 3 with respect to the position of the domain wall x L (t). 2 In this section we introduce these methods in a way that directly applies to our Majorana control setup. For the interested reader we provide the corresponding programming codes in [114].

A. Differentiable Programming
DP is a programming paradigm to evaluate gradients of computer programs [115] that has been recently intro-duced as an optimization method in physics applications like Tensor Networks [116,117], Monte Carlo [40,118,119], and optimal control [67,120]. DP obtains numerically exact gradients with similar computational time complexity as the forward evaluation of the computer program due to the use of backward propagation [121][122][123].
In our Majorana control optimization problem we are interested in the total derivative of the infidelity with respect to the control dI(T ) dx L (t) . As shown in App. B, the algorithm to compute I(T ) consists of a sequence of elementary operations f i , in DP language known as primitives, which maps the input control x L (t) to the value of the infidelity by I(T ) = f n • f n−1 • · · · • f 1 (x L (t)). Moreover, the derivatives of each individual operation f i are known and the total derivative dI(T ) dx L (t) can be assembled by recursively applying the chain rule with automatic differentiation (see App. B).
In practice, we write a code to evaluate the realtime dynamics of the Majorana induced by the timedependent profile V (x, t) such that all the individual operations are differentiable. To obtain the necessary gradients we use a language that supports automatic differentiation, which in our case is the JAX library [124]. The gradients are used to minimize the infidelity of the final state using Gradient Descent (GD) and ADAM [125]. For GD this is done iteratively, where at each iteration of the algorithm the protocol is updated as . Moreover since neural networks consist of differentiable elementary operations, we also explore representing and training the control x L (t) as the output of a neural network for which the weights are updated with the standard update scheme ADAM.

B. Natural Evolution Strategies
Evolution Strategies is a family of black-box optimization algorithms inspired by natural evolution [72,126]. These methodologies have been recently revisited in the context of machine learning [127], in particular in reinforcement learning [128], where OpenAI demonstrated that a particular incarnation of the algorithm called natural evolution strategies (NES) offers a powerful alternative to popular Markov-decision process-based reinforcement learning methods [129].
NES starts with an objective function F (φ) that acts on parameters φ from a population described by a distribution p θ , where θ parameterizes the population distribution. In our work, we choose the objective function F (φ) to be the infidelity I(T, φ) in Eq. 3. Depending on the setting, the parameter φ corresponds to either the position profile x L (t), the velocity v L (t) =ẋ L (t) or the parameters of a neural network whose output is The NES algorithm proceeds to optimize the expectation value of the objective function E φ∼p θ [I(T, φ)] over the population. We choose p θ to be a Gaussian distribution with mean θ and diagonal covariance matrix Σ = σ 2 I with σ = 0.1, i.e., N (θ, σ 2 I). It follows that the gradient of the cost function is (see App. C) This equation provides an efficient way for computing gradients without differentiation, but instead through the expectation of the objective function. Notice that E ∼N (0,I) [I(T, θ) /σ] = 0, which implies the above equation is equivalent to E ∼N (0,I) [(I(T, θ +σ )−I(T, θ)) /σ]. This means that the estimation of the gradient can be seen as the finite difference of the objective function with random search [128]. To update the parameters θ, we apply gradient descent to θ with Eq. 4.

IV. MACHINE LEARNED STRATEGIES FOR MAJORANA CONTROL
Our main objective is to investigate the use of DP and NES for the motion of the Majorana modes in the four different movement regimes I-IV. We choose four different points (l, T ) (which fixes v avg = l/T ) each of which belongs to a different regime. For each of these points we apply the optimization algorithms to minimize the infidelity I(T ) in Eq. 3 with respect to the control of the domain wall. The control can be parameterized by either the wall position x L (t), the wall velocity v L (t) or a neural network x L (t) = NN θ (t), where θ represent the parameters in the neural network. We tested all these different parameterizations (see App. D) and in the following we focus on the parameterization choices that give the best performance (the lowest infidelity values).
For the DP optimization, we first parameterize the wall position by a neural network x L (t) = NN θ (t) and use ADAM [125] to optimize the parameters. We then take the resulting real-space profile x L (t) and directly reoptimize it in the position representation with GD. We find that this two-step process is beneficial; while the initial NN stage allows a quick convergence to a smooth nearly-optimal control profile, the second step allows for additional fine-tuning and more abrupt changes in the protocol. The neural network NN θ (t) used for these simulations consists of three layers of 100 neurons with Rectified Linear (ReLU) activation functions followed by one output neuron with a sigmoid activation function. The learning rate for the ADAM optimization algorithm is determined empirically by running a range of values and picking the one that showed the best convergence, a method which is similar to the one used in [125]. The NES optimization in regimes I/II operates directly on the wall position x L (t) and in regimes III/IV it operates on the wall velocity v L (t) from which the wall position can be extracted via integration The uncertainty parameter for NES can be viewed as a trade-off of exploration and exploitation [70] and was determined empirically by testing a range of values. We note that in [128] it was shown that NES is a robust method with respect to different hyper-parameters in different learning settings. The time-dependent simulations of the fermionic system are discretized over time with a small time resolution dt = 0.01. We allow the ML algorithms control over only a coarse grained time scale ∆t ≥ 10dt such that in the continuous time limit dt → 0 we get a continuous protocol x L (t) and the domain wall position is not discontinuously changing every single discrete time step dt. Moreover to probe the susceptibility of the optimized protocols to initial conditions, we repeat the optimizations a few times for several different starting configurations to ensure that the results are independent of the initialization.
The results for these optimizations 3 in the four different regimes are shown in Fig. 2(a-h) and can be summa-rized as follows. In the regimes I-II-III we can identify clear similarities between all of the optimised strategies which display sudden movements (jumps) at the beginning and end, and more gradual rates of changes in the middle of the protocol. The initial sudden movements can be roughly characterised by a rapid jump forward, followed by a less abrupt backward motion. The jumps near the end of the protocol display an analogous movements in the reverse order, although these are generally less pronounced. The middle of the protocols consists of an approximately constant-velocity forward-motion that is dressed to various degrees with high frequency oscillations. We observe that the velocity in the middle part of the protocols is typically lower than v crit even for regime I, where the the total average velocity v avg (including the sudden movements) is above v crit . The size and character of the additional oscillations largely depends on the optimization strategy being used.
The infidelity values we find in regimes I-II-III are significantly better compared to strategies like linear motion x L (t) = v avg t or bang-bang as shown in App. D. In the critical regime I we get an infidelity of about I(T ) ≈ 0.35 compared to I(T ) = 0.47 for a linear protocol whereas in regime II we get I(T ) ≈ 0.15 versus I(T ) = 0.22 for linear. In the low average velocity regime III we get infidelities as low as I(T ) ≈ 0.006 while linear motion gives I(T ) = 0.012. The infidelity value improvement in regime I is rather surprising given that the Majorana moves on average above the critical velocity. Below, we explain why the jumps in this scenario are particularly beneficial.
In regime IV the results show a globally different strategy: in this regime the optimal protocol is to slowly accelerate the Majorana up to some finite velocity and then slowly decelerate back down to the target position. This type of protocol was discussed in earlier work [73,75] in regimes where there is enough time to accelerate to a moving frame, i.e. regime IV. We note that due to the nearly adiabatic motion in regime IV, the values of infidelity are the lowest I(T ) ≈ O(10 −4 ). These infidelity values make regime IV optimal for braiding of Majoranas in an ideal experimental setup. However, due to imperfection in the devices and their coupling to a noisy environment, balancing the infidelity associated with shorter protocols with respect to the infidelity induced by dephasing in a longer time protocol makes controlling Majorana movement in faster regimes relevant for near term experiments.
Compared to previous studies however [73,75,89] we find that the starting velocity v L (t = 0) of the protocols in regime IV can be finite if one allows for small but finite infidelity values I(T ) ≈ O(10 −4 ). In the obtained protocols for example we have v L (t = 0) = 0 followed by an approximately smooth gradual build up and down of the velocity. Moreover, the optimization techniques with which these protocols were obtained have as additional advantage that they can fine tune the protocols up to a finer level (bigger search space) than the previously studied parameterized adiabatic protocols [75], as we show in App. D). This advantage of our methods might be particularly beneficial for finding optimal protocols in the presence of disorder in the wires as discussed briefly in section VI B and App. H.
Note that although we have focused on the Kitaev chain model in Eq.1 the results and conclusions discussed here remain true on the more realistic proximity coupled nanowire models, where the JMJ protocol is optimal in the non-adiabatic regime (see details of the proximity couple nanowire simulations in App. F). Additionally, the results on the proximity couple nanowire in App. F are compatible with a smooth protocol in the adiabatic regime, as well as consistent with the results on the Kitaev chain in the large Zeeman field and large strong spin-orbit coupling limits.

Comparison between simulated annealing and DP and NES optimization
We now benchmark the results of the DP and NES algorithms against the standard simulated annealing (SA) method following Ref. [74] closely. In the SA method the wall velocity v L (t) of the domain wall is iteratively updated by choosing two random intervals of length ∆t of which one interval is increased by ∆v and the other is decreased by ∆v. The new infidelity I i is calculated for the updated velocity profile and the move is accepted with a probability e −δI/T SA where δI = I i − I i−1 is the difference in the infidelity with respect to previous iteration. The annealing temperature T SA is slowly cooled down to zero. The results of this benchmark in each of the four regimes are shown in Fig. 2(i-l) and are qualitatively similar to the results obtained with NES and DP ( Fig. 2(a-h)).
In practice, we find that SA is significantly more computationally demanding than NES or DP since a typical SA simulation entails the evaluation of the many-body infidelity for each SA update step. To obtain results with comparable infidelities, the SA simulation requires a total of 10 5 evaluations of the infidelity while for DP we only need 440 update steps each involving a single infidelity and gradient evaluation (the latter requiring similar computational complexity as the infidelity calculation); the NES algorithm requires 50 update steps each with 100 parallel evaluations to reach convergence.
This differences may be partially attributed to the fact that SA does not take advantage of any gradient signal. Beyond these practical observations, a careful scaling analysis of the convergence of these methods requires an analysis of the convexity properties of the infidelity landscape as a function of the control parameters. This can be either done analytically for simple infidelity landscapes [130], but may require a numerical investigation for control problems exhibiting a glassy landscape [131].
Overall, we highlight that DP offers a powerful tool for quantum control as long as an accurate and differentiable physical model is available. It is found that direct gradient based gradient methods are usually more stable and efficient than RL [132]. In contrast, NES can optimize non-differentiable and discrete control protocols, both of which remain challenging for DP. NES can be also directly applied to experimental settings as long as a suitable objective function, such as the expectation value of a Hermitian observable or the fidelity, is available. For instance, recent experiments demonstrate the teleportation of Majorana modes in a quantum simulation of a small Majorana chain [133] including access to the fidelity of the protocol. This possibility makes NES a viable tool for controlling and designing quantum simulations of Majorana modes.
Additional discussions detailing connections, comparisons, advantages, and disadvantages of the machine learning approaches in relation to other advanced methods used in quantum control are presented in App E.  Fig. 2. The optimal dressed jumps appear at a diagonal set of parameters ∆x forward − ∆x back ∼ C where C ≈ 0.96. We note that the best protocol has a jump size bigger than the movement length ∆x forward > l which means that it jumps over the target position xL(T ) = xB and then jumps back within the range xA < xL < xB as can be seen in Fig. 1 d) (for regimes I and III).

V. JUMP-MOVE-JUMP [JMJ] MAJORANA CONTROL STRATEGY
The main features of the ML strategies in regimes I, II and III can be encapsulated in a simple model strategy, the jump-move-jump strategy as shown in Fig. 3 a), amenable to semi-analytical and numerical analysis. This strategy consists of two dressed jumps in the position of the domain wall at the beginning and end of the protocol interluded by a period of motion at a constant velocity v. The initial dressed jump comprises an instantaneous forward jump over a distance ∆x forward followed by a rapid backward movement over a distance ∆x back in a time ∆t back . Similarly, the last pulse starts with a rapid backward movement ∆x back followed by a forward jump over a distance ∆x forward . In what follows, we assume that jumps at the beginning and end of the protocol are symmetric and are described by the same parameters.
The simplicity of this model allows us to develop an understanding for the key mechanisms behind the ML strategies and estimate the value of the infidelity of the JMJ strategy for a wide range of parameters l, T in regimes I-III (see Fig. 5 and Fig. 3 (b)). In section V A below we will first argue that when we disregard the backward movements, i.e. ∆x back = 0, the overall strategy can be explained via a trade-off (Fig. 4) between the amount of fidelity loss due to the boundary jumps and the loss due to sudden changes in velocity.
In section V B we also explore how this mechanism can be better controlled with the backward movements ∆x back = 0 which, for a certain subset of parameters (∆x forward , ∆x back ), leads to a model strategy with infidelities that compare competitively with the best ML learned protocols (see the discussion in App. G and Figs. 1 (d) and 2). Our analysis shows that this dressed protocol allows one to better target the ground-state of the system in a moving (constant velocity) frame.
A. Bare Jump-Move-Jump (∆x back = 0) For the case ∆x back = 0 we minimize the infidelity Eq. 3 with respect to the velocity v of the bulk of the protocol which fixes the instantaneous forward jump to ∆x forward = (l − vT )/2 ≡ δ. To make analytic analysis simpler we focus on the case that both the left and the right domain wall positions are being evolved simultaneously with the JMJ strategy. This means that the right wall position x R in Eq. 2 becomes time-dependent via x R (t) = x L (t) + C x with C x a fixed constant. This two wall scenario makes it possible to evaluate the strategy in the moving frame basis (App. A), which allows the key rationale behind the JMJ strategy to be revealed.
To evaluate this strategy we expand the infidelity after the first jump A+δ of the Hamiltonian Eq. 1 with the wall at position x L = x A + δ giving is the time-ordered evolution operator with H(t) following the time-dependence of the strategy. We break this equation up into two separate terms which allows for an approximate analysis of the separate contributions. Here we focus on the groundstate contribution (first line), making the assumption that the second line is small in comparison.
To approximate the groundstate contribution we insert projections onto the groundstate ψ 0 v of a moving frame Hamiltonian H(v) with a velocity v equal to the bulk velocity of the strategy and finally a projection onto the groundstate ψ 0 B−δ of the Hamiltonian Eq. 1 with the wall at position x L = x B − δ (just before the final jump) resulting in represent the initial and final jumps in position space of size δ which by fitting to our numerical model can be characterized as where s ∼ +αλ F with λ F the Fermi wavelength and the fitting parameters ( , α) = (−0.33, 0.44) for one-wall and (−0.12, 0.3) for two-walls when ∆ = 0.3. In the appendix I we discuss this in more details, showing also how the fit s scales with the coherence length ξ of the Majorana mode.
The amplitude ψ 0 v ψ 0 A+δ represents the overlap between the static ground state and that of a moving frame with a constant velocity v. We estimate this as follows: with γ = 1/ 1 − v 2 /∆ 2 . One way to argue for this form is to use the results of [75] that showed that the energy of the moving frame ground state w.r.t. to the static frame goes as E ∼ γM v 2 where M ∝ k F /∆. The overlap can be related to this energy via O v ∼ 1 − E/k F ∆. We arrive at a value of β ∼ 0.13 by fitting Eq. 9 to moving frame (two-wall) numerical simulations for v v crit . For single wall motion we can calculate value of β ∼ 0.065 by slowly accelerating the wall up to v v crit and comparing the evolved state with the instantaneous ground-state. Note that this analysis also holds for the more general proximity coupled semiconducting nanowire with different values for fitting parameters α and β.  Fig. 1 (c). The jump-move-jump protocol allows for low infidelities even in cases where the average velocity exceeds vcrit (regime I).
If we assume a symmetric strategy (e.g. jumps of the same size at the beginning and end) then the fidelity function F(T ) ≈ |O δ O ν | 2 (where I(T ) = 1 − F(T )) to be maximized is approximated as An important part of the strategy are the jumps at the beginning and end of the protocol that allow for a constant middle motion at lower velocities. In Fig. 5 we show how the infidelity behaves for a range of movement lengths l and times T obtained from maximizing Eq. 10 with respect to v. By comparing with Fig. 1 (c)  This method is only effective however if the jumps needed are not too large (δ ∼ O(λ F )). Interestingly, as the overlap behaviour of O v is relatively unaffected by changes to the chemical potential, this suggests that a chemical potential nearer to the bottom of the band, µ → −2w can help to extend the range of the protocol. On top of the bare jump-move-jump structure, the ML algorithms dress the protocols with additional forward and backward movements. In this section we will show that the primary purpose behind these additional motions is to better target the moving frame ground state in the move part of the jump-move-jump protocol.
To proceed we focus on the simple upgraded JMJ protocol that also allows backward motions over distances ∆x back = 0, see Fig. 3 (a). Our first key observation is that the optimal protocols here tend to choose a combined jump size ∆x forward − ∆x back ∼ C, see Fig. 3 (b), where C is a fixed constant. This indicates the same primary goal as the JMJ strategy: to allow a period of optimal and roughly constant sub-critical motion.
To understand why the backward motion is a further benefit we examine the instantaneous infidelity in both the static and moving frames for a series of protocols with the fixed bulk velocity v, see Fig. 6 (a). The instantaneous infidelity is defined as follows the time-dependence of the dressed JMJ strategy and ψ 0 ins (t) corresponds to the instantaneous groundstate (at the fixed time t) of the static frame Hamiltonian H(t) (Eq. 1) or the moving frame Hamiltonian H v (t) (Eq. A2). The static frame perspective shows that the optimal jump finds a state that slowly becomes the lowest infidelity state after some time. However a more revealing picture emerges if we examine the same set of protocols in a moving frame (inset). Here we see that the objective of the initial jumps is to maximize the overlap with the ground state of the moving frame.
Using the same methodology we can also reveal what moving frame bulk excitations ψ i v with many-body energy E i v (restricting here to single-and two particle excitations only) the dressed protocol occupies during the evolution, see Fig. 6 (b). For protocols close to the bare JMJ strategy (shown in red) one finds that the domi-a) b) FIG. 6. a) Instantaneous Infidelity I(t) with respect to the static (lab) frame groundstate (main panel) and moving frame groundstate (inset) as a function of the protocol time t for various dressed jumps in regime I which are depicted by the different colours (red,blue,green,grey) and shown in the inset of b). The optimal dressed jump (blue) has the minimum infidelity with the moving frame groundstate and becomes the strategy with the minimum static frame infidelity after about t ≈ 4. b) The time-evolved state |ψ(t) expanded in the single and two particle excitations ψ i v of the moving frame basis with energies E i v at t = T /2 for a forward jump (red), an optimal dressed jump (blue) and non-optimal dressed jump (green) for regime I as given in the inset. The occupation probabilities (delta functions) are convoluted for visualisation purposes. The forward jump strategy (red) occupies the excitations close to the bottom of the band in the moving frame (E i v ≈ 0.08) more compared to the dressed jumps (blue and green). In this figure all times are in units of 1/ω. nant excitations have energies close to the moving frame bulk gap (E i v ≈ 0.08). The dressed jumps (in blue and green) however are able to lower the amplitudes of these excitations, but at the cost of also exciting some higher energy quasi-particles. For dressed jumps that are too large (in green) these higher energy excitations eventually dominate and one gets an increase in the ground state infidelity.
Before ending this section, we now briefly summarize why similar JMJ strategies work in the regimes I-III. The first advantage of the JMJ strategy is that, with abrupt initial and final costs, it allows for an effective reduction in the distance l. This is especially obvious in the case of regime I where, by definition, simple linear motion implies a need to move above the critical velocity.
The other crucial aspect of the dressed protocols is that they can better target the ground state of the moving system through the combination of jumping and moving. This allows the system to be rapidly accelerated to a near constant velocity frame where there is no additional infidelity cost. For a fixed distance l, the JMJ strategy therefore generally allows for a reduced time by cutting out the slow acceleration parts of regime IV, allowing regime III, regime II and, to a lesser extent, regime I exploit the property of super-adiabaticity.
Of course it is always possible to define scenarios deep in regime I where the distance and time constraints would lead to large infidelity even for the dressed JMJ protocols. However they do allow, under the same spatial and temporal constraints, for Majoranas to be moved more efficiently than that of linear motion or slow ramp-up ramp-down protocols.

VI. ROBUSTNESS OF THE JMJ PROTOCOL WITH RESPECT TO INTERACTIONS AND DISORDER
In the space between our simplified theoretical setups (the Kitaev chain in Eq. 1 and Proximity Coupled Semiconductor in App. F) and actual topological quantum devices there are many additional layers of complexity, and one might wonder which aspects of the JMJ protocol remain robust. In this section we assess the robustness of our optimal JMJ protocols obtained in the clean non-interacting system when applied in a system with interactions (see e.g [134][135][136][137][138][139][140]), and separately applied in a system with disorder, see [76,141,142]. We find that the fidelity loss due to the jumps of the JMJ protocol remains robust and that the overall protocol still outperforms the naive linear benchmark protocol significantly. Finally we comment on the extension of our ML methods to include these more complex scenarios. H int . Since H int is quartic in the creation and annihilation operators, the BdG time evolution algorithm used before (see App. B) cannot be applied directly and we consider a time-dependent variational principle with matrix product states (MPS-TDVP) [143,144] approach to simulate the dynamics.
In Fig. 7 we show how the optimal protocol obtained in the non-interacting system in regime I performs for various finite interaction strengths (see App. H for the other regimes). From the instantaneous infidelity plot in the main panel we can see that the fidelity loss due to the jumps remains robust with increasing interaction strength whereas in the bulk (move part) the loss increases. As a consequence the final infidelity I(T ) increases with interaction strength (See inset Fig. 7), but the performance remains superior to a naive linear protocol at all interaction strengths. The increase of infidelity with interaction can be argued for on the basis of meanfield theory, where one expects repulsive interactions to lower the effective gap (leading to a lower critical velocity) and the appearance of non-uniformity in the effective coupling parameters near the system boundary.
Since we did not perform the optimization directly in the presence of interactions, we cannot argue that the optimal JMJ protocol from the non-interacting system is also the optimal protocol in the interacting system. We note however that our ML optimization methods can be directly extended for this task. Firstly, the NES algorithm is a black-box optimization method and can be combined with any method that generates the dynamics such as the MPS-TDVP algorithm. Secondly, to apply DP one needs to make the code for the MPS-TDVP algorithm differentiable. In this regard we note that the DMRG algorithm to find ground states in interacting systems has been recently made differentiable [116].

B. The effect of disorder
Disorder in the wires can be modeled by adding a random Gaussian noise termμ(x) with mean 0 and standard deviation λ to the chemical potential µ dis (x) = µ(x) +μ(x) [76,141,142]. It has been shown that this form of disorder leads to an increase in the effective coherence length ξ eff of the Majorana, which in the continuum limit leads to 1 ξ eff = 1 ξ − 1 2l dis . Here l dis = v 2 F /λ 2 is the characteristic disorder length scale. When the disorder strength is such that 2l dis = ξ, the effective Majorana coherence length ξ eff diverges and the topological phase is destroyed. Another consequence of disorder is that the critical velocity is significantly reduced [92] and the ground state fidelity is expected to rapidly decrease upon smoothly moving the Majoranas over a disordered medium.
In Fig. 8 we show the performance of the JMJ protocol in regime I for various different disorder strengths compared to a linear benchmark protocol. Regimes II, III and IV are detailed in App. H. From the averaged instantaneous infidelity I(t) in Fig. 8 it can be seen that the infidelity associated with the jumps is robust to the presence of disorder but there is a relatively strong fidelity loss associated with the linear motion of the move part. This is in line with the expectations given the lower critical velocity and longer coherence length induced by disorder. While the final infidelity value I(T ) of the JMJ protocol remains reasonably high in regime I, for longer time/distance regimes (e.g. deep in regimes II/IV) I(T ) worsens significantly, which suggests that additional protocol optimization in the presence of disorder may be required to achieve comparable infidelities.
The protocol optimization in the presence of disorder can be achieved through the NES method since the computational complexity grows linearly with the number of disorder realizations, all of which can be simulated in parallel. The DP method can also be applied because the disorder averaging is a differentiable operation, i.e., DP can access the disorder averaged infidelity ∇ θ I(T ) with respect to the control parameters. For example [145] and [146] consider DP for stochastic optimization tasks.

VII. CONCLUSION AND FURTHER WORK
We have applied two state-of-the-art machine learning techniques, namely differentiable programming (DP) and natural evolution strategies (NES), to the problem of manipulating Majorana bound-states in a topological superconductor. For DP we have shown that the entire dynamical evolution of any free-fermion system can be functionally differentiated, allowing for efficient optimization protocols. This in turn provides an ability to tackle computationally harder problems and allows the dynamical optimization to be integrated seamlessly with both direct and neural-network parameterizations of quantum control protocols.
In addition to this we have shown how the Majorana control problem itself can be naturally formulated as a game. This allows the application of reinforcement learning approaches and NES to zero-mode manipulation. The key advantage here is both speed and flexibility compared to, e.g., the standard MC methods such as simulated annealing. Beyond the advantages observed in our numerical experiments there are a number of additional conceptual and practical ingredients which can be taken advantage of in the reinforcement learning setup including a flexible exploration step, delayed rewards, partial observations, and the ease of accounting for the stochastic nature of the reward function, all of which may be particularly relevant to experimental setups.
Note that these machine learning approaches are not specific to Majorana bound state control, and could also be applied to different dynamical many-body problems with radically different motivations and cost functions (e.g. log-likelihood [147], Fisher information [148] or the trace distance [149]).
On the theoretical level we have introduced a framework for dividing up the Majorana control problem into four distinct transport regimes. This has deepened the understanding of the problem and help to establish a connection with optimal control theory more broadly. Remarkably our numerical simulations displayed an innate awareness of this rich landscape and uncovered important hidden aspects of the underlying physics. In particular they have identified a new class of protocol that radically outperforms other known strategies (adiabatic and bang-bang) when there are both spatial and temporal constraints. We have shown that this family of protocols, and the associated classification of Majorana motion, also holds for the richer proximity-coupled semiconductor nanowire model, which is directly relevant for experiments. In principle, since the defining criteria are related to topological gap, similar protocols should also apply to related models such as the SSH chain [150] or multi-channel topological wires variants [151].
The core of this alternative strategy is a sudden jump followed by a period of constant velocity motion and then another jump. A theoretical analysis of this dynamics shows that the strategy simultaneously exploits the models stability at constant velocities below a certain threshold, and the property that small instantaneous changes in the domain wall position do not dramatically reduce the ground state fidelity. Further analysis of dressed JMJ protocols reveals that more complex initial jump sequences can, in addition to allowing a sub-critical velocity, better target the ground state of the moving frame.
The protocol resembles the parallel recent development of the Bang-Anneal-Bang protocols [97] for ground state preparation in Noisy Intermediate Scale Quantum devices. Our theoretical analysis illuminates why this type of strategy works so successfully and we are hopeful that this approach will motivate more general theoretical analysis of related protocols. In this context it would also be interesting to see the same types of protocols also emerge in more realistic materials. Our results for the semiconductor model suggest that similar protocols are indeed relevant. However, in that example, it is important to note that boundary motion does not strongly couple between bands with different spin orientations. In real materials one might expect that sudden boundary change would couple the ground state to bands at larger energies and this may restrict the range of motions that one can achieve.
Another natural direction to explore is the application of our methods to the problem of Majorana braiding, which is the ultimate goal behind the study and control of zero-modes in the context of topological quantum computing. It remains an interesting open question whether aspects of our protocols remain robust if one models specific devices more closely. We emphasize however that the strength of the RL setup is that one can easily encode other experimental restrictions on the control by including them in the reward or action spaces [152]. As such the application of these methods can be readily extended to other more complicated models, such as models including the effect of interactions [153], different simulation techniques such as the time-dependent density-matrix renormalization group [154], and to measurement-based Majorana devices [107].
In a broader sense we believe our work shows that machine learning for quantum control can be done at a scale, speed, and precision that is relevant to modern experimental devices. As such we believe that it will be used to overcome obstacles to realizing large-scale quantum computation, quantum communication networks, quantum thermal machines, analogue quantum simulations, and control of other quantum many-body dynamical systems more broadly.

Dispersion and Majorana Mode in the Moving Frame
To find the critical velocity we start from the periodic (N+1=1) Kitaev chain Hamiltonian (Eq. 1 with V (x, t) = 0) in the continuum which reads after a Fourier transformation , momentum k and σ the Pauli spin matrices. Solving for the energy eigenvalues gives mode The moving frame can now be obtained by a Galilean transformation in terms of the unitary rotation U (t) = e −ik t 0 v(t )dt which is the time dependent translation operator that rotates to a frame with domain wall velocity v(t). Applying this to A1 gives the moving frame Hamiltonian The moving frame has an extra term v(t)k on the diagonal which means the moving frame energy dispersion is [73]. This results in a tilted dispersion as shown in Fig. A1, when v = ∆ the tilt causes the gap to close and we have a topological phase transition. This velocity defines the critical velocity v crit = ∆.
Besides the energy dispersion, we can also look at the wave function of the Majorana zero modes in the moving frame. To find this wave function we solve for the zero-energy solutions of the moving frame Hamiltonian A2 which The difference with the Majoranas in the lab frame is that the localization length ξ = 1/∆m is dilated by a factor γ = 1/ 1 − v 2 ∆ 2 causing the Majorana modes to become spatially more extended (delocalized) in the moving frame. When v = v crit = ∆ the localization length becomes infinite indicating that the local character is lost and hence a topological phase transition occurs as shown in Fig. A1. for the velocity of the left domain wall in Eq. 2. When v max is not too large, i.e. the amplitude of the left domain wall position can be considered small compared to the total length of the wire N , we can treat this motion as a time-dependent perturbation and apply perturbation theory. In this case we write the external potential as V = V 0 + δV sin ωt in which V 0 is the static domain wall profile in Eq. 2 and δV sin ωt a time-dependent fluctuation on top of it. From Fermi's Golden Rule for a harmonic perturbation [155] we can find the infidelity rate to be lim T →∞ in which |ψ i are the eigenstates of the Hamiltonian with the static domain wall profile V 0 which have energy E i . From this equation we can read off that there are resonances in the infidelity whenever ω res = ±(E i − E 0 ), for the first excited state ω res = E i − E 0 = ∆k F is equal to the gap in the system and we arrive at the resonance time T res = 2π ωres = 2π ∆k F . The resonances for the higher energy bands are suppressed with the transition amplitude | ψ i | δV |ψ 0 | 2 . When ω → ∞ there are no resonances and the infidelity becomes zero; the intuition for this is that the static system H 0 does not notice the harmonic perturbation because it oscillates at a much higher frequency. Similarly, when ω < ω res there are also no resonances because the frequencies are smaller than the gap which corresponds to adiabatic Majorana motion.
In the Majorana motion problem in the main text we are however looking at moving the left domain wall forward from x A to x B . As shown in Fig. A2 we see that the same resonance time scale emerges for forward motion with For ω → ∞ we get the same rate as constant motion with vmax 2 which can be explained for low velocities from adiabaticity. To find the fidelity we first write the Hamiltonian Eq. 1 in Bogolyubov-de-Gennes (single-particle) form [79,156] in which the columns of W (t) = u(t) v(t) * v(t) u(t) * are the eigenmodes (quasi-particle modes) of the BdG Hamiltonian and E(t) is the diagonal matrix of the mode energies [E] ii (t) = i (t). To compute the final many-body overlap at time t = T we then use the Onishi Formula which relates the BdG quasi-particle picture to the many-body picture. In this u(T ) and v(T ) can be obtained from W (T ) = T e −i T 0 H bdg (t )dt W (0) and u B and v B from the diagonalization of the instantaneous BdG Hamiltonian with the domain wall at position x B . F(T ) can now be used to compute the infidelity I(T ) in Eq. 3 in the main text.
From a programming perspective, a code to compute the infidelity from Eq. B3 and Eq. 3 consists of a series of elementary operations (primitives) f i that map the input control x L (t) with t ∈ [0, T ] to the output I(T ) = f n • f n−1 • · · · • f 1 (x L (t)). The required primitives for this algorithm are matrix multiplications, diagonalization of a hermitian matrix B2 and taking the determinant B3. Each of these operations has a known forward and reverse mode derivative [157] that can be recursively assembled in the chain rule to find the derivative of the infidelity with respect to the control. An automatic differentiation package, like the JAX library [124], computes all the derivatives of the primitives in a code automatically and assembles them in the chain rule to evaluate the total gradient of a computer program. We applied this method to the code to compute the infidelity and used the gradient for the optimizations described in the main text. In Fig. A3 a) we show an example for the gradient obtained in this way for a linear motion protocol in regime I which we checked with finite difference to make sure it was working correctly. We observe that for this protocol the gradient is the biggest at the boundary which might explain the jumping behaviour at the beginning and end of the protocols we found with the ML optimizations (Fig. 2). A3. a) Comparison between the gradient of the infidelity I with respect to the control xL obtained with automatic differentiation (AD) and finite difference. The gradients dI dx L were evaluated for a linear protocol xL(t) = vavgt as a function of time t in regime I. The AD gradient matches up with the finite difference gradient and we note that at the beginning and end of the protocol the magnitude of the gradient is the largest. b) and c) Optimized parameterized bang-bang protocols in regimes II (b) and IV (c) with the simulated annealing method. The position profiles xL(t) are plotted in the main panels and the velocity profiles vL(t) in the insets. The bang-bang protocols seem to approximate the JMJ protocol in regime II and super-adiabatic protocol in regime IV. The parameters for these simulations were the same as in Fig. 2 in the main text.
To compute the matrix exponential required for the computation of W (T ) = T e −i T 0 H bdg (t )dt W (0) we discretize the continuous time variable t in individual time steps of size dt and apply the Trotter-Suzuki expansion to write This means that while the time complexity of the back-propagation algorithm is similar to the forward evaluation, the memory complexity grows linearly with the number of discrete time steps T /dt. A method to circumvent this growth in memory complexity is the use of adjoint sensitivity methods [158] which avoid back propagating through the time evolution differential equation (Schrödinger equation) by solving a second differential equation backwards in time. In practise the memory complexity did not turn out to be a bottleneck in our simulations however and it was sufficient to use the backpropagation algorithm as described above. For extending our methods to include disorder and interactions or for simulating very long times T ∼ O(10 3 ) the adjoint method might be a good option to explore and we refer the interested reader to [159] for comparative studies between the adjoint and backprogagation algorithms.
For a standard bang-bang protocol the velocity of the domain wall v L is at all times only allowed to take either the value v L = v min or the value v L = v max , a switch between these two discrete velocities is called a bang. To search for the optimal bang-bang protocols we fix the minimum time between two consecutive bangs to be ∆t = 0.1 and perform a simulated annealing search similar to the one described in the main text. In this SA search we impose an additional constraint (compared to the free search in the main text) in which each update steps consist of changing v max to v min and vice versa to retain the bang-bang character of the protocols. We note that a version of this method was used in [131] to search for optimal bang-bang protocols in a spin system. We used v min = 0 similar to the bang-bang protocols in [74] and we scanned different values of the maximum velocity 0 < v max < 8v avg . This means that for some of the bang-bang protocols in regimes I and II v max ≥ v crit which was not the case for the protocols in [74], however we observed that increasing v max leads to lower infidelities even when making v max bigger than the critical velocity.
The ramp-up-down protocols are a family of super-adiabatic protocols in which the velocity is slowly built up from zero to some maximum velocity v max and then ramped down again to zero as given by in which ω is the parameter that determines how quickly the velocity is accelerated to v max . To find the best rampup-down protocol in each regime we scanned over a large range of values of the free parameter ω (v max is fixed by the average velocity constraint).
In Table A1 we summarize the best results (lowest infidelity) of these protocols. We compared these results to protocols obtained with the ML methods as described in the main text and see that in all regimes the ML protocols outperform the standard benchmark protocols significantly. The optimal bang-bang protocols obtained with the SA method in regimes II and IV are plotted in Fig. A3. It can be seen that the protocols start to approximate the optimal JMJ and super-adiabatic protocols obtained with the other methods. To be able to fully approximate the optimal JMJ protocols with a bang-bang protocol one needs to test even higher maximum velocities than we did here and also consider a negative minimum velocity v min < 0. We also tested the NES and DP optimization algorithms for different parameterizations of the control of the domain wall on a smaller system size N = 50 before fixing the specific parameterizations as used and described in the main text for the big system N = 110. We compared parameterizing the control by the position x L (t), the velocity v L (t) or a neural network x L (t) = NN θ (t) and show the results in Table A2. The optimization with NES gives the lowest infidelity when we parameterize by position in regimes I and II and by velocity in regimes III and IV. For the DP optimization the best infidelity optimizes a neural network in regimes I,III and IV and optimizes over the position in regime II.

Appendix E: Discussion on other methods for quantum control
We now briefly discuss the NES and DP methods in relation to traditional quantum control algorithms and RL methods. First we note that DP provides a general framework to automatically evaluate gradients of computer programs which can be integrated into traditional quantum control techniques such as the Krotov method [160], Gradient-based-Pulse-Engineering [161], and other gradient-based control techniques. In this context, a key advantage of DP is its ability to ease the implementation of gradient-based techniques, where, e.g., the incorporation of physical constraints, such as the restrictions in the amplitude of the controls, requires only adding the constraints to the main objective function. This avoids complex and potentially error-prone analytical calculations. In addition, the  TABLE A2. Comparison of the lowest infidelity values between optimizations of AD and NES with respect to different parameterizations of the control (position xL(t), velocity vL(t) and neural net xL(t) = NN θ (t)). Based on these testing values we determined which parameterizations to use for the optimizations in the main text. These simulations were run for a system size of N = 50 and all the other parameters are the same as the parameters in Fig. 2 in the main text.
implementation of control protocols to widely different physical scenarios (e.g. free fermion or boson Hamiltonians or alternative simulation strategies) typically requires only the coding of the forward propagation of the simulation with minimal changes to the optimization subroutines. In addition, quantum optimal control algorithms based on DP can easily harness the acceleration afforded by graphics processing units (GPUs), which may speedup quantum control calculations by more than an order of magnitude [120]. This is most easily achieved using freely available open-source packages such as JAX [124]. Compared to RL and other gradient-free methods, the DP method used here remains a powerful method as long as the physical model is differentiable, as is the case in our work since the BdG algorithm B used to compute the many-body infidelity is fully differentiable. DP and all gradient-based methods provide a strong optimization signal and typically converge faster than RL techniques, which often require high sample complexity for good convergence. In addition, the use of gradients avoids the reward function design problem in RL [162,163]. This allows us to efficiently extend DP to the many particle Hamiltonian required for the Majorana transport.
On the other hand, NES exhibits multiple advantageous features. First, as a gradient-free black box optimization method, NES is the most flexible among the methods implemented in our work since it can be applied to controls that are discrete and continuous. Second, it can optimize non-differentiable control problems, as well as problems where the gradient estimation is numerically unstable in a way that makes the application of DP challenging. NES is also easily parallelizable across multiple processors for a fast collection of sufficient samples for optimization. Lastly, compared to RL, we mention that NES can be directly applied to any desired objective function. This avoids the problem of the design of reward function commonly encountered in RL algorithms, which is necessary to avoid sparse learning signals that may hamper the efficiency of RL methods in quantum control.
We also briefly mention that RL techniques such as the Watkins Q-learning algorithm [70] and policy gradient, tabular Q-learning, and deep Q-learning have recently been shown to be capable of achieving high performance in other quantum control setups [66]. Ref. [66] provides a detailed comparison between tabular Q-learning, deep Qlearning, and policy gradient [70], and gradient descent and Krotov algorithms applied to the problem of preparing a target quantum state. Gleaned from their numerical experiments, the authors in Ref. [66] conclude that the deep Q-learning and policy gradient algorithms tend to outperform other RL and traditional control algorithms when the problem allows for discrete values of the control parameters, and when the problem is scaled up to large system sizes.
Finally, we comment on well-established adiabatic shortcut methods [12], in particular counter diabatic driving (CDD) and Lewis-Riesenfeld invariants (LRI), in the context of the Majorana game problem. We note that an attempt to implement these in our setup encounters an array of problems. In CDD for example, the control protocols required a level of Hamiltonian control far beyond what would be experimentally feasible, where a full time-dependent and spatial control of a the problem Hamiltonian is required. Lastly, LRI requires finding a suitable invariant, which to the best our knowledge, is not known for the Majorana wire.
Appendix F: Semiconducting nanowire spin-orbit coupled to s-wave superconductor The Kiteav chain model for topological p-wave superconductivity [25] used for the simulations in the main text can be realized effectively in a semiconducting nanowire proximity coupled to a conventional s-wave superconductor in a Zeeman field [15][16][17]. In this appendix we will first introduce the proximity coupled model and discuss in which limits the Kitaev chain toy model can be obtained. Then we show how the critical velocity v crit and resonance frequency can be obtained such that we can define the same Majorana motion regimes as in the main text. Finally we show numerical optimization results obtained with AD and NES in each of those regimes for two different sets of parameters of the proximity coupled model (with one set outside the Kitaev chain limit).

Hamiltonian and Energy dispersion
The Hamiltonian for the proximity coupled semiconducting nanowire (NW) for periodic boundary conditions, a homogeneous potential V (x) = V and chemical potential µ(x) = µ in the continuum momentum representation is given by in which σ i acts on the spin-degrees of freedom and τ i on the particle-hole space. In here we have an external magnetic field B in the x-direction, spin-orbit coupling with strength α in the y-direction and a standard s-wave superconducting term with gap ∆. Note the main difference with the Kiteav chain in Eq. A1 is the spin-degrees of freedom s = {↑↓} which result in Zeeman splitting, spin-orbit interaction and s-wave superconductivity instead of spin-less p-wave superconductivity. The energy dispersion can be found by solving the characteristic equation for H BdG Eq. F2 which leads to In this four-band model various parameter choices lead to a gapped dispersion with in some phases Majorana zero modes, see for a complete discussion e.g. [17]. Moreover, in two specific limits (large Zeeman field B α, ∆ or for large spin-orbit coupling α B, ∆) this proximity coupled NW model can be projected onto the lower bands (reduces to two similar blocks) resulting in the Kitaev chain model with p-wave superconducting gap ∆ = α∆ B (large B) or ∆ = ∆ mα (large α) respectively [73,151].

Critical Velocity and Resonance Frequency
To bring the NW in Eq. F2 into the moving frame we apply the same techniques as for the Kitaev chain discussed earlier in App. A, i.e. we apply the time-dependent translation operator U (t) = e −ik t 0 v(t )dt to rotate the Hamiltonian Eq. F2, which results in the moving frame Hamiltonian Like for the Kitaev chain this transformation gives an additional contribution vk to the energy dispersion that tilts it as shown in figure A4. In the regimes when the system is originally gapped the gap closes due to this tilt when v ≡ v crit = Egap k F . For most parameters finding the general expression for the energy gap and critical velocity is too complicated and we need to obtain it numerically. However things simplify in the Kitaev chain limit for B α, ∆ which gives v crit ≈ α∆ B and when α B, ∆ we obtain v crit ≈ ∆ mα [73]. The resonance frequency in this model is like for the Kitaev chain defined as the being equal to the gap ω res = E gap . With these associations for the critical velocity and resonance frequency the definition of the Majorana motion regimes originally in terms of parameters of the Kitaev chain can be extended to include the proximity coupled model as well; regime I is defined to be for v avg > v crit , regime II close but below the critical velocity, regime III low velocity and short time-scales with respect to the energy gap and finally the (super)-adiabatic regime IV for long times and low velocities with respect to the gap.

Optimization results
By putting the NW model Eq. F2 on a lattice, opening up the boundary condition and imposing the smooth potential profile in Eq. chain in the main text. In Fig. A4 b) we show an example of the excitation energies for a set of parameters outside the Kitaev chain limit in which we have a Majorana zero mode and also Andreev bound states. We then optimize the transport of the Majoranas in this NW model with the same AD and NES techniques as used before. We show the results for these simulations for the same parameters as in Fig. A4 from Differentiable Programming and Natural Evolutionary Strategies in Fig. A5 (i-p) and from JMJ protocols in Fig. A5 (q-s). For completeness in Fig. A5 (a-h) we include results obtained by choosing parameters that approximate the Kitaev chain limit (high B−field).
It can be seen that for these two sets of parameters we obtained qualitatively similar strategies for Majorana transport in the proximity-coupled NW as for the simpler Kitaev chain model as shown in Fig. 2 in the main text. That is, in regions I-II-III we get the JMJ strategy with a pulse-like motion at the beginning and end of the protocol and an on average constant motion with a velocity v crit in the middle of the protocol. For the parameters outside the Kitaev limit, the initial and final dressed jumps are comprised of three parts: a forward jump, a backward jump and another forward jump. For these parameters we observe a much smoother protocol in regime II compared to the Kitaev chain. In regime IV for both sets of parameters we again recover the expected smooth adiabatic protocols.
To show that the JMJ-strategy (together with its analysis discussed in the main text) is also a robust strategy for the proximity coupled NW model we scanned again over the different parameters ∆x forward and ∆x back outside the Kitaev limit. The optimal JMJ protocols obtained in this way together with the infidelity values are shown in Fig. A5 (q-s). It can be seen that the infidelity values for the dressed JMJ protocol are competitive to the protocols obtained with DP and NES. Compared to the Kitaev chain results (panel (d) in Fig. 1) it can be seen that for this proximity coupled NW model the dressing of the jumps is slightly less pronounced.

Appendix G: Parameters and interpolation of the optimized JMJ model strategy
In this appendix we give the parameters of the optimized JMJ strategies in regimes I,II, and III as shown in Fig. 1 d) in the main text and verify how close these strategies are to the protocols obtained with the ML methods. In the main text we observed that dressed JMJ protocols with the lowest infidelities have a linear relation between ∆x forward and ∆x backward as Fig. 3 shows. To attain the best JMJ protocols in regime I-III, we scan through ∆x forward and ∆x back with respect to the linear relation in each regime and pick the one which gives the overall lowest infidelity. The parameters of the resulting JMJ strategies are provided in Table A3 and the strategies are shown in Fig. 1.
We linearly interpolate these best JMJ model strategies to the machine learning protocols in the main paper ( Fig.  2) to see whether there are additional protocols that are better in terms of infidelity. The results for this in regime III with the DP strategy are shown in Fig. A6 from which we can see that the DP strategy is slightly better than the JMJ (due to more free parameters) and that there are no other local minima in between them. For the other regimes  TABLE A3. Best dressed JMJ model strategies parameters and their corresponding infidelities obtained from scans over ∆x forward , ∆x back and ∆t back . The corresponding profiles are shown in Fig. 1 in the main paper.
and interpolations from the NES strategy we obtained similar results. Overview of simulation results of running the optimal protocols obtained in the clean non-interacting system in a system with interactions (top row) or disorder (bottom row) in all four regimes. In all regimes the infidelity increases with disorder strength λ and also with interaction strength u. The parameters for these simulations were the same as in Fig. 2 in the main text and the disorder averaging was done over 500 disorder realizations.

Appendix I: Jump infidelity costs with system parameters
In this appendix we examine the infidelity cost of a single jump as a function of the important physical lengths scales λ F = 2π/k F and ξ = 1/(m∆). Our numerical test suggest a near Gaussian drop off in the infidelity as a function of jump distance: O δ ∼ exp −δ 2 /s 2 . Generally speaking, the larger the value of s, the larger one can jump.
In figure A8 we plot the value of s that we obtain from a numerical fitting the O δ drop-off. We see that the value of the superconducting parameter ∆ (and hence ξ) do play a role: the value of s tends to go down, but not dramatically so, as ∆ gets smaller (and ξ gets larger). Perhaps more importantly we see the that s parameter is linearly related to λ F . This means the jump size can in principle get bigger if one tunes the chemical potential towards the bottom of the single particle valence band.
FIG. A8. In the p-wave model the O δ ∼ exp −δ 2 /s 2 have an interesting relationship to the two key length scales. The fit parameter s is reduced by a factor of about 1/2 when the coherence length ξ = 1/(m∆) is made very large. On the other hand its value tends to increase linearly in proportion to λF = 2π/kF . The topological gap Egap ∼ ∆kF = 2π/(mξλF ) is however inversely proportional to both length scales.