A Parallel General Purpose Multi-Objective Optimization Framework, with Application to Beam Dynamics

Particle accelerators are invaluable tools for research in the basic and applied sciences, in fields such as materials science, chemistry, the biosciences, particle physics, nuclear physics and medicine. The design, commissioning, and operation of accelerator facilities is a non-trivial task, due to the large number of control parameters and the complex interplay of several conflicting design goals. We propose to tackle this problem by means of multi-objective optimization algorithms which also facilitate a parallel deployment. In order to compute solutions in a meaningful time frame a fast and scalable software framework is required. In this paper, we present the implementation of such a general-purpose framework for simulation-based multi-objective optimization methods that allows the automatic investigation of optimal sets of machine parameters. The implementation is based on a master/slave paradigm, employing several masters that govern a set of slaves executing simulations and performing optimization tasks. Using evolutionary algorithms as the optimizer and OPAL as the forward solver, validation experiments and results of multi-objective optimization problems in the domain of beam dynamics are presented. The high charge beam line at the Argonne Wakefield Accelerator Facility was used as the beam dynamics model. The 3D beam size, transverse momentum, and energy spread were optimized.


INTRODUCTION.
Particle accelerators play a significant role in many aspects of science and technology.Fields, such as material science, chemistry, the biosciences, particle physics, nuclear physics and medicine depend on reliable and effective particle accelerators, both as research but as well as practical tools.Achieving the required performance in the design, commissioning, and operation of accelerator facilities is a complex and versatile problem.Today, tuning machine parameters, e.g., bunch charge, emission time and various parameters of beamline elements, is most commonly done manually by running simulation codes to scan the parameter space.This approach is tedious, time consuming and can be error prone.In order to be able to reliably identify optimal configurations of accelerators we propose to solve large multi-objective design optimization problems to automate the investigation for an optimal set of tuning parameters.Observe that multiple and conflicting optimality criteria call for a multi-objective approach.
We developed a modular multi-objective software framework (see Fig. 1.1) where the core functionality is decoupled from the "forward solver" and the optimizer.To that end, we use a master/slave mechanism where a master process governs a set of slave processes given some computational tasks to complete.This separation allows to easily interchange optimizer algorithms, forward solvers and optimization problems.A "pilot" coordinates all efforts between the optimization algorithm and the forward solver.This forms a robust and general framework for massively parallel multi-objective optimization.Currently the framework offers one concrete optimization algorithm, an evolutionary algorithm employing a NSGAII selector [1].Normally, simulation based approaches are plagued by the trade-of between level of detail and time to solution.We address this problem by using forward solvers with different time and resolution complexity.

Optimizer
The framework discussed here, incorporates the following three contributions: 1. Implementation of a scalable optimization algorithm capable of approximating Pareto fronts in high dimensional spaces, 2. design and implementation of a modular framework that is simple to use and deploy on large scale computation resources, and 3. demonstrating the usefulness of the proposed framework on a real world application in the domain of particle accelerators.Here, we will focus on a real world application arising from PSI's SwissFEL [24] activities, which have an immediate and a long-range impact, by optimizing important parameters of PSI's 250 MeV injector test facility (and later for the full PSI SwissFEL).
The next section introduces the notation of multi-objective optimization theory and describes the first implemented optimizer.In Section 3 we discuss the implementation of the framework.We introduce the employed forward-solver in Section 4. A validation and a proof of concept application of a beam dynamics problem is discussed in Section 5.

MULTI-OBJECTIVE OPTIMIZATION.
Optimization problems deal with finding one or more feasible solutions corresponding to extreme values of objectives.If more than one objective is present in the optimization problem we call this a multi-objective optimization problem (MOOP).A MOOP is defined as s.t.g j (x) ≥ 0, j = 0, . . ., J (2.2) where f denotes the objectives (2.1), g the constraints (2.2) and x the design variables (2.3).Two competing objectives can not be optimal at the same time.Red dots represent Pareto optimal points, while the green square x 4 is dominated (exhibits a worse price per performance ratio than e.g.x * 2 ) by all points on the blue curve (Pareto front).
Often, we encounter conflicting objectives complicating the concept of optimality.To illustrate this, let us consider the problem of buying a car.Naturally, we try to bargain the lowest price for the best performance, e.g.maximal horsepower or minimal fuel consumption.This can be formulated as MOOP (2.4In general, it is not possible to get the maximal performance for the lowest price and a trade-off decision between performance and price has to be reached (see Figure 2.1).Since not every choice is equally profitable for the buyer (for example, car x 4 costs as much as x * 2 but offers less performance), we pick trade-offs (red points) that are essentially "equally optimal" in both conflicting objectives, meaning, we cannot improve one point without hurting at least one other solution.This is known as Pareto optimality.The set of Pareto optimal points (blue curve) forms the Pareto front or surface.All points on this surface are considered to be Pareto optimal.
Once the shape of the Pareto front has been determined the buyer can specify preference, balancing the features by observing the effect on the optimality criteria, converging to the preferred solution.This is called a posteriori preference specification since we select a solution after all possible trade-offs have been presented to us.An alternative is to specify preference a priori, e.g., by weighting (specifying preference before solving the problem) and combining all objectives into one and applying a single-objective method to solve the problem (yielding only one solution).In many situations preference is not known a priori and an a posteriori preference specification helps conveying a deeper understanding of the solution space.The Pareto front can be explored and the impact of a trade-off decision then becomes visible.
Sampling Pareto fronts is far from trivial.A number of approaches have been proposed, e.g.evolutionary algorithms [10], simulated annealing [21], swarm methods [20], and many more [9,12,19,26].In the next section we briefly introduce the theory of evolutionary algorithms used in the present work.
2.1.Evolutionary Algorithms.Evolutionary algorithms are loosely based on nature's evolutionary principles to guide a population of individuals towards an improved solution by honoring the "survival of the fittest" practice.This "simulated" evolutionary process preserves entropy (or diversity in biological terms) by applying genetic operators, such as mutation and crossover, to remix the fittest individuals in a population.Maintaining diversity is a crucial feature for the success of all evolutionary algorithms.
In general, a generic evolutionary algorithm consists of the following components: • Genes: traits defining an individual, • Fitness: a mapping from genes to a fitness value for each individual, • Selector : selecting the k fittest individuals of a population based on some sort of ordering, • Variator : recombination (mutations and crossover) operators for offspring generation.Applied to multi-objective optimization problems, genes correspond to design variables.The fitness of an individual is loosely related to the value of the objective functions for the corresponding genes.Figure 2.2 schematically depicts the connection of the components introduced above.The process starts with an initially random population of individuals, each individual with a unique set of genes and corresponding fitness, representing one location in the search space.In a next step the population is processed by the selector determining the k fittest individuals according to their fitness values.While the k fittest individuals are passed to the variator, the remaining n − k individuals are eliminated from the population.The Variator mates and recombines the k fittest individuals to generate new offspring.After evaluating the fitness of all the freshly born individuals a generation cycle has completed and the process can start anew.
Since there already exist plenty of implementations of evolutionary algorithms, we decided to incorporate the PISA library [1] into our framework.One of the advantages of PISA is that it separates variator from selector, rendering the library expandable and configurable.Implementing a variator was enough to use PISA in our framework and retain access to all available PISA selectors.As shown in Figure 2.2, the selector is in charge of ordering a set of d-dimensional vectors and selecting the k fittest individuals currently in the population.The performance of a selector depends on the number of objectives and the surface of the search space.So far, we used an NSGA-II selector [11] exhibiting satisfactory convergence performance.
The task of the variator is to generate offspring and ensure diversity in the population.The variator can start generating offspring once the fitness of every individual of the population has been evaluated.This explicit synchronization point defines an obvious bottleneck for parallel implementations of evolutionary algorithms.In the worst case some MPI threads are taking a long time to compute the fitness of the last individual in the pool of individuals to evaluate.During this time all other resources are idle and wait for the result of this one individual in order to continue to generate and evaluate offspring.To counteract this effect we call the selector already when two individuals have finished evaluating their fitness, lifting the boundaries between generations and evaluating the performance of individuals.New offspring will be generated and MPI threads can immediately go back to work on the next fitness evaluation.By calling the selector more frequently (already after two offspring individuals have been evaluated) results in better populations since bad solutions are rejected faster.On the other hand, calling the selector more often is computationally more expensive.
Our variator implementation uses the master/slave architecture, presented in the next section, to run as many function evaluations as possible in parallel.Additionally, various crossover and mutation policies are available for tuning the algorithm to the optimization problem.
3. THE FRAMEWORK.Simulation based multi-objective optimization problems are omnipresent in research and the industry.The simulation and optimization problems arising in such problems are in general very big and computationally demanding.This motivated us to design a massively parallel general purpose framework.
The key traits of such a design can be summarized as: • support any multi-objective optimization method, • support any function evaluator: simulation code or measurements, • offer a general description/specification of objectives, constraints and design variables, • run efficiently in parallel on current large-scale high-end clusters and supercomputers.
3.1.Related Work.Several similar frameworks, e.g.[13,16,22,23], have been proposed.Commonly these frameworks are tightly coupled to an optimization algorithm, e.g.only providing evolutionary algorithms as optimizers.Users can merely specify optimization problems, but cannot change the optimization algorithm.Our framework follows a more general approach, providing a user-friendly way to introduce new or choose from existing built-in multi-objective optimization algorithms.Tailoring the optimization algorithm to the optimization problem at hand is an important feature due to the many different characteristics of optimization problems that should be handled by such a general framework.As an example, we show how Pisa [1], an existing evolutionary algorithm library, was integrated with ease.Similarly, other multi-objective algorithms could be incorporated and used to solve optimization problems.
The framework presented in [22] resembles our implementation the most, aside from their tight coupling with an evolutionary algorithm optimization strategy.The authors propose a plug-in based framework employing an island parallelization model, where multiple populations are evaluated concurrently and independently up to a point where some number of individuals of the population are exchanged.This is especially useful to prevent the search algorithm to get stuck in a local minimum.A set of default plug-ins for genetic operators, selectors and other components of the algorithms are provided by their framework.User-based plug-ins can be incorporated into the framework by implementing a simple set of functions.
Additionally, as with simulation based multi-objective optimization, we can exploit the fact that both the optimizer and simulation part of the process use a certain amount of resources.The ratio of work between optimizer and simulation costs can be reflected in the ratio of number of processors assigned to each task.This not only provides users with great flexibility in using any simulation or optimizer, but renders influencing the role assignment easy as well.

Components.
The basic assumption in simulation-based optimization is that we need to call an expensive simulation software component present in the constraints or objectives.We divide the framework in three exchangeable components, as shown in Figure 3.1, to encapsulate the major behavioral patterns of the framework.
The Pilot component acts as a bridge between the optimizer and forward solvers, providing the necessary functionality to handle passing requests and results between the Optimizer and the Simulation modules.
We implemented the framework in C++, utilizing features like template parameters to specify the composition of the framework (shown in Listing 1).The framework provides "default" implementations that can be controlled via command line options.Due to its modular design, all components can be completely customized.
Every available MPI thread will take up one of the three available roles (see Figure 1.1): one thread acts as Pilot, the remaining threads are divided amongst Worker and Optimizer roles.Both, the Worker and the Optimizer can consist of multiple MPI threads to exploit parallelism.As shown in Figure 3.1 the Pilot is used to coordinate all "information requests" between the Optimizer and the Worker.An information request is a job that consists of a set of design variables (e.g. the genes of an individual) and a type of information it requests (e.g.function evaluation or derivative).The Pilot keeps checking for idle Worker and assigns jobs in the queue to any free Worker.Once the Worker has computed and evaluated the request its results are returned to the Optimizer that originally requested the information.
After a thread gets appointed a role it starts a polling loop asynchronously checking for appropriate incoming requests.To that end a Poller interface helper class has been introduced.The Poller interface consists of an infinite loop that checks periodically for new MPI messages.Upon reception a new message is immediately forwarded to the appropriate handler, the onMessage() method.The method is called with the MPI Status of the received message and a size t value specifying different values depending on the value of the MPI Tag.The Poller interface allows the implementation of special methods (denoted hooks) determining the behavior of the polling process, e.g. for actions that need to be taken after a message has been handled.Every Poller terminates the loop upon receiving a special MPI tag.All processors running an Optimizer component call the initialize entry point after role assignment in the Pilot.The implementation of initialize must set up and start the poller and the optimization code.Since an optimizer derives from the Poller interface, predefined hooks can be used to determine the polling procedure.Hooks can be implemented as empty methods, but the onMessage implementation should reflect the optimization part of the protocol for handling events from the Pilot.A special set of communicator groups serves as communication channels to the Pilot and its job queue and if existing to threads supporting the Optimizer component.

3.4.
Implementing a Forward Solver.In most cases Simulation implementations are simple wrappers to run an existing "external" simulation code using a set of design variables as input.As for the Optimizer component there exists a base class, labeled Simulation as common basis for all Simulation implementations.In addition, this component also inherits from the Worker class, already implementing the polling protocol for default worker types.As shown in the API in Listing 3 the Worker class expects an implementation to provide implementations for those three methods.
Code Listing 3: Simulation API virtual void run () = 0; virtual void collec tResults () = 0; virtual r eqV a r C o n t a i n e r _ t getResults () = 0; First, upon receiving a new job, the Worker will call the run method on the Simulation implementation.This expects the Simulation implementation to run the simulation in a blocking fashion, meaning the method call blocks and does not return until the simulation has terminated.Subsequently, the Worker calls collectResults, where the Simulation prepares the result data, e.g., parsing output files, and stores the requested information in a reqVarContainer t data structure.Finally, the results obtained with getResults are sent to the Pilot.As before, the serialized data is exchanged using MPI point-to-point communication using a specific set of communicators.
3.5.Specifying the Optimization Problem.We aimed at an easy and expressive way for users to specify multi-objective optimization problems.Following the principle of keeping metadata (optimization and simulation input data) together, we decided to embed the optimization problem specification in the simulation input file by prefixing it with special characters, e.g., as annotations prefixed with a special character.In some cases it might not be possible to annotate the simulation input file.By providing an extra input file parser, optimization problems can be read from stand-alone files.
To allow arbitrary constraints and objective expressions, such as name: OBJECTIVE, EXPR="5 * average(42.0,"measurement.dat")+ ENERGY"; we implemented an expression parser using Boost Spirit1 .In addition to the parser, we need an evaluator able to evaluate an expression, given a parse tree and variable assignments to an actual value.Expressions arising in multi-objective optimization problems usually evaluate to booleans or floating point values.The parse tree, also denoted abstract syntax tree (AST), is constructed recursively while an expression is parsed.Upon evaluation, all unknown variables are replaced with values, either obtained from simulation results or provided by other subtrees in the AST.In this stage, the AST can be evaluated bottom-up and the desired result is returned after processing the root of the tree.To improve the expressive power of objectives and constraints we introduced a simple mechanism to define and call custom functions in expressions.Using simple functors, e.g., as the one shown in Listing 4 to compute an average over a set of data points, enriches expressions with custom functions.Custom function implementations overload the () parenthesis operator.The function arguments specified in the corresponding expression are stored in a std::vector of Boost variants2 that can be booleans, strings or floating point values.double sum = 0.0; for ( int i = 0; i < limit ; i ++) sum += g et Da t aF ro mF i le ( filename , i ); return sum / limit ; } };

Code Listing 4: Simple Average Functor
All custom functions are registered with expression objects.This is necessary to ensure that expressions know how they can resolve function calls in their AST.As shown in Listing 5 this is done by creating a collection of Boost functions3 corresponding to the available custom functions in expressions and passing this to the Pilot.
Code Listing 5: Creating function pointer for registering functor f u n c t i o n D i c t i o n a r y _ t funcs ; client :: function :: type ff ; ff = average (); funcs .insert ( std :: pair < std :: string , client :: function :: type > ( " my_ av er a ge _n am e " , ff )); A set of default operators, corresponding to a mapping to C math functions, is included in the dictionary by default.This enables an out of source description of optimization problem containing only simple math primitives.
3.6.Parallelization.The parallelization is defined by a mapping of the roles introduced above to available cores.Command-line options allow the user to steer the number of processors used in worker and optimizer groups.Here, we mainly use the command-line options to steer the number of processors running a forward solver.
The parallelization will be explained and benchmarked in detail in a forthcoming publication.One major disadvantage of the master/slave implementation model is the fast saturation of the network links surrounding the master node.In [5] authors observe an exponential increase in hot-spot latency with increasing number of workers that are attached to one master process.Clearly, the limiting factor is the number of outgoing links of a node in the network topology and already for a few workers the links surrounding a master process are subject to congestions.This effect is amplified further by large message sizes.
To that end we implemented a solution propagation based on rumor networks (see e.g.[4,7]) using only one-sided communication.This limits the number of messages sent over the already heavily used links surrounding the master node and at the same time helps to prevent the use of global communication.Using information about the interconnection network topology and the application communication graph the task of assigning roles helps to further improve the parallel performance.

FORWARD SOLVER.
The framework contains a wrapper implementing the API mentioned in Listing 3 for using OPAL [2] as the forward solver.OPAL provides different trackers for cyclotrons and linear accelerators with satisfactory parallel performance [3].Recently we introduced a reduced envelope model [18] into OPAL reducing time to solution by several orders of magnitude.

Optimizer Validation.
To ensure that the optimizer works correctly we solved the benchmark problem (4.1).To that end, we use a metric for comparing the quality of a Pareto front.Given a point in the Pareto set, we compute the m dimensional volume (for m objectives) of the dominated space, relative a chosen origin.We visualize this for 2 objectives in Figure 5.1.For further information and details of the implementation see [27].Figure 5.2 and the corresponding hypervolume values in Table 5.1.The reference Pareto front is clearly very well approximated.It took a total of 1100 function evaluations to perform this computation.The hypervolume of the reference solution (0.6575) for our benchmark was computed by sampling the solution provided in [17].From Table 5.1 we deduce that we achieved satisfactory convergence to the sampled reference Pareto front after 1000 (plus the additional 100 evaluations for the initial population) function evaluations.

Ferrario Matching Point.
As a verification and proof of concept we reproduce the Ferrario matching point discovered by Ferrario et al. [15], by formulating the problem as a multi-objective optimization problem.Using the low-dimensional and fast nature of their new simulation code Homdyn [14], an extensive beam dynam- ics study was conducted.
One of the results of the study presented in [15] was the discovery of a novel working point.The authors noticed that the second emittance minimum can profit from the additional emittance compensation in the accelerating traveling wave structure ensuring that the second emittance minimum occurs at a higher energy.This property is attained if the beam emittance has a maximum and the root mean square (rms) beam size has a minimum at the entrance of the first accelerating traveling wave structure.This behavior is illustrated in Figure 5. 3.
By artificially reproducing this working point as the solution of a multi-objective optimization problem given in equations (5.1) to (5.9), we demonstrate the automation of discovering optimal beam dynamics behaviors given a set of desired objectives.
The first two objectives minimize the distance from the position of the current minimum peak to the expected peak location at 3.025 m for transverse bunch size (beam waist) and emittance (see Figure 5.3).The third objective (5.3) adds a condition preferring solutions that have their emittance and rms peak locations at the same z-coordinate.Equations (5.4) and (5.5) define constraints for initial conditions for the simulation: charge, gun voltage and laser spot size.Design variables given in (5.6) to (5.9) correspond to field strengths of the first focusing magnet, its displacement, and the phase of the gun.

Convergence Study.
The envelope tracker, mentioned in the previous chapter, was chosen as the forward solver.We performed a beam convergence study in order to tune the simulation input parameters to achieve the best trade-off between simulation accuracy and time to solution.These parameters include the number of slices (NSLICE) used for the envelope-tracker simulations, simulation timestep (DT) and gun timestep (DTGUN).
Before the simulation can be executed a number of initial beam optics parameters have to be defined in an input file.Table 5.2 shows the values of these parameters for the envelope-tracker.All simulations were performed up to 12.5 m of the SwissFEL 250 MeV injector [24] beam line, with energies reaching up to 120 MeV.
The parameter that affects the performance most is the number of slices.We scanned the range from 100 to 1000 slices to determine the minimal number of slices required for stable results using various timesteps.The results (for 100, 400 and 800 slices) of this scan are shown in Figure 5.4.Using this data we settled for 400 slices -increasing the slice number only minimally improves convergence of the results, therefore using more slices is inefficient.
In a next step the influence of different time steps was examined.To that end  a series of optimization runs with 100, 400 and 800 slices and varying timestep was performed.Figure 5.5 shows the Pareto fronts for 400 slices respectively using different timesteps.As expected, increasing the number of slices while lowering the timestep produces more detailed results.

Optimization Results
. Each of the 40 points on the Pareto front, shown in Figure 5.5, represents an optimal solution, where emittance and beamsize values are compromised to achieve the best agreement with the Ferrario matching point.We selected individual 3 based on a comparison of the emittance and beamsize characteristics of all solutions and by retaining the feasibility of the beam line optics parameters.The design variables, emittance and beamsize of the selected solution are shown in Table 5.3 and Figure 5.6.With the multi-objective optimization framework we attain the same working point as reported in [24].
Using the input parameters of the selected solution, we performed a stability analysis by varying the slice number and the time step for both the gun and the beam line.Figure 5.4 shows that the exit emittance stabilizes for 400 slices and various time   steps.No difference between 800 and 400 slices is visible as their minimum maximum extension seems to be in the same range of 0.024 mm mrad.
For validation purposes we compared the results of the envelope-tracker using the analytical space charge model with the OPAL 3D macro particle tracker.The benchmark was run on the first 12.5 meters of the SwissFEL 250 MeV injector.The results for both rms beamsize and emittance are shown in Figure 5.7.A good agreement between the two codes can be observed.The difference of the larger emittance along the solenoids in case of 3D tracker that is not seen by the envelope-tracker is due to the different definition of the particle momenta (canonical vs. mechanical).Both trackers agree within acceptable limits [8].
6. CONCLUSIONS.We presented a general-purpose framework for solving multi-objective optimization problems.Its modular design simplifies the application to simulation-based optimization problems for a wide range of problems and allows to exchange the optimization algorithm.In a recent work, we successfully implemented and integrated another optimization algorithm, originally presented in [25], into our framework.The flexibility of being able to adapting both ends of the optimization process, the forward solver and the optimization algorithm simultaneously not only leads to broad applicability but it facilitates the ability to tailor the optimization strategy to the optimization problem as well.
In this paper we applied the framework to reproduce the important and well known Ferrario matching condition for the 250 MeV injector of PSI's SwissFEL.The beam size and emittance optimization was successful and emittance damping is observed in the accelerating cavity.Also, the influence of the number of slices employed by the envelope tracker with respect to convergence was investigated.We found that the optimal slice number is around 400 for the problem addressed here, producing the best trade-off between computational cost and accuracy of the solution of the simulation.Similarly, the gun and beam line time step was investigated confirming the convergence of the results.This first study shows that the framework is ready to tackle problems arising in the domain of beam dynamics.
Interestingly, the computation of a good approximation of the Pareto front is

Fig. 1 . 1 .
Fig. 1.1.Multi-objective framework: the pilot (master) solves the optimization problem specified in the input file by coordinating optimizer algorithms and workers running forward solves.

4 Fig. 2 .
Fig.2.1.Two competing objectives can not be optimal at the same time.Red dots represent Pareto optimal points, while the green square x 4 is dominated (exhibits a worse price per performance ratio than e.g.x *2 ) by all points on the blue curve (Pareto front).

Fig. 2 . 2 .
Fig. 2.2.Schematic view of interplay between selector and variator.The selector ranks all individuals in the population according to fitness and subsequently the variator uses the fittest individuals to produces new offspring.Finally, the new children are reintroduced in the population.

Fig. 3 . 1 .
Fig. 3.1.Schematic view of messages passed within the network between the three roles.The dashed cyan path describes a request (job j 1 ) sent from O i to the Pilot being handled by W j .Subsequently the result r k is returned to the requesting Optimizer (O i ).

3. 3 .
Implementing an Optimizer.All Optimizer implementations have to respect the API shown in Listing 2.

Fig. 5 . 1 .
Fig. 5.1.The hypervolume for a two-objective optimization problem corresponds to the accumulated area of all dashed rectangles spanned by all points in the Pareto set and arbitrary origin po.

Fig. 5 . 3 .
Fig. 5.3.Illustration of the Ferrario matching criteria: beam emittance attains a maximum and rms beamsize a minimum at the entrance to the first accelerating traveling wave structure.
T s.t.low pr ≤ price ≤ high pr low pe ≤ performance ≤ high pe (2.4)

Table 5 . 1
Convergence of benchmark problem with errors relative to hypervolume of sampled reference solution.

Table 5 . 2
Initial conditions for the envelope tracker.

Table 5 . 3
The design variables for individual 3.