Abstract
In this article we report on the application of a model-free reinforcement learning method to the optimization of accelerator systems. We simplify a policy gradient algorithm to accelerator control from sophisticated algorithms that have recently been demonstrated to solve complex dynamic problems. After outlining a theoretical basis for the functioning of the algorithm, we explore the small hyperparameter space to develop intuition about said parameters using a simple number-guess environment. Finally, we demonstrate the algorithm optimizing both a free-electron laser and an accelerator-based terahertz source in-situ. The algorithm is applied to different accelerator control systems and optimizes the desired signals in a few hundred steps without any domain knowledge using up to five control parameters. In addition, the algorithm shows modest tolerance to accelerator fault conditions without any special preparation for such conditions.
4 More- Received 25 July 2020
- Accepted 3 December 2020
DOI:https://doi.org/10.1103/PhysRevAccelBeams.23.122802
Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.
Published by the American Physical Society