Abstract
The stochastic dynamics of reinforcement learning is studied using a master equation formalism. We consider two different problems: learning in a two-agent game, and the multiarmed bandit problem with policy gradient as the learning method. The master equation is constructed by introducing a probability distribution over continuous policy parameters or, in a more advanced case, over both continuous policy parameters and discrete state variables. We use a version of the moment closure approximation to solve for the stochastic dynamics of the models. Our method gives accurate estimates for the mean and the (co)variance of policy variables. For the case of the two-agent game, we find that the variance terms are finite at steady state and derive a system of algebraic equations for computing them directly.
- Received 7 August 2022
- Revised 3 January 2023
- Accepted 16 February 2023
DOI: https://doi.org/10.1103/PhysRevE.107.034112
©2023 American Physical Society
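The abstract's second setting, policy-gradient learning on a multiarmed bandit, can be illustrated with a minimal simulation. The sketch below is not the paper's model: the Bernoulli reward probabilities, learning rate, and softmax parametrization are all assumed for illustration. It runs many independent learning trajectories and reports the empirical mean and variance of a policy parameter across runs, which are the kinds of moments a moment-closure treatment of the master equation would estimate directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: Bernoulli rewards with assumed means.
p_reward = np.array([0.7, 0.4])
alpha = 0.1      # learning rate (assumed)
n_steps = 500
n_runs = 2000    # independent stochastic trajectories

# Softmax (Gibbs) policy with one preference parameter per arm, per run.
theta = np.zeros((n_runs, 2))

for _ in range(n_steps):
    logits = theta - theta.max(axis=1, keepdims=True)
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)
    # Sample one arm per trajectory from its current policy.
    arm = (rng.random(n_runs) > pi[:, 0]).astype(int)
    reward = (rng.random(n_runs) < p_reward[arm]).astype(float)
    # REINFORCE update: grad log pi(arm) = one_hot(arm) - pi
    one_hot = np.eye(2)[arm]
    theta += alpha * reward[:, None] * (one_hot - pi)

# Empirical moments of the preference difference across trajectories --
# the mean and variance that a moment-closure analysis approximates.
d = theta[:, 0] - theta[:, 1]
print(d.mean(), d.var())
```

Averaging over many trajectories plays the role of the ensemble described by the master equation's probability distribution over policy parameters; here the better arm ends with a positive mean preference difference and a nonzero variance.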