Quantifying the stochasticity of policy parameters in reinforcement learning problems

Vahe Galstyan and David B. Saakian
Phys. Rev. E 107, 034112 – Published 8 March 2023

Abstract

The stochastic dynamics of reinforcement learning is studied using a master equation formalism. We consider two different problems—Q learning for a two-agent game and the multiarmed bandit problem with policy gradient as the learning method. The master equation is constructed by introducing a probability distribution over continuous policy parameters or over both continuous policy parameters and discrete state variables (a more advanced case). We use a version of the moment closure approximation to solve for the stochastic dynamics of the models. Our method gives accurate estimates for the mean and the (co)variance of policy variables. For the case of the two-agent game, we find that the variance terms are finite at steady state and derive a system of algebraic equations for computing them directly.
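
As a rough illustration of the quantities the abstract refers to, the sketch below estimates the mean and covariance of policy parameters by brute-force sampling. It is not the master equation or moment closure method of the paper. It simply runs many independent policy-gradient trajectories on a Bernoulli multiarmed bandit and measures the spread of the final parameters, which is the kind of reference data such analytical estimates could be compared against. The softmax parameterization, the REINFORCE-style update without a baseline, the learning rate, the reward probabilities, and all function names are assumptions made for this example.

```python
import math
import random
import statistics


def softmax(theta):
    """Numerically stable softmax over a list of policy parameters."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def run_policy_gradient(reward_probs, steps, lr, rng):
    """One stochastic learning trajectory on a Bernoulli bandit (assumed setup)."""
    theta = [0.0] * len(reward_probs)
    for _ in range(steps):
        pi = softmax(theta)
        arm = rng.choices(range(len(theta)), weights=pi)[0]
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        # Gradient of log pi(arm) with respect to theta_k is (1[k == arm] - pi_k).
        for k in range(len(theta)):
            indicator = 1.0 if k == arm else 0.0
            theta[k] += lr * reward * (indicator - pi[k])
    return theta


def ensemble_moments(reward_probs, steps=200, lr=0.1, runs=500, seed=0):
    """Empirical mean and sample covariance of final parameters over many runs."""
    rng = random.Random(seed)
    finals = [run_policy_gradient(reward_probs, steps, lr, rng) for _ in range(runs)]
    n = len(reward_probs)
    means = [statistics.fmean([f[k] for f in finals]) for k in range(n)]
    cov = [
        [
            sum((f[i] - means[i]) * (f[j] - means[j]) for f in finals) / (runs - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return means, cov


if __name__ == "__main__":
    means, cov = ensemble_moments([0.8, 0.2])
    print("mean policy parameters:", means)
    print("covariance matrix:", cov)
```

With this particular update the parameter increments sum to zero at every step, so for two arms the two parameters are perfectly anticorrelated and a single variance describes the spread.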

  • Received 7 August 2022
  • Revised 3 January 2023
  • Accepted 16 February 2023

DOI: https://doi.org/10.1103/PhysRevE.107.034112

©2023 American Physical Society

Physics Subject Headings (PhySH)

Statistical Physics & Thermodynamics

Authors & Affiliations

Vahe Galstyan [1,2] and David B. Saakian [2,*]

  • [1] AMOLF, Science Park 104, 1098 XG Amsterdam, Netherlands
  • [2] A.I. Alikhanyan National Science Laboratory (Yerevan Physics Institute) Foundation, 2 Alikhanian Brothers Street, Yerevan 375036, Armenia

  • [*] saakian@yerphi.am

Issue

Vol. 107, Iss. 3 — March 2023
