• Editors' Suggestion

Deterministic limit of temporal difference reinforcement learning for stochastic games

Wolfram Barfuss, Jonathan F. Donges, and Jürgen Kurths
Phys. Rev. E 99, 043305 – Published 10 April 2019

Abstract

Reinforcement learning in multiagent systems has been studied in the fields of economic game theory, artificial intelligence, and statistical physics by developing an analytical understanding of the learning dynamics (often in relation to the replicator dynamics of evolutionary game theory). However, the majority of these analytical studies focus on repeated normal form games, which have only a single environmental state. Environmental dynamics, i.e., changes in the state of an environment affecting the agents' payoffs, have received less attention, and a universal method to obtain deterministic equations from established multistate reinforcement learning algorithms is lacking. In this work we present a methodological extension, separating the interaction from the adaptation timescale, to derive the deterministic limit of a general class of reinforcement learning algorithms, called temporal difference learning. This form of learning is equipped to function in more realistic multistate environments by using the estimated value of future environmental states to adapt the agent's behavior. We demonstrate the potential of our method with three well-established learning algorithms: Q learning, SARSA learning, and actor-critic learning. Illustrations of their dynamics in two multiagent, multistate environments reveal a wide range of dynamical regimes, such as convergence to fixed points, limit cycles, and even deterministic chaos.
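To make the phrase "using the estimated value of future environmental states" concrete, the following is a minimal single-agent sketch of the standard tabular temporal difference errors for Q learning and SARSA. It is a textbook illustration, not the deterministic limit derived in the article; the function names, the step size `alpha`, the discount factor `gamma`, and the toy values are assumptions chosen for this example.

```python
import numpy as np

def td_error_q(Q, s, a, r, s_next, gamma):
    """Q-learning TD error: bootstrap from the best estimated value of the next state."""
    return r + gamma * np.max(Q[s_next]) - Q[s, a]

def td_error_sarsa(Q, s, a, r, s_next, a_next, gamma):
    """SARSA TD error: bootstrap from the value of the action actually taken next."""
    return r + gamma * Q[s_next, a_next] - Q[s, a]

def td_update(Q, s, a, delta, alpha):
    """Move Q[s, a] by a step of size alpha along the TD error (in place)."""
    Q[s, a] += alpha * delta
    return Q

# Hypothetical toy setting: 2 environmental states, 2 actions per state.
Q = np.zeros((2, 2))
Q[1] = [1.0, 3.0]  # assumed current value estimates in state 1

# One observed transition: state 0, action 1, reward 0.5, next state 1.
delta = td_error_q(Q, s=0, a=1, r=0.5, s_next=1, gamma=0.9)
td_update(Q, s=0, a=1, delta=delta, alpha=0.1)
```

The two error functions differ only in how the next state is valued: Q learning takes the maximum over next actions, while SARSA uses the next action that was actually selected.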

  • Received 20 September 2018

DOI: https://doi.org/10.1103/PhysRevE.99.043305

©2019 American Physical Society

Physics Subject Headings (PhySH)

  • Interdisciplinary Physics
  • Nonlinear Dynamics
  • Statistical Physics & Thermodynamics

Authors & Affiliations

Wolfram Barfuss¹,²,*, Jonathan F. Donges¹,³, and Jürgen Kurths¹,²,⁴

  • ¹Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany
  • ²Department of Physics, Humboldt University Berlin, 12489 Berlin, Germany
  • ³Stockholm Resilience Centre, Stockholm University, 104 05 Stockholm, Sweden
  • ⁴Saratov State University, 410012 Saratov, Russia

  • *barfuss@pik-potsdam.de

Issue

Vol. 99, Iss. 4 — April 2019
