Gradient flows and proximal splitting methods: A unified view on accelerated and stochastic optimization

Guilherme França, Daniel P. Robinson, and René Vidal
Phys. Rev. E 103, 053304 – Published 10 May 2021

Abstract

Optimization is at the heart of machine learning, statistics, and many applied scientific disciplines. It also has a long history in physics, ranging from the minimal action principle to finding ground states of disordered systems such as spin glasses. Proximal algorithms form a class of methods that are broadly applicable and particularly well suited to nonsmooth, constrained, large-scale, and distributed optimization problems. Essentially five proximal algorithms are currently known, each proposed in seminal work: forward-backward splitting, Tseng splitting, Douglas-Rachford splitting, the alternating direction method of multipliers, and the more recent Davis-Yin splitting. These methods sit at a higher level of abstraction than gradient-based methods and have deep roots in nonlinear functional analysis. In this paper we show that all of these methods are actually different discretizations of a single differential equation, namely, the simple gradient flow, which dates back to Cauchy (1847). Many of the success stories in machine learning rely on “accelerating” the convergence of first-order methods, yet accelerated methods are notoriously difficult to analyze, often counterintuitive, and lack an underlying guiding principle. We show that similar discretization schemes applied to Newton's equation with an additional dissipative force, which we refer to as the accelerated gradient flow, yield accelerated variants of all of these proximal algorithms; most are new, although some recover known methods from the literature. Furthermore, we extend these methods to stochastic settings, which allows us to make connections with the Langevin and Fokker-Planck equations. Similar ideas apply to the simpler gradient descent, heavy-ball, and Nesterov methods. Our results thus provide a unified framework in which several important optimization methods are seen to be nothing but simulations of classical dissipative systems.
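For orientation, the central continuous-time objects mentioned above can be sketched in standard notation. This is our reading of the abstract, not an excerpt from the paper; the paper's precise conventions, such as the form of the damping coefficient or the discretization schemes, may differ.

% Gradient flow (Cauchy, 1847) for a differentiable objective f:
\dot{x}(t) = -\nabla f\bigl(x(t)\bigr).

% Accelerated gradient flow: Newton's equation with an additional
% dissipative force, with damping coefficient \eta(t) > 0:
\ddot{x}(t) + \eta(t)\,\dot{x}(t) = -\nabla f\bigl(x(t)\bigr).

% One illustrative discretization for a composite objective f + g:
% an explicit (forward) Euler step on f combined with an implicit
% (backward) step on g gives forward-backward splitting,
x_{k+1} = \operatorname{prox}_{hg}\!\bigl(x_k - h\,\nabla f(x_k)\bigr),
\quad
\operatorname{prox}_{hg}(v) = \operatorname*{arg\,min}_{y}
\Bigl\{\, g(y) + \tfrac{1}{2h}\lVert y - v\rVert^{2} \Bigr\}.

% Stochastic setting: perturbing the gradient flow by noise yields
% Langevin dynamics, whose density evolves by a Fokker-Planck equation:
dX_t = -\nabla f(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t.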
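As a concrete toy illustration of the forward-backward update above, the following self-contained sketch solves a small lasso problem, min_x (1/2)||Ax - b||^2 + lam*||x||_1. This is our own example, not code from the paper; the problem instance, step size, and iteration count are illustrative assumptions.

import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(A, b, lam, iters=500):
    """Forward-backward splitting for min_x 0.5||Ax - b||^2 + lam*||x||_1.

    Each iteration takes an explicit (forward) Euler step on the smooth
    term and an implicit (backward) step on the nonsmooth term:
        x_{k+1} = prox_{h*lam*||.||_1}(x_k - h * A^T (A x_k - b)).
    """
    h = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/L, L = ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                   # forward step on f
        x = soft_threshold(x - h * grad, h * lam)  # backward step on g
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = forward_backward(A, b, lam=0.5)
print("nonzero entries recovered:", int(np.sum(np.abs(x_hat) > 1e-3)))

The implicit step enters only through the proximal operator, which is why proximal algorithms handle nonsmooth terms, such as the l1 penalty here, that would break plain gradient descent.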

  • Received 15 January 2021
  • Revised 21 April 2021
  • Accepted 22 April 2021

DOI: https://doi.org/10.1103/PhysRevE.103.053304

©2021 American Physical Society

Physics Subject Headings (PhySH)

Statistical Physics & Thermodynamics

Authors & Affiliations

Guilherme França1,2,*, Daniel P. Robinson3, and René Vidal2

  • 1University of California, Berkeley, California 94720, USA
  • 2Johns Hopkins University, Baltimore, Maryland 21218, USA
  • 3Lehigh University, Bethlehem, Pennsylvania 18015, USA

  • *guifranca@gmail.com

Issue

Vol. 103, Iss. 5 — May 2021
