Open Access

Modeling the Influence of Data Structure on Learning in Neural Networks: The Hidden Manifold Model

Sebastian Goldt, Marc Mézard, Florent Krzakala, and Lenka Zdeborová
Phys. Rev. X 10, 041044 – Published 3 December 2020

Abstract

Understanding the reasons for the success of deep neural networks trained using stochastic gradient-based methods is a key open problem for the nascent theory of deep learning. The types of data where these networks are most successful, such as images or sequences of speech, are characterized by intricate correlations. Yet, most theoretical work on neural networks does not explicitly model training data or assumes that elements of each data sample are drawn independently from some factorized probability distribution. These approaches are, thus, by construction blind to the correlation structure of real-world datasets and their impact on learning in neural networks. Here, we introduce a generative model for structured datasets that we call the hidden manifold model. The idea is to construct high-dimensional inputs that lie on a lower-dimensional manifold, with labels that depend only on their position within this manifold, akin to a single-layer decoder or generator in a generative adversarial network. We demonstrate that learning of the hidden manifold model is amenable to an analytical treatment by proving a “Gaussian equivalence property” (GEP), and we use the GEP to show how the dynamics of two-layer neural networks trained using one-pass stochastic gradient descent is captured by a set of integro-differential equations that track the performance of the network at all times. This approach permits us to analyze in detail how a neural network learns functions of increasing complexity during training, how its performance depends on its size, and how it is impacted by parameters such as the learning rate or the dimension of the hidden manifold.
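To make the generative model concrete, the sketch below samples hidden-manifold-style data in NumPy. It is an illustrative reconstruction, not the authors' code: the tanh generator nonlinearity, the sign label function, and all dimensions are assumptions chosen for the example.

```python
# Minimal sketch of hidden-manifold data generation (illustrative assumptions:
# tanh generator, sign labels, and the dimensions chosen below).
import numpy as np

rng = np.random.default_rng(0)

N, D, P = 1000, 10, 10_000   # input dim N, manifold dim D << N, samples P

# Fixed random matrix mapping latent coordinates to input space,
# playing the role of a single-layer generator.
F = rng.standard_normal((D, N))

# Latent coordinates of each sample on the D-dimensional manifold.
C = rng.standard_normal((P, D))

# High-dimensional inputs: element-wise nonlinearity of the projection.
X = np.tanh(C @ F / np.sqrt(D))

# Labels depend only on the position on the manifold, not on X directly;
# here, a simple linear-threshold function of the latent coordinates.
w = rng.standard_normal(D)
y = np.sign(C @ w / np.sqrt(D))
```

Because the labels are a function of the latent coordinates C alone, a network trained on the pairs (X, y) can only succeed by picking up the low-dimensional structure hidden in the high-dimensional inputs.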

[11 figures]
  • Received 3 May 2020
  • Revised 1 September 2020
  • Accepted 8 September 2020

DOI: https://doi.org/10.1103/PhysRevX.10.041044

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.


Physics Subject Headings (PhySH)

Statistical Physics & Thermodynamics; Interdisciplinary Physics

Authors & Affiliations

Sebastian Goldt1,*, Marc Mézard1,†, Florent Krzakala1,‡, and Lenka Zdeborová2,§

  • 1Laboratoire de Physique de l’Ecole Normale Supérieure, Université PSL, CNRS, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, 75005 Paris, France
  • 2Institut de Physique Théorique, CNRS, CEA, Université Paris-Saclay, 91191 Gif-sur-Yvette, France

  • *sebastian.goldt@phys.ens.fr
  • †marc.mezard@ens.fr
  • ‡florent.krzakala@ens.fr
  • §lenka.zdeborova@cea.fr

Popular Summary

Deep neural networks—machine-learning algorithms that mimic the human brain—have reached impressive levels of performance on tasks ranging from image classification to natural language processing. Yet from a theoretical point of view, the reasons for their success remain unclear. Closing this gap between theory and practice might help address open problems in the practice of machine learning. A key challenge is understanding how the structure of real-world data impacts learning in neural networks. Here, we address this question by introducing a model for structured data sets that we call the “hidden manifold model.” This model provides a more realistic framework in which the dynamics and performance of machine learning can be analyzed.

The hidden manifold model is a statistical model for generating high-dimensional data from low-dimensional inputs. We show experimentally that training two-layer neural networks on such data reproduces some effects seen when training neural networks on practical data sets for image classification. We also derive a “Gaussian equivalence property,” which allows us to study the dynamics and the performance of two-layer neural networks analytically and in detail.
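For readers who want to experiment, here is a minimal sketch of the training setup analyzed in the paper: a two-layer network trained with one-pass (online) SGD on hidden-manifold data, so that each sample is used exactly once. The architecture, nonlinearities, and hyperparameters are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch: one-pass SGD for a two-layer network on hidden-manifold
# data. All hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

N, D, K = 500, 10, 3        # input dim, manifold dim, hidden units
lr, steps = 0.1, 50_000     # learning rate, number of SGD steps

F = rng.standard_normal((D, N))      # fixed generator matrix
w_star = rng.standard_normal(D)      # latent direction defining the labels

W = rng.standard_normal((K, N)) / np.sqrt(N)   # first-layer weights
v = rng.standard_normal(K) / np.sqrt(K)        # second-layer weights

def sample():
    """Draw one fresh (x, y) pair from the hidden manifold model."""
    c = rng.standard_normal(D)
    x = np.tanh(F.T @ c / np.sqrt(D))
    y = np.sign(c @ w_star / np.sqrt(D))   # label set in latent space
    return x, y

for t in range(steps):
    x, y = sample()                    # one-pass: every sample is fresh
    h = np.tanh(W @ x / np.sqrt(N))    # hidden-layer activations
    err = v @ h - y                    # gradient factor for squared loss
    grad_v = err * h
    grad_W = err * np.outer(v * (1 - h**2), x) / np.sqrt(N)
    v -= lr * grad_v                   # update both layers by plain SGD
    W -= lr * grad_W
```

In the paper's analysis, the Gaussian equivalence property lets this high-dimensional dynamics be tracked by a closed set of equations for a few order parameters; a simulation like the one above is the empirical counterpart one would compare against.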

Our work provides a model and new analytical techniques to analyze learning when inputs have nontrivial correlations, thus paving the way for a systematic study of how learning is shaped by data structure.

Issue

Vol. 10, Iss. 4 — October–December 2020


Reuse & Permissions

It is not necessary to obtain permission to reuse this article or its components as it is available under the terms of the Creative Commons Attribution 4.0 International license. This license permits unrestricted use, distribution, and reproduction in any medium, provided attribution to the author(s) and the published article's title, journal citation, and DOI are maintained. Please note that some figures may have been included with permission from other third parties. It is your responsibility to obtain the proper permission from the rights holder directly for these figures.
