• Open Access

Spatially heterogeneous learning by a deep student machine

Hajime Yoshino
Phys. Rev. Research 5, 033068 – Published 31 July 2023

Abstract

Despite spectacular successes, deep neural networks (DNNs) with a huge number of adjustable parameters remain largely black boxes. To shed light on the hidden layers of DNNs, we study supervised learning by a DNN of width N and depth L, consisting of N·L perceptrons with c inputs each, using a statistical mechanics approach called the teacher-student setting. We consider an ensemble of student machines that exactly reproduce M sets of N-dimensional input/output relations provided by a teacher machine. We show that the statistical mechanics problem becomes exactly solvable in a high-dimensional limit which we call a "dense limit": N ≫ c ≫ 1 and M ≫ 1 with fixed α = M/c, using the replica method developed by Yoshino [SciPost Phys. Core 2, 005 (2020)]. In conjunction with the theoretical study, we also study the model numerically by performing simple greedy Monte Carlo simulations. Simulations reveal that learning by the DNN is quite heterogeneous in the network space: configurations of the teacher and the student machines are more correlated within the layers closer to the input/output boundaries, while the central region remains much less correlated due to the overparametrization, in qualitative agreement with the theoretical prediction. We evaluate the generalization error of the DNN with various depths L both theoretically and numerically. Remarkably, both the theory and the simulation suggest that the generalization ability of the student machines, which are only weakly correlated with the teacher in the center, does not vanish even in the deep limit L ≫ 1, where the system becomes heavily overparametrized. We also consider the impact of the effective dimension D(≤ N) of data by incorporating the hidden manifold model [Goldt, Mézard, Krzakala, and Zdeborová, Phys. Rev. X 10, 041044 (2020)] into our model.
Replica theory implies that the loop corrections to the dense limit, which reflect correlations between different nodes in the network, become enhanced by either decreasing the width N or decreasing the effective dimension D of the data. Simulation suggests that both lead to significant improvements in generalization ability.
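The teacher-student setup with greedy Monte Carlo learning described in the abstract can be illustrated with a minimal sketch. All sizes (N, L, M), the Gaussian weight initialization, the sign activation, and the fully connected layers are illustrative assumptions for this toy version, not the paper's exact model (which uses perceptrons with c inputs each and spherical weight normalization):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, M = 16, 4, 64  # width, depth, number of training patterns (toy sizes)

def forward(weights, x):
    """Propagate an input through L layers of sign perceptrons."""
    for W in weights:
        x = np.sign(W @ x / np.sqrt(len(x)))
    return x

# Teacher machine with fixed random weights; student starts from random weights.
teacher = [rng.standard_normal((N, N)) for _ in range(L)]
student = [rng.standard_normal((N, N)) for _ in range(L)]

X = rng.standard_normal((M, N))                   # training inputs
Y = np.array([forward(teacher, x) for x in X])    # teacher-provided labels

def train_error(weights):
    preds = np.array([forward(weights, x) for x in X])
    return np.mean(preds != Y)

# Greedy (zero-temperature) Monte Carlo: perturb one student weight at a time
# and accept the move only if the training error does not increase.
err0 = train_error(student)
err = err0
for step in range(500):
    l, i, j = rng.integers(L), rng.integers(N), rng.integers(N)
    old = student[l][i, j]
    student[l][i, j] += 0.5 * rng.standard_normal()
    new_err = train_error(student)
    if new_err <= err:
        err = new_err                # keep the move
    else:
        student[l][i, j] = old       # reject the move
```

Because moves are accepted only when the training error does not increase, the error is monotonically nonincreasing; measuring teacher-student weight overlaps layer by layer in such a simulation is what reveals the spatial heterogeneity discussed in the abstract.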

  • Received 14 February 2023
  • Revised 29 May 2023
  • Accepted 12 July 2023

DOI:https://doi.org/10.1103/PhysRevResearch.5.033068

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

Published by the American Physical Society

Physics Subject Headings (PhySH)

Statistical Physics & Thermodynamics

Authors & Affiliations

Hajime Yoshino

  • Cybermedia Center, Osaka University, Toyonaka, Osaka 560-0043, Japan and Graduate School of Science, Osaka University, Toyonaka, Osaka 560-0043, Japan

Issue

Vol. 5, Iss. 3 — July - September 2023


Reuse & Permissions

It is not necessary to obtain permission to reuse this article or its components as it is available under the terms of the Creative Commons Attribution 4.0 International license. This license permits unrestricted use, distribution, and reproduction in any medium, provided attribution to the author(s) and the published article's title, journal citation, and DOI are maintained. Please note that some figures may have been included with permission from other third parties. It is your responsibility to obtain the proper permission from the rights holder directly for these figures.
