Abstract
We study the dynamics of supervised learning in layered neural networks, in the regime where the size p of the training set is proportional to the number N of inputs. Here the local fields are no longer described by Gaussian probability distributions, and the learning dynamics is of a spin-glass nature, with the composition of the training set playing the role of quenched disorder. We show how dynamical replica theory can be used to predict the evolution of macroscopic observables, including the two relevant performance measures (training error and generalization error), incorporating as a special case the old formalism developed for complete training sets (the limit α = p/N → ∞). For simplicity, we restrict ourselves in this paper to single-layer networks and realizable tasks. In the case of (on-line and batch) Hebbian learning, where a direct exact solution is possible, we show that our theory provides exact results at any time, in many different verifiable cases. For non-Hebbian learning rules, such as PERCEPTRON and ADATRON, we find very good agreement between the predictions of our theory and numerical simulations. Finally, we derive three approximation schemes aimed at eliminating the need to solve a functional saddle-point equation at each time step, and we assess their performance. The simplest of these schemes leads to a fully explicit and relatively simple nonlinear diffusion equation for the joint field distribution, which already describes the learning dynamics surprisingly well over a wide range of parameters.
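The setting described above can be illustrated with a minimal numerical simulation, of the kind the paper compares its theory against: on-line Hebbian learning of a realizable single-layer task, with a fixed (quenched) training set of size p = αN. This is a sketch under stated assumptions, not the paper's replica calculation; the teacher vector B, the learning rate eta, the pattern statistics, and the error estimators below are illustrative choices, with the generalization-error formula ε_g = arccos(ω)/π being the standard result for a perceptron student-teacher overlap ω.

```python
import numpy as np

# Illustrative sketch (not the paper's dynamical replica theory):
# on-line Hebbian learning with a restricted training set, p = alpha*N.
rng = np.random.default_rng(0)

N = 200             # number of inputs
alpha = 2.0         # training-set size per input
p = int(alpha * N)

B = rng.standard_normal(N)
B /= np.linalg.norm(B)                      # teacher weight vector (assumption)
xi = rng.choice([-1.0, 1.0], size=(p, N))   # fixed training patterns: quenched disorder
y = np.sign(xi @ B)                         # teacher labels -> realizable task

# On-line Hebbian rule: at each step draw a pattern from the *restricted*
# training set and add (eta/N) * label * pattern to the student vector J.
eta = 1.0
J = np.zeros(N)
for _ in range(20 * p):
    mu = rng.integers(p)
    J += (eta / N) * y[mu] * xi[mu]

# Training error: fraction of stored patterns misclassified by the student.
e_t = np.mean(np.sign(xi @ J) != y)

# Generalization error via the student-teacher overlap omega:
# eps_g = arccos(omega)/pi (standard single-layer perceptron result).
omega = (J @ B) / np.linalg.norm(J)
e_g = np.arccos(np.clip(omega, -1.0, 1.0)) / np.pi

print(f"alpha={alpha}: training error {e_t:.3f}, generalization error {e_g:.3f}")
```

Because the student only ever sees the p stored patterns, the training error drops below the generalization error, and both remain nonzero at any finite α; this gap between the two performance measures is exactly what the restricted-training-set theory has to capture.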
Received 4 October 1999
DOI: https://doi.org/10.1103/PhysRevE.62.5444
©2000 American Physical Society