Statistical mechanics of learning from examples

H. S. Seung, H. Sompolinsky, and N. Tishby
Phys. Rev. A 45, 6056 – Published 1 April 1992

Abstract

Learning from examples in feedforward neural networks is studied within a statistical-mechanical framework. Training is assumed to be stochastic, leading to a Gibbs distribution of networks characterized by a temperature parameter T. Learning of realizable rules as well as of unrealizable rules is considered. In the latter case, the target rule cannot be perfectly realized by a network of the given architecture. Two useful approximate theories of learning from examples are studied: the high-temperature limit and the annealed approximation. Exact treatment of the quenched disorder generated by the random sampling of the examples leads to the use of the replica theory. Of primary interest is the generalization curve, namely, the average generalization error εg versus the number of examples P used for training. The theory implies that, for a reduction in εg that remains finite in the large-N limit, P should generally scale as αN, where N is the number of independently adjustable weights in the network. We show that for smooth networks, i.e., those with continuously varying weights and smooth transfer functions, the generalization curve asymptotically obeys an inverse power law. In contrast, for nonsmooth networks other behaviors can appear, depending on the nature of the nonlinearities as well as the realizability of the rule. In particular, a discontinuous learning transition from a state of poor to a state of perfect generalization can occur in nonsmooth networks learning realizable rules.
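
As an illustrative aside, the stochastic-training setup described above can be sketched numerically for the simplest case treated in the paper, the single-layer perceptron: sample student weights from the Gibbs distribution proportional to exp[-E(W)/T], where E(W) counts training errors, and measure the resulting generalization error against a teacher rule. This is a minimal sketch, not the paper's procedure; the Metropolis scheme, the spherical weight constraint, and all parameter values (N, alpha, T, step size) are assumptions chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: N weights, P = alpha*N examples, temperature T
    N, alpha, T, steps = 50, 4.0, 0.5, 20000
    P = int(alpha * N)

    # Teacher perceptron defines a realizable rule: y = sign(w0 . x)
    w0 = rng.standard_normal(N)
    X = rng.standard_normal((P, N))
    y = np.sign(X @ w0)

    def training_energy(w):
        """Number of training examples the student misclassifies."""
        return int(np.sum(np.sign(X @ w) != y))

    # Metropolis sampling of the Gibbs distribution P(w) ~ exp(-E(w)/T),
    # with students constrained to the sphere |w|^2 = N.
    w = rng.standard_normal(N)
    w *= np.sqrt(N) / np.linalg.norm(w)
    E = training_energy(w)
    for _ in range(steps):
        w_trial = w + 0.1 * rng.standard_normal(N)
        w_trial *= np.sqrt(N) / np.linalg.norm(w_trial)
        E_trial = training_energy(w_trial)
        # Accept moves that lower the energy, or uphill moves with
        # the Boltzmann probability exp(-(E_trial - E)/T)
        if E_trial <= E or rng.random() < np.exp((E - E_trial) / T):
            w, E = w_trial, E_trial

    # For the perceptron, eps_g = arccos(R)/pi, where R is the
    # normalized overlap between student and teacher weights.
    R = (w @ w0) / (np.linalg.norm(w) * np.linalg.norm(w0))
    print(f"alpha = {alpha}: training errors = {E}, eps_g ~ {np.arccos(R) / np.pi:.3f}")

The closed form εg = arccos(R)/π, standard for the perceptron, lets the sketch report the generalization error directly from the teacher-student overlap R rather than estimating it from held-out test examples.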

We illustrate both gradual and discontinuous learning with a detailed analytical and numerical study of several single-layer perceptron models. Comparing with the exact replica theory of perceptron learning, we find that for realizable rules the high-temperature and annealed theories provide very good approximations to the generalization performance. Assuming this to hold for multilayer networks as well, we propose a classification of possible asymptotic forms of learning curves in general realizable models. For unrealizable rules we find that the above approximations fail in general to predict correctly the shapes of the generalization curves. Another indication of the important role of quenched disorder for unrealizable rules is that the generalization error is not necessarily a monotonically increasing function of temperature. Also, unrealizable rules can possess genuine spin-glass phases indicative of degenerate minima separated by high barriers.
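
Schematically, the two classes of realizable-rule learning curves described above can be summarized as follows; the constant c and the critical ratio α_c are model dependent, and this notation is ours, for illustration only:

    \varepsilon_g(\alpha) \sim \frac{c}{\alpha}
        \quad (\text{smooth networks}, \ \alpha = P/N \to \infty),

    \varepsilon_g(\alpha)
        \begin{cases}
        > 0, & \alpha < \alpha_c, \\
        = 0, & \alpha > \alpha_c,
        \end{cases}
        \quad (\text{discontinuous transition in some nonsmooth networks}),

where in the second case εg drops by a finite amount at α_c, from a state of poor to a state of perfect generalization.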

  • Received 23 September 1991

DOI: https://doi.org/10.1103/PhysRevA.45.6056

©1992 American Physical Society

Authors & Affiliations

H. S. Seung and H. Sompolinsky

  • Racah Institute of Physics and Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel
  • AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, New Jersey 07974

N. Tishby

  • AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, New Jersey 07974

Issue

Vol. 45, Iss. 8 — April 1992
