
Understanding the generalization properties of large, overparametrized neural networks is a central problem in theoretical machine learning. Several insightful ideas have been proposed in this regard, among them: the implicit bias hypothesis, the possibility of benign overfitting, and the existence of feature learning regimes in which neural networks learn the latent structure of the data. However, a precise understanding of when and why these behaviors emerge cannot be disentangled from the study of the nonlinear training dynamics.
We use a technique from statistical physics, dynamical mean field theory, to study the training dynamics and obtain a rich picture of how generalization and overfitting arise in large overparametrized models. Our results are non-rigorous: we establish them at the level of rigor standard in statistical physics and validate them by numerical simulations. Indeed, probability theorists have succeeded in the past in making analogous results rigorous in simpler settings.
This is based on joint work with Andrea Montanari.