Event Type: Seminar
Friday, April 12, 2024 11:00 AM
Molei Tao (Georgia Tech)

This talk will discuss some nontrivial but often pleasant effects of large learning rates, which are commonly used in machine learning practice for improved empirical performance but defy traditional theoretical analyses. I will first quantify how large learning rates can help gradient descent escape local minima in multiscale landscapes via chaotic dynamics, which provides an alternative to the commonly known escape mechanism due to noise from stochastic gradients. I will then report how large learning rates provably bias gradient descent toward flatter minimizers. Several related, perplexing phenomena have been empirically observed recently, including Edge of Stability, loss catapulting, and balancing. I will, for the first time, unify them and explain that they are all algorithmic implicit biases of large learning rates. These results are enabled by a new global convergence result for gradient descent on certain nonconvex functions without Lipschitz gradients. This theory will also finally provide some understanding of when Edge of Stability and other large-learning-rate implicit biases occur.
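
To make the step-size effect concrete, below is a minimal sketch, not taken from the talk or its underlying papers: plain gradient descent on a hypothetical one-dimensional objective with a sharp local minimum and a flat global minimum. The objective, starting point, and learning rates are all illustrative assumptions. With a small learning rate the iterates stay trapped in the sharp minimum; with a larger learning rate they are unstable there, escape, and settle in the flat minimum.

# Illustrative sketch (not from the talk): gradient descent with a small vs. a
# large learning rate on a toy 1-D objective that has a sharp local minimum
# near x = -0.93 and a flat global minimum near x = 2. All function and
# parameter choices below are assumptions made for illustration.
import numpy as np

def loss(x):
    # Sharp, narrow well near x = -1 (curvature ~10) sitting on top of a
    # shallow quadratic whose flat minimum is near x = 2 (curvature ~0.2).
    return 5.0 * (x + 1.0) ** 2 * np.exp(-4.0 * (x + 1.0) ** 2) + 0.1 * (x - 2.0) ** 2

def grad(x, eps=1e-5):
    # Central finite difference keeps the sketch short; an analytic gradient works too.
    return (loss(x + eps) - loss(x - eps)) / (2.0 * eps)

def run_gd(x0, lr, steps=500):
    # Plain (full-batch) gradient descent: x <- x - lr * grad(x).
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

x0 = -0.9  # start inside the sharp well
print("lr = 0.01 ->", run_gd(x0, lr=0.01))  # stays in the sharp local minimum (lr below 2/curvature there)
print("lr = 0.60 ->", run_gd(x0, lr=0.60))  # unstable in the sharp well; escapes and settles near the flat minimum

The 2/curvature threshold mentioned in the comments is the classical stability condition for gradient descent on a quadratic, and it is the usual lens through which Edge of Stability is described; the talk concerns the rigorous theory behind such large-learning-rate behavior, not this toy construction.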