Numerical Methods for Deep Learning
Times
This minicourse is presented as part of the program on Mathematical and Computational Aspects of Machine Learning held at the Scuola Normale Superiore in Pisa, Italy. I would like to thank the organizers for the invitation and generous support.
Description
This minicourse gives an introduction to numerical methods for training deep neural networks. We will look under the hood of currently used deep learning methods and outline their relations to traditional areas of numerical analysis such as numerical linear algebra, optimization, and partial differential equations. The course consists of three parts spread over six one-hour lectures (approximately two lectures per part). Overview of the three parts:
- We introduce the basic notation and some examples of learning problems and then review linear models in detail. We consider linear regression and classification problems and review numerical optimization methods used for training those models. We emphasize the importance of generalization and show how to achieve it using regularization theory.
- We extend our discussion to nonlinear models, in particular, multi-layer perceptrons and residual neural networks. We demonstrate that even the training of a single-layer neural network leads to a challenging non-convex optimization problem and give an overview of heuristics, such as Variable Projection and stochastic approximation schemes, that can effectively train nonlinear models. Finally, we demonstrate challenges associated with deep networks, such as their stability and the computational cost of training.
- We show that residual neural networks can be interpreted as discretizations of a nonlinear time-dependent ordinary differential equation that depends on unknown parameters, i.e., the network weights. We show how this insight has been used, e.g., to study the stability of neural networks, to design new architectures, and to apply established methods from optimal control to the training of ResNets. Finally, we discuss open questions and opportunities for mathematical advances in this area.
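
To make the idea behind the third part concrete, here is a minimal sketch of the ResNet-as-ODE interpretation. A residual layer updates the features as y_{j+1} = y_j + h f(y_j, theta_j), which is one forward Euler step of size h for dy/dt = f(y(t), theta(t)). The layer function tanh(K y + b), the antisymmetric example matrix, and the names `layer`, `resnet_forward`, and `make_weights` are assumptions chosen for illustration; they are not taken from the course material.

```python
import math


def layer(y, K, b):
    """Evaluate f(y) = tanh(K y + b) for a feature vector y (list of floats)."""
    return [
        math.tanh(sum(k_ij * y_j for k_ij, y_j in zip(row, y)) + b_i)
        for row, b_i in zip(K, b)
    ]


def resnet_forward(y0, weights, h):
    """Propagate y0 through residual layers y_{j+1} = y_j + h * f(y_j, theta_j).

    Each layer is one forward Euler step of size h for the ODE
    dy/dt = f(y(t), theta(t)) with initial value y(0) = y0.
    """
    y = list(y0)
    for K, b in weights:
        f = layer(y, K, b)
        y = [y_i + h * f_i for y_i, f_i in zip(y, f)]
    return y


def make_weights(n_layers):
    """Hypothetical weights: the same antisymmetric K and zero bias in every layer."""
    K = [[0.0, 1.0], [-1.0, 0.0]]
    b = [0.0, 0.0]
    return [(K, b)] * n_layers


# Same final time T = 1, discretized with 4 layers and with 400 layers.
T = 1.0
y0 = [0.5, -0.25]
coarse = resnet_forward(y0, make_weights(4), T / 4)
fine = resnet_forward(y0, make_weights(400), T / 400)
print("4 layers:  ", coarse)
print("400 layers:", fine)
```

In this sketch, adding layers while shrinking h refines the time discretization of the same underlying ODE, so the shallow and the deep network produce similar outputs. This viewpoint is what allows tools from numerical ODEs and optimal control to be applied to network design and training.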
Prerequisites
To succeed in this class, students need a solid background in multivariate calculus and linear algebra and some programming experience in MATLAB, Julia, or Python. In addition, students are expected to have experience in either numerical analysis (optimization, partial differential equations) or machine learning (e.g., CS534, CS584, or similar). To get a stronger background in optimization, students are encouraged to enroll in MATH 347.