In this talk, I illustrate how tools from numerical analysis can improve the effectiveness of deep learning algorithms. Focusing on deep neural networks that can be modeled as differential equations, I highlight the importance of choosing a suitable time integrator. Using a numerical example, I also contrast the first-discretize-then-differentiate and first-differentiate-then-discretize paradigms for training residual neural networks. Finally, I show that even simple (i.e., shallow) architectures can give rise to ill-conditioned learning problems.
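
A minimal sketch of the network-ODE connection the abstract alludes to (the NumPy code, the tanh block, and all parameter choices below are illustrative assumptions, not material from the talk): a residual block x_{k+1} = x_k + h f(x_k, theta_k) is exactly one forward Euler step for the ODE dx/dt = f(x(t), theta(t)), so the choice of step size h, and more generally of time integrator, shapes the network's behavior.

```python
import numpy as np

# Hypothetical residual block: f(x, theta) = tanh(W x + b).
def f(x, W, b):
    return np.tanh(W @ x + b)

def resnet_forward(x0, params, h):
    """Residual-network forward pass == forward Euler integration."""
    x = x0
    for W, b in params:
        x = x + h * f(x, W, b)  # one residual block == one Euler step of size h
    return x

# Toy parameters (illustrative only).
rng = np.random.default_rng(0)
depth, dim = 8, 4
params = [(rng.normal(size=(dim, dim)) / np.sqrt(dim),
           0.1 * rng.normal(size=dim)) for _ in range(depth)]
x0 = rng.normal(size=dim)

print(resnet_forward(x0, params, h=1.0))  # standard ResNet (step size h = 1)
print(resnet_forward(x0, params, h=0.1))  # finer "time step", same dynamics f
```

Replacing the Euler update with a different time-stepping scheme changes the architecture; differentiating this discrete forward pass (first-discretize-then-differentiate) and discretizing the continuous adjoint equations (first-differentiate-then-discretize) generally yield different gradients.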