This talk presents new connections between optimal transport (OT), which has been a central problem in applied mathematics for centuries, and machine learning (ML), which has received enormous attention over the past decades. In recent years, OT and ML have become increasingly intertwined. This talk contributes to this rapidly growing intersection by providing efficient and scalable computational methods for both OT and ML. The first part of the talk shows how neural networks can be used to efficiently approximate the optimal transport map between two densities in high dimensions. To avoid the curse of dimensionality, we combine Lagrangian and Eulerian viewpoints and employ neural networks to solve the underlying Hamilton-Jacobi-Bellman equation. Our approach avoids any space discretization and can be implemented in existing machine learning frameworks. We present numerical results for OT in up to 100 dimensions and validate our solver in a two-dimensional setting. The second part of the talk shows how optimal transport theory can improve the efficiency of training generative models and density estimators, which are critical in machine learning. We consider continuous normalizing flows (CNFs), which have emerged as one of the most promising approaches for variational inference in the ML community. Our numerical implementation is a discretize-optimize method: the forward problem relies on manually derived gradients and Laplacians of the neural network, and the optimization uses automatic differentiation. On common benchmark problems, our method outperforms state-of-the-art CNF approaches by reducing the network size by 8x, accelerating training by 10x-40x, and enabling 30x-50x faster inference.
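To illustrate what "manually derived gradients and Laplacians" can look like, here is a minimal sketch in Python with NumPy. It assumes a hypothetical one-hidden-layer potential Phi(x) = w . tanh(K x + b), which is chosen only for illustration and is not the architecture used in the talk. The names `potential`, `grad_and_laplacian`, `finite_differences`, `K`, `b`, and `w` are likewise assumptions. The closed-form derivatives are checked against central finite differences.

```python
import numpy as np


def potential(x, K, b, w):
    """Scalar potential Phi(x) = w . tanh(K x + b) (illustrative model only)."""
    return w @ np.tanh(K @ x + b)


def grad_and_laplacian(x, K, b, w):
    """Closed-form gradient and Laplacian of Phi with respect to x."""
    t = np.tanh(K @ x + b)
    d1 = 1.0 - t**2        # tanh'(z)
    d2 = -2.0 * t * d1     # tanh''(z)
    # Chain rule: grad Phi = K^T (w * tanh'(z))
    grad = K.T @ (w * d1)
    # Laplacian = trace of the Hessian = sum_j w_j tanh''(z_j) ||K_j||^2,
    # where K_j is the j-th row of K
    lap = np.sum(w * d2 * np.sum(K**2, axis=1))
    return grad, lap


def finite_differences(x, K, b, w, h=1e-4):
    """Central-difference approximations, used only to check the formulas."""
    d = x.size
    grad = np.zeros(d)
    lap = 0.0
    f0 = potential(x, K, b, w)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        fp = potential(x + e, K, b, w)
        fm = potential(x - e, K, b, w)
        grad[i] = (fp - fm) / (2.0 * h)
        lap += (fp - 2.0 * f0 + fm) / h**2
    return grad, lap


# Small random test problem: d input dimensions, m hidden units.
rng = np.random.default_rng(0)
d, m = 5, 8
K = rng.standard_normal((m, d))
b = rng.standard_normal(m)
w = rng.standard_normal(m)
x = rng.standard_normal(d)

grad, lap = grad_and_laplacian(x, K, b, w)
grad_fd, lap_fd = finite_differences(x, K, b, w)
print("max gradient error:", np.max(np.abs(grad - grad_fd)))
print("Laplacian error:", abs(lap - lap_fd))
```

The closed-form Laplacian costs about as much as one forward pass, whereas the finite-difference check needs two extra evaluations per input dimension. This gives a rough sense of why hand-derived formulas can be attractive in high dimensions.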