Grants and contributions:
Grant or award spanning more than one fiscal year. (2017-2018 to 2022-2023)
Deep neural networks have recently pushed forward the state of the art in applications as diverse as image understanding, language understanding, genomics, computational chemistry, game playing, and robotics. Deep generative models in particular have seen rapid progress in the past few years, with networks able to produce plausible images or speech signals. As diverse as these application areas are, the challenges and frustrations facing researchers and practitioners have much in common. Neural networks can take weeks to train on expensive Graphics Processing Unit (GPU) hardware. Compared with the previous generation of machine learning algorithms, neural networks have many more knobs which need to be tweaked. In the case of deep generative models, it can be hard to determine whether the networks are learning to model the data or simply memorizing their training examples.
Solutions to any of these issues would have enormous impact across a range of application areas, both in industry and in scientific research. I propose to tame the complexity of neural networks using the techniques of structured probabilistic modeling. Over the next five years, I will focus on two main threads: improving the optimization of neural networks, and evaluating deep generative models.
Training neural networks can take weeks, even with modern GPU hardware. Previously, I introduced a method for deriving efficient second-order optimization algorithms from probabilistic models of the curvature of a cost function. This led to large speedups in training many types of neural nets. I plan to extend this technique to state-of-the-art architectures for image, video, and text understanding, and to scale up the technique from an individual processor to a cluster. The end result will be a general-purpose neural net training algorithm which is efficient and scalable, and which requires little hand-tweaking.
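To illustrate why curvature information speeds up training, consider the following minimal sketch (my own toy example, not the proposal's actual algorithm). On an ill-conditioned quadratic objective, plain gradient descent must use a step size small enough for the highest-curvature direction, so it crawls along low-curvature directions; rescaling each parameter's step by an estimate of its curvature, as second-order methods do, removes this bottleneck.

```python
import numpy as np

# Toy objective: f(w) = 0.5 * w^T H w, with curvatures differing by 100x.
H = np.diag([100.0, 1.0])

w_plain = np.array([1.0, 1.0])
w_precond = np.array([1.0, 1.0])

curv = np.diag(H)    # diagonal curvature model (exact for this toy problem)
lr_plain = 0.009     # plain GD is only stable for lr < 2 / max curvature

for _ in range(50):
    # Plain gradient descent: one global step size for all parameters.
    w_plain -= lr_plain * (H @ w_plain)

    # Curvature-preconditioned descent: per-parameter steps scaled by
    # the estimated curvature, so all directions converge at the same rate.
    g = H @ w_precond
    w_precond -= 0.5 * g / curv

loss = lambda w: 0.5 * w @ H @ w
print(loss(w_plain), loss(w_precond))
```

After 50 steps, the preconditioned iterate has essentially converged, while plain gradient descent is still far from the optimum along the low-curvature direction. Real second-order methods for neural nets face the additional challenge of estimating the curvature cheaply and robustly from noisy minibatch gradients, which is where probabilistic curvature models come in.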
The main obstacle to evaluating generative models is the intractability of computing the probability assigned to a configuration, coupled with the difficulty of determining how accurate one's estimates are. (Papers published on the topic tend to include caveats that the reported probabilities may be extremely inaccurate.) The difficulty of evaluation is considered one of the main factors holding back scientific progress on generative modeling. I aim to develop techniques for obtaining confidence intervals for these probabilities which are both tight and reliable, so that we can have confidence in our evaluation of generative models. This will enable rigorous empirical study of generative models, which in turn will allow us to improve them.
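The basic difficulty can be seen even in a toy latent-variable model (a hypothetical example of my own, not taken from the proposal): the probability a model assigns to a data point is an integral over latent variables, which must be estimated by sampling, and the estimate is only useful alongside a measure of its uncertainty. The sketch below estimates such a marginal probability by simple Monte Carlo over the prior and attaches a central-limit-theorem confidence interval; for a model chosen so the true answer is known in closed form, we can check the estimate against it.

```python
import numpy as np

# Toy model: prior z ~ N(0, 1), likelihood x | z ~ N(z, 1).
# The marginal is then exactly p(x) = N(x; 0, 2), so we can verify
# the Monte Carlo estimate. Real deep generative models have no such
# closed form, which is what makes evaluation hard.

rng = np.random.default_rng(0)

def gauss_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

x0 = 1.5
n = 100_000
z = rng.standard_normal(n)        # samples from the prior
weights = gauss_pdf(x0, z, 1.0)   # p(x0 | z) for each sample

estimate = weights.mean()                     # Monte Carlo estimate of p(x0)
stderr = weights.std(ddof=1) / np.sqrt(n)     # standard error of the mean
ci = (estimate - 1.96 * stderr, estimate + 1.96 * stderr)

true_p = gauss_pdf(x0, 0.0, 2.0)  # exact marginal, available only in this toy
print(estimate, ci, true_p)
```

For high-dimensional models the naive estimator above has enormous variance and its nominal confidence interval can be badly misleading, which is precisely why tight and reliable interval estimates for these probabilities are a research problem rather than a routine calculation.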