Grants and Contributions:

Title:
Learning good representations for and with reinforcement learning
Agreement Number:
RGPIN
Agreement Value:
$355,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Québec, Other, CA
Reference Number:
GC-2017-Q1-03493
Agreement Type:
grant
Report Type:
Grants and Contributions
Additional Information:

Grant or award applying to more than one fiscal year. (2017-2018 to 2022-2023)

Recipient Legal Name:
Precup, Doina (McGill University)
Program:
Discovery Grants Program - Individual
Program Purpose:

Artificial intelligence (AI) has made great progress in isolating different aspects of intelligence and proposing flexible representations and powerful algorithms that lead to competence in specific tasks. For example, AI agents are better than humans at playing games like Go, a feat once considered impossible. However, the sort of flexible, robust, and autonomous competence routinely exhibited by humans, or even animals, remains elusive. The best AI systems are still tuned to specific problems. Our main research goal is to develop general AI methodology that relies, at its core, on reinforcement learning. Reinforcement learning is an approach to learning from interaction with an environment, inspired by animal learning theory. This proposal aims to design algorithms that automatically create representations for reinforcement learning agents, allowing them to model the world and to act at multiple time scales. We aim to provide new optimization criteria that formally describe what constitutes a good set of abstract representations, provide gradient-based learning algorithms to learn such models, and demonstrate their effectiveness through empirical evaluations in simulated domains, game playing, and real time-series prediction datasets. We will tackle the crucial problem of exploration by characterizing how an agent should move about its environment in order to optimize its learning speed. Finally, we will leverage these methods inside other algorithms that can benefit from multiple time scales, such as the training of deep, recurrent neural networks.
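The reinforcement-learning framing described above (an agent learning from interaction with an environment) can be illustrated with a minimal tabular Q-learning sketch. The toy chain environment, state and action spaces, and hyperparameters below are illustrative assumptions for exposition only, not part of the funded project's methodology.

```python
import random

# Minimal tabular Q-learning sketch (illustrative assumptions throughout).
# Environment: a 1-D chain of N states; the agent steps left or right and
# receives reward 1 only upon reaching the rightmost (goal) state.

N_STATES = 5                        # assumed chain length
ACTIONS = [-1, +1]                  # step left / step right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # assumed learning rate, discount, exploration

def step(state, action):
    """Apply an action; the episode ends with reward 1 at the goal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = (nxt == N_STATES - 1)
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection: explore with probability EPS.
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: q[(s, a_)])
            s2, r, done = step(s, a)
            # One-step temporal-difference (Q-learning) update.
            best_next = 0.0 if done else max(q[(s2, a_)] for a_ in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q

q = train()
# The greedy policy derived from the learned values moves right toward the goal.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

The discount factor GAMMA is what ties this sketch back to the proposal's theme of time scales: it controls how far into the future the agent's value estimates look, and learning models at multiple time scales generalizes this single fixed horizon.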