Grants and Contributions:

Title:
Development of data-driven sampling and its application to protein design and variant prediction
Agreement Number:
RGPIN
Agreement Value:
$170,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Ontario, Other, CA
Reference Number:
GC-2017-Q1-03251
Agreement Type:
Grant
Report Type:
Grants and Contributions
Additional Information:

Grant or award applying to more than one fiscal year (2017-2018 to 2022-2023).

Recipient Legal Name:
Kim, Philip (University of Toronto)
Program:
Discovery Grants Program - Individual
Program Purpose:

In previous work, we successfully developed machine-learning-based methods to solve various prediction problems in structural biology, including predicting changes in stability and binding affinity upon mutation. We also developed a new protein engineering platform that tightly couples computational design with high-throughput screening. However, these approaches still relied on conventional, physics-inspired modeling methods. Here, we plan to develop more data-driven methods for protein modeling.
One fundamental issue in computational biochemistry is conformational sampling. A protein in nature is a dynamic entity and adopts many different conformations even in its native state, and more still once perturbations are introduced. Exhaustive exploration of the possible conformations using conventional methods is either computationally very expensive or lacking in accuracy. Protein backbones in particular present a challenge, whereas side chains can generally be modeled well using so-called rotamers. Here we propose to develop methods based on dimensionality reduction to make sampling much more efficient and accurate. The rationale is that a protein's atoms are strongly constrained in their movement, and many of these natural constraints can be learned from existing structures of different conformations; by now, most important proteins have structures available in multiple conformations. We plan to develop a method based on Gaussian Process Latent Variable Models. Using this method, efficient subspaces can be learned from existing protein structures, and the dimensionality can potentially be reduced by orders of magnitude, speeding up sampling enormously while still generating accurate conformations. We will first establish this as a method for sampling protein backbone conformations in a number of established protein scaffolds. Then, we will apply it to the problem of protein design and incorporate it into our previously developed protein engineering framework, where we expect it to improve design accuracy. Finally, we will implement it as a backbone relaxation method in our predictor of mutation effects on protein stability and binding affinity. The improved backbone sampling will also enable us to predict the effects of indels and post-translational modifications within this framework. We expect our sampling methodology to have a strong impact on many other problems in the field as well.
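
The idea of learning a low-dimensional subspace from known conformations and then sampling within it can be illustrated with a minimal sketch. This is not the proposed method: it substitutes ordinary PCA (a linear model) for the Gaussian Process Latent Variable Model named above, and it uses a synthetic ensemble rather than real protein structures. The function names `fit_linear_latent_space` and `sample_conformations`, the ensemble size, and the number of latent dimensions are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_linear_latent_space(conformations, n_latent):
    """Fit a linear latent space (PCA) to an ensemble of conformations.

    conformations: array of shape (n_structures, n_atoms * 3), assumed to be
    pre-aligned Cartesian coordinates flattened per structure.
    """
    mean = conformations.mean(axis=0)
    centered = conformations - mean
    # Rows of vt are orthonormal principal directions of the ensemble.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_latent]
    # Standard deviation of the ensemble along each retained direction.
    latent_std = singular_values[:n_latent] / np.sqrt(len(conformations) - 1)
    return mean, components, latent_std


def sample_conformations(mean, components, latent_std, n_samples, rng):
    """Draw latent points from a Gaussian and decode them to coordinates."""
    z = rng.normal(size=(n_samples, len(latent_std))) * latent_std
    return mean + z @ components


rng = np.random.default_rng(0)
n_structures, n_atoms, n_latent = 50, 30, 2

# Synthetic stand-in for an ensemble: one reference structure moving along
# two hidden collective modes, plus small noise. Not real protein data.
reference = rng.normal(size=n_atoms * 3)
modes = rng.normal(size=(n_latent, n_atoms * 3))
weights = rng.normal(size=(n_structures, n_latent))
noise = 0.01 * rng.normal(size=(n_structures, n_atoms * 3))
ensemble = reference + weights @ modes + noise

mean, components, latent_std = fit_linear_latent_space(ensemble, n_latent)
samples = sample_conformations(mean, components, latent_std, n_samples=5, rng=rng)
print(samples.shape)  # (5, 90)
```

In this toy setting, sampling happens in 2 dimensions instead of 90. A Gaussian-process-based model would replace the linear decoding step with a nonlinear mapping, which is what would allow it to capture curved conformational manifolds that PCA cannot.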