Grants and Contributions:

Title:
Big Data platforms for science automation
Agreement Number:
RGPIN
Agreement Value:
$100,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Québec, Other, CA
Reference Number:
GC-2017-Q1-03396
Agreement Type:
grant
Report Type:
Grants and Contributions
Additional Information:

Grant or scholarship awarded applying to more than one fiscal year. (2017-2018 to 2022-2023)

Legal Name of Recipient:
Glatard, Tristan (Concordia University)
Program:
Discovery Grants Program - Individual
Program Purpose:

The objective of my research program is to automate Big Data analyses, from data processing to knowledge publication. In the coming years, I will focus on three objectives within this immense challenge.

(1) Interoperability among Big Data platforms: the Big Data platforms used in science currently operate in silos, which hinders open science, reduces the overall quality of platforms by limiting competition, and creates technological dependence on particular software projects. I will design the building blocks of a decentralized network connecting platforms so that data, processing pipelines and analyses can be uniformly found, accessed and reused across platforms.

(2) Reproducibility of Big Data analyses over time and space: science is going through a severe reproducibility crisis which, if not properly addressed, will prevent many disciplines from leveraging the tremendous wealth of data being acquired: Big Data has to be converted into trusted knowledge. I will develop methods to identify, quantify and correct reproducibility issues, focusing on the challenges that originate in the computing infrastructure.

(3) Performance optimization of Big Data computations: while they have been transformational for industry, Big Data technologies remain underused in science due to the lack of appropriate benchmarks and performance optimization studies. I will design and build an optimized, easy-to-use Big Data processing environment for large scientific datasets.

These three objectives echo the well-known V's of Big Data: interoperability addresses data Variety, reproducibility targets Veracity, and performance provides Velocity and manages data Volume.
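As an illustration of objective (2), the following is a minimal sketch of one basic way to flag reproducibility issues: checksum-comparing the outputs of the same pipeline executed on two platforms. The directory names and the byte-level comparison are illustrative assumptions, not the specific methods proposed in this program.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_runs(run_a: Path, run_b: Path) -> dict:
    """Compare files with the same name in two result directories.

    Files whose contents differ are flagged as potential reproducibility
    issues, e.g. differences caused by the computing environment rather
    than by the analysis itself.
    """
    common = ({p.name for p in run_a.iterdir() if p.is_file()}
              & {p.name for p in run_b.iterdir() if p.is_file()})
    mismatches = {}
    for name in sorted(common):
        digest_a = sha256_of(run_a / name)
        digest_b = sha256_of(run_b / name)
        if digest_a != digest_b:
            mismatches[name] = (digest_a, digest_b)
    return mismatches


if __name__ == "__main__":
    # Hypothetical output directories from two executions of the same pipeline.
    diffs = compare_runs(Path("results_platform_A"), Path("results_platform_B"))
    for name, (a, b) in diffs.items():
        print(f"{name}: {a[:12]}... differs from {b[:12]}...")
```

In practice, byte-identical outputs are often too strict a criterion for numerical results, which is precisely where methods to identify, quantify and correct such differences become necessary.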

Overall, this program will speed up and improve the quality of knowledge production. It will foster cyberinfrastructure automation by allowing scientific objects to move seamlessly across platforms, by ensuring the consistency of results computed on different platforms, and by accelerating Big Data analyses. While potential applications of this research span the whole spectrum of scientific disciplines engaged in data science, I will focus on neuroinformatics, exploiting my long-standing, top-level collaborations in this field and leveraging the extensive experience acquired through the development and operation of the VIP and CBRAIN platforms. Technology transfer to Canadian industry is also expected, in the IT (Big Data and cloud) and pharmaceutical sectors.