Grants and Contributions:

Title:
The distribution of data and computation
Agreement number:
RGPIN
Agreement value:
$210,000.00
Agreement date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Québec, Other, CA
Reference number:
GC-2017-Q1-02840
Agreement type:
grant
Report type:
Grants and Contributions
Additional information:

Grant or award applying to more than one fiscal year. (2017-2018 to 2022-2023)

Recipient's legal name:
Kemme, Bettina (McGill University)
Program:
Discovery Grants Program - Individual
Program purpose:

In the current era of Big Data, an enormous amount of data is collected, stored, transformed and mined. Platforms for the management, analysis and dissemination of data are thus a fundamental infrastructure. They need to scale, handling ever-increasing amounts of data as well as more and more complex analytical tasks. Such platforms are typically multi-layered and/or component-based. This approach provides scoping and reusability. However, it is prone to performance penalties. Upper layers might configure lower layers in a non-optimal way, and lower layers are not optimized for a specific application. Furthermore, functionality might be duplicated if a lower layer does not appropriately expose everything it is able to do.

The long-term objective of our research is to develop scalable and efficient platforms for the management and processing of data. To do so, we take a holistic cross-layer and cross-component approach: we look at several software layers of a platform at the same time, understand their needs, their capabilities and their interactions, and then explore the potential of functionality traversal, to see at which layer(s) functionality is best provided and how it can best be exposed. Such a cross-layer approach is challenging, as it requires gaining expertise in several domains and understanding their interactions, but it promises to enable much-needed scalability and performance.

In the short and medium term, we focus on three particular problems. First, we will develop a publish/subscribe platform that combines data dissemination with maintenance of application data; that is, it merges the pub/sub communication paradigm with database aspects. More precisely, we store domain information in the form of graphs within our platform using a graph database system. Subscribers express interest in a sub-graph of this graph, and publications refer to a part of the graph. We have identified many applications that can benefit from this integrated approach.

Second, we want to push machine learning tasks down into the core of the database engine. While distributed compute frameworks such as Apache Spark support both relational data processing and machine learning, our goal is to intertwine relational operators with machine learning and distribution to provide a holistic framework for optimization.

Third, we envision the cloud providing Monitoring as a Service (MaaS) installed within its software-defined network. Applications can tell the MaaS which data flows they want to observe and what they want to monitor, potentially providing their own functions that process application-dependent message content, while the MaaS provides application-independent functions.

In all cases, the interplay of several components plays an integral role, and with this research we will gain better insight into how services can better provide much-needed functionality.