Grants and contributions:

Title:
Automated monitoring and debugging of large scale manycore heterogeneous systems
Agreement number:
CRDPJ
Agreement value:
$761,820.00
Agreement date:
Dec. 13, 2017 -
Organization:
Conseil de recherches en sciences naturelles et en génie du Canada
Location:
Québec, Other, CA
Reference number:
GC-2017-Q3-00356
Agreement type:
grant
Report type:
Grants and contributions
Additional information:

Grant or award applying to more than one fiscal year (2017-2018 to 2021-2022).

Recipient's legal name:
Dagenais, Michel (École Polytechnique de Montréal)
Program:
Collaborative Research and Development Grants - Project
Program purpose:

Communication and computing infrastructure has evolved over the years, becoming more efficient, sophisticated, integrated and networked. Newer mobile devices (including smart robots and autonomous cars) and servers often contain 8 or more cores in their central processing unit. These systems are based on heterogeneous processors, which combine efficient traditional central processing units with co-processing units optimised for graphics (GPGPUs with thousands of cores), networking, signal processing or even Machine Learning. These co-processing units are highly parallel and often contain over 8 billion logic elements (transistors) each. Adding to this complexity is the increasing reliance on virtualisation, which hides the specificities of the hardware and allows an application to run on several different processor models, but makes performance more difficult to analyse.

As a result, even a simple operation such as initiating a phone call, performing a Web search, routing a packet or displaying a video frame can involve many parallel cores on more than one processing unit, possibly on several servers. Moreover, the same operation, a few seconds later, may be served in a different way by different cores and physical servers. Understanding the performance of these operations has therefore become extremely difficult, and the tools for that purpose are severely lacking. In this project, the tracing, monitoring, profiling and debugging tools for manycore systems will be extended to efficiently extract information from all units in all layers, from the hardware to the application, and to cope with the large number (several thousands) of cores. Furthermore, new methods and algorithms will be developed to automate the analysis of the extracted monitoring data. As a result, the designers and operators of distributed applications on mobile devices, cloud servers and other heterogeneous computing systems will have the tools in hand to quickly analyse their system performance, find problems automatically or manually, and optimise operations.