Grants and Contributions:

Title:
Collective Machine Learning for Semantic Data Interpretation
Agreement Number:
RGPIN
Agreement Value:
$210,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Ontario, Other, CA
Reference Number:
GC-2017-Q1-03186
Agreement Type:
grant
Report Type:
Grants and Contributions
Additional Information:

Grant or award applying to more than one fiscal year. (2017-2018 to 2022-2023)

Recipient's Legal Name:
Guo, Yuhong (Carleton University)
Program:
Discovery Grants Program - Individual
Program Purpose:

This research will investigate collective machine learning algorithms that combine information from heterogeneous sources to automatically induce semantic interpreters of complex data. Two emerging trends motivate this research. First, in the era of big data there is an increasing number of freely available data sources that are relevant to any particular interpretation problem. Although such data sources vary in size and annotation coverage, their union can be leveraged to reduce the annotation cost required to achieve competence in a target interpretation task. Second, the growing success of machine learning has increased the ambition to move beyond learning simple classification models to adapting related semantic predictors across complex output categories.

The primary challenge of performing collective learning across complex output spaces lies in the heterogeneity of data sources, in terms of the different input features recorded, different output annotations captured, and different prediction tasks considered. To address this fundamental challenge, this research will develop novel representation learning algorithms that can uncover the shared structure underlying different data sets and different output targets.

If successful, this research program will overcome the boundaries of traditional machine learning and data analysis systems, and provide new tools for heterogeneous data analysis that address a significant need in the modern context of big data. Moreover, this research will also dramatically increase the autonomy and robustness of semantic data analysis systems while reducing, and in some cases eliminating, their dependence on human guidance.

The proposed research program has both fundamental and applied aspects and is expected to advance both. In particular, this research will not only contribute new mathematical and algorithmic developments in machine learning and data analysis research, but will also broaden the applicability of automated data analysis systems to a wider range of natural language processing, computer vision, bioinformatics, and social and commercial data analysis problems. The resulting methods will be applicable to broad classes of heterogeneous data collected by governments, industry, organizations or individuals, and will significantly reduce the dependence on domain expertise in developing useful data interpretation systems.