Grants and Contributions:

Title:

Mining interesting patterns from big data

Agreement number:

RGPIN

Agreement value:

$115,000.00

Agreement date:

May 10, 2017 -

Organization:

Natural Sciences and Engineering Research Council of Canada

Location:

Manitoba, Other, CA

Reference number:

GC-2017-Q1-03119

Agreement type:

Grant

Report type:

Grants and Contributions

Additional information:

Grant or award applying to more than one fiscal year (2017-2018 to 2022-2023).

Recipient's legal name:

Leung, Carson Kai-Sang (University of Manitoba)

Program:

Discovery Grants Program - Individual

Program purpose:

In the current era of big data, high volumes of a wide variety of valuable data of differing veracity can be easily generated or collected at high velocity. Such data include imprecise or uncertain data, like sensor data and medical lab test results, whose contents are uncertain due to factors like inherent measurement inaccuracies or sampling frequency. Consequently, we are drowning in data but starving for knowledge. To make sense of these data, data science solutions are in demand for big data management and big data mining, which discovers implicit, previously unknown, and potentially useful knowledge that might be embedded in data. Over the past few years, my highly qualified personnel (HQP) and I have developed algorithms that use probabilistic approaches to find frequent sets of items in uncertain data. The algorithms are enhanced with a few optimizations, including in-memory tree structures that capture the important contents of the data. Along this direction of my current research program, I plan to broaden my research with the objective of building a more efficient, user-friendly, and powerful data science system for mining interesting patterns from big data.
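To illustrate what a probabilistic approach to frequent itemsets in uncertain data can look like, here is a minimal sketch. It is not the funded algorithms, which the abstract describes as using in-memory tree structures. It assumes a common model in which every item in a transaction carries an independent existence probability, and it uses a plain level-wise search. The data, the function names `expected_support` and `mine_expected_frequent`, and the threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical uncertain database: each transaction maps an item to the
# probability that the item is really present in that transaction.
transactions = [
    {"a": 0.9, "b": 0.8, "c": 0.4},
    {"a": 0.5, "b": 0.6},
    {"a": 1.0, "c": 0.5},
]


def expected_support(itemset, db):
    """Sum, over all transactions, of the product of the item probabilities.

    Assumes item probabilities are independent within a transaction.
    """
    total = 0.0
    for transaction in db:
        prob = 1.0
        for item in itemset:
            prob *= transaction.get(item, 0.0)
        total += prob
    return total


def mine_expected_frequent(db, min_esup):
    """Level-wise search for itemsets with expected support >= min_esup."""
    items = sorted({item for transaction in db for item in transaction})
    frequent = {}
    level = [frozenset([item]) for item in items]
    while level:
        kept = []
        for itemset in level:
            esup = expected_support(itemset, db)
            if esup >= min_esup:
                frequent[itemset] = esup
                kept.append(itemset)
        # Expected support never grows when an item is added, so larger
        # candidates are built only from itemsets that passed the threshold.
        next_level = set()
        for x, y in combinations(kept, 2):
            union = x | y
            if len(union) == len(x) + 1:
                next_level.add(union)
        level = list(next_level)
    return frequent


result = mine_expected_frequent(transactions, min_esup=1.0)
for itemset, esup in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), round(esup, 2))
```

With the toy data above and a threshold of 1.0, item `c` falls below the threshold on its own, so no itemset containing `c` is ever counted.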
Specifically, I plan to (1) adapt the developed algorithms to take user preferences into account and push these user constraints inside the mining process, so that the resulting algorithms output only interesting patterns and no post-processing step is needed; (2) mine and integrate heterogeneous data sets from multiple related sources, and incorporate prior and posterior knowledge about these data; (3) further improve performance, so as to provide users with real-time responses, by exploring other optimizations and techniques (e.g., Apache Spark, Scala), and further reduce memory requirements (e.g., by adapting Bigtable); (4) explore other quantitative and qualitative approaches to capture and analyze more data and information; (5) explore real-life applications (e.g., mining social networks, the Web, telecommunication data, agricultural data, meteorological data, news feeds, tweets, and blogs) and find other interesting patterns such as trends, sequences, and subgraphs (e.g., in social networks or graphs); and (6) develop data visualization and visual analytics tools that enable users to visualize and analyze data and to change the user-specified mining parameters and/or preferences based on the visualized information mined from the data. This would make the resulting system more interactive and exploratory. This, in turn, would help users enrich their knowledge so that they can promptly take appropriate actions or make the best (business or military) decisions, which would consequently have a significant positive impact on human life and benefit the Canadian economy as well as national defense and security.
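Objective (1), pushing user constraints inside the mining process, can be sketched as follows. This is an illustration under stated assumptions rather than the proposed system: the constraint is a hypothetical budget on the total price of an itemset, the prices and transactions are invented, and the names `expected_support` and `mine_with_constraint` are made up for this example. Because prices are positive, any superset of an over-budget itemset is also over budget, so the check can prune the search before any support is computed instead of filtering patterns afterwards.

```python
# Hypothetical item prices and uncertain transactions
# (item -> probability that the item is present).
PRICES = {"a": 10, "b": 25, "c": 40}
DB = [
    {"a": 0.9, "b": 0.8, "c": 0.7},
    {"a": 0.6, "b": 0.9, "c": 0.8},
    {"b": 0.5, "c": 0.9},
]


def expected_support(itemset, db):
    """Sum of per-transaction products of item probabilities (independence assumed)."""
    total = 0.0
    for transaction in db:
        prob = 1.0
        for item in itemset:
            prob *= transaction.get(item, 0.0)
        total += prob
    return total


def mine_with_constraint(db, prices, min_esup, max_total_price):
    """Depth-first search that checks the price constraint before counting support."""
    items = sorted({item for transaction in db for item in transaction})
    results = {}

    def extend(prefix, start):
        for idx in range(start, len(items)):
            candidate = prefix + (items[idx],)
            # Constraint pushed inside the search: every superset of an
            # over-budget itemset is also over budget, so skip the subtree.
            if sum(prices[item] for item in candidate) > max_total_price:
                continue
            esup = expected_support(candidate, db)
            # Expected support cannot increase as items are added.
            if esup < min_esup:
                continue
            results[candidate] = esup
            extend(candidate, idx + 1)

    extend((), 0)
    return results


patterns = mine_with_constraint(DB, PRICES, min_esup=1.0, max_total_price=50)
for itemset, esup in patterns.items():
    print(itemset, round(esup, 2))
```

In this toy run the itemset `("b", "c")` is frequent enough but costs 65, so it is discarded before its support is ever computed, and no post-processing filter is needed.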