Grants and contributions:
Grant or award applying to more than one fiscal year. (2017-2018 to 2022-2023)
Big data are changing the way people and businesses make decisions. However, before investing time and other resources to analyze the vast amounts of available data, it is critical to ask questions such as "Do we have the right data for the task at hand?", "Do we need to clean the data before they are suitable for analysis?", or "Is there structure in the data that can help us do effective and efficient analytics?". These pertinent questions can be answered through data profiling: the activity of collecting metadata, i.e., data about data. Given a dataset, say, in the form of a spreadsheet, useful metadata may include quantitative information such as the number of rows, the number of distinct values and the identities of frequently occurring values, and structural information such as correlations or dependencies among columns.
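As a rough illustration of the kinds of metadata mentioned above, the following minimal Python sketch profiles a tiny in-memory table. It is not the method proposed in this research program. The function names `profile` and `holds_fd`, the `top_k` parameter and the sample rows are all hypothetical, and the dependency check is a naive single-pass test that would not scale to big data.

```python
from collections import Counter


def profile(rows, top_k=2):
    """Collect simple per-column metadata from a list of dict rows."""
    columns = list(rows[0].keys()) if rows else []
    summary = {"row_count": len(rows), "columns": {}}
    for col in columns:
        counts = Counter(row[col] for row in rows)
        summary["columns"][col] = {
            "distinct": len(counts),                  # number of distinct values
            "most_common": counts.most_common(top_k)  # frequently occurring values
        }
    return summary


def holds_fd(rows, lhs, rhs):
    """Return True if every value of column lhs maps to a single value of column rhs."""
    seen = {}
    for row in rows:
        # setdefault records the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical sample data, for illustration only.
rows = [
    {"city": "Waterloo", "province": "ON", "temp": 21},
    {"city": "Toronto", "province": "ON", "temp": 23},
    {"city": "Waterloo", "province": "ON", "temp": 19},
    {"city": "Montreal", "province": "QC", "temp": 23},
]

summary = profile(rows)
city_determines_province = holds_fd(rows, "city", "province")
province_determines_city = holds_fd(rows, "province", "city")
```

In this sample, `city` determines `province` but not the other way around, which is the sort of structural dependency among columns that profiling aims to surface automatically.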
One could profile a small dataset just by looking at it, but automated techniques are clearly needed for big data: while the amount of data keeps growing, human cognitive processing capacity is fixed. The proposed research program will develop new methods, algorithms and software tools for data profiling, focusing on the technical challenges arising from the three “V”s of big data: Volume (the growing amount of data generated by social media, the internet of things, etc.), Velocity (the high speed at which data are generated, e.g., sensor readings or Twitter messages) and Variety (business data, numeric data, graph data such as friend/follower relationships in social media, etc.). This research will play a major role in Dr. Golab’s long-term research agenda to help individuals and businesses get more value out of big data.
Data profiling tools are urgently needed to make data analytics more accessible to the increasing number of experts and non-experts interested in incorporating big data into their decision-making processes. Such tools will help Canadian governments, utilities, automotive companies, healthcare companies and banks to use big data more effectively and efficiently. The anticipated deliverables will also be of interest to Canada’s world-renowned database companies such as IBM Toronto and SAP Waterloo, because a key to improving the performance of data analytics is to exploit structural relationships in the data. Furthermore, the proposed research will be led by graduate students who will acquire sought-after skills in data science and big data engineering, which will help them to take leadership roles in Canada’s increasingly data-driven economy.