Grants and Contributions:

Title:
Learning from Social Media Texts
Agreement Number:
RGPIN
Agreement Value:
$23,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Ontario, Other, CA
Reference Number:
GC-2017-Q1-01908
Agreement Type:
grant
Report Type:
Grants and Contributions
Additional Information:

Grant or award spanning more than one fiscal year (2017-2018 to 2018-2019).

Recipient's Legal Name:
Inkpen, Diana (University of Ottawa)
Program:
Discovery Grants Program - Individual
Program Purpose:

Applications in the field of Natural Language Processing (NLP) have become popular in recent years. This popularity is due to the availability of data and evaluation benchmarks, to progress in automated techniques, and to a growing need for these applications in daily life and in commercial product development.

I propose a thorough investigation of NLP-based techniques for user modelling in social media. There are three specific objectives: (1) Learn user characteristics from social media texts. These characteristics may include: age, gender, personality type, location, ethnicity, health issues, political views, interests, and life events. (2) Learn population distributions in various social media (e.g., Twitter, forums, Facebook) for each of these characteristics. (3) Use the extracted information, as a proof of concept, in applications for e-business, market research, health monitoring, etc.

The scientific approach that I propose is based on machine learning, automatic text classification, deep learning, and information extraction techniques. We will focus on social media texts, which are more challenging than regular texts because of non-standard spelling, lack of editing, abbreviations, jargon, and noise. Existing techniques therefore need to be adapted to this kind of text, for example by retraining them, by partially normalizing the messages, and by adding features specific to each type of social media. In addition, I propose to integrate techniques that exploit the structure of the social network. Most previous work on related topics uses either only the texts of the messages or only the network structure; I believe that combining the two may increase the precision of the extracted information. However detailed the analysis, we will pay special attention to protecting the privacy of social media users.
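
To illustrate what combining message text with network structure could look like, here is a minimal sketch, not the method proposed in this grant. It assumes that a text classifier has already produced, for each user, a probability (between 0 and 1) for one characteristic, and that the social network is available as an adjacency list. The function name `combine_text_and_network` and its parameters `alpha` and `iterations` are hypothetical. The technique is a simple iterative neighbour-averaging scheme, similar in spirit to label propagation: each user's text-based score is blended with the mean score of that user's neighbours.

```python
from typing import Dict, List


def combine_text_and_network(
    text_scores: Dict[str, float],
    neighbours: Dict[str, List[str]],
    alpha: float = 0.7,
    iterations: int = 5,
) -> Dict[str, float]:
    """Blend each user's text-based score with the mean score of their neighbours.

    Hypothetical sketch (not the grant's actual method):
    - text_scores: per-user probability for one characteristic, assumed to come
      from a text classifier trained on the user's messages.
    - neighbours: adjacency list of the social graph (assumed representation).
    - alpha: weight kept on the user's own text evidence (1.0 ignores the network).
    """
    scores = dict(text_scores)
    for _ in range(iterations):
        updated: Dict[str, float] = {}
        for user, own_score in text_scores.items():
            # Only neighbours that have a score of their own contribute.
            neighbour_scores = [scores[n] for n in neighbours.get(user, []) if n in scores]
            if neighbour_scores:
                neighbour_mean = sum(neighbour_scores) / len(neighbour_scores)
                updated[user] = alpha * own_score + (1 - alpha) * neighbour_mean
            else:
                # Isolated users keep their text-only score.
                updated[user] = own_score
        scores = updated
    return scores


if __name__ == "__main__":
    text_only = {"a": 0.9, "b": 0.2, "c": 0.5}
    graph = {"a": ["c"], "c": ["a"]}
    print(combine_text_and_network(text_only, graph, alpha=0.7, iterations=1))
```

With `alpha=0.7` and a single iteration, user `c` (text score 0.5) is pulled toward its neighbour `a` (0.9) and ends at about 0.62, while the isolated user `b` keeps 0.2. Anchoring every update to the original text score, rather than to the previous iteration's value, keeps the message text as the primary evidence and lets the network act only as a correction.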

The novelty of the proposed work lies in a comprehensive investigation of existing techniques, in increasing their sophistication, and in developing new techniques for the proposed tasks, as well as in building several proof-of-concept applications that require information about users, or about populations of users, in social media.

We anticipate that the outcomes of the project will contribute to a better understanding of human communication in social media. This understanding will enable advances in mining social media for market research and other decision-making applications, and will strengthen Canada’s position as a major player in the field of information technology.