Grants and Contributions:

Title:
DeFacto: Acquiring, Curating, and Using a Bilingual Domain Aware Commonsense Knowledge Base
Agreement Number:
RGPIN
Agreement Value:
$115,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location :
Quebec, Other, CA
Reference Number:
GC-2017-Q1-02419
Agreement Type:
Grant
Report Type:
Grants and Contributions
Additional Information:

Grant or award applying to more than one fiscal year. (2017-2018 to 2022-2023)

Recipient's Legal Name:
Langlais, Philippe (Université de Montréal)
Program:
Discovery Grants Program - Individual
Program Purpose:

Automatically extracting knowledge from a large set of mostly unstructured documents (such as the Web) and organizing it into a knowledge base (KB) is a key challenge in artificial intelligence. Intuitively, such KBs should directly improve the quality of many NLP applications such as question answering, information retrieval or text analytics. Open information extraction, the task of extracting knowledge from texts with little supervision (in particular, without prescribing the kind of information to mine), has brought new hope for such an endeavour.

Although a number of well-designed components for extracting facts and relations (so-called tuples) from texts are now widespread and readily available, tapping information in large collections of texts still raises a number of issues. The technology embedded in a typical knowledge extraction pipeline is fraught with shortcomings: coreference resolution, named-entity resolution and parsing errors compound, so that many tuples (if not the vast majority) are simply useless. Also, most work targets very frequent entities and relations, which excludes a large quantity of information found in the domain-specific texts that are pervasive on the Web.

Our long-term objective is to develop the expertise needed to populate, curate, maintain and use a KB. Our proposal departs from several existing initiatives in a number of key respects. First, since specific domains are prevalent on the Web, we want our technology to be domain aware. Second, since today's world is multilingual and not everything is written in English, we further want our technology to be multilingual in nature. Last, most work is devoted to developing fully automatic technology that assists humans. In our proposal, we are instead interested in measuring how much gaming with a purpose can make humans assist the computer.

To this end, this proposal targets the development of deFacto, a multi-domain, bilingual (French-English) KB acquired iteratively from texts mined from the Web, with the help of feedback collected from users via serious gaming.