Grants and Contributions:

Title:
Overcoming Data Sparsity in Machine Translation
Agreement Number:
RGPIN
Agreement Value:
$115,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location :
Alberta, Other, CA
Reference Number:
GC-2017-Q1-02922
Agreement Type:
grant
Report Type:
Grants and Contributions
Additional Information:

Grant or award spanning more than one fiscal year (2017-2018 to 2022-2023).

Recipient's Legal Name:
Kondrak, Grzegorz (University of Alberta)
Program:
Discovery Grants Program - Individual
Program Purpose:

Canada is a multicultural society. A large percentage of Canadian residents report a mother tongue other than English or French. In addition, Canada is home to a rich variety of Indigenous languages, some of which have also been granted official status. Everyone has the right to receive all official federal government services, publications, and documents in both English and French. Important information for new Canadians is often provided in multiple languages and scripts. Increasing the availability of texts in Indigenous languages raises their prestige and thus helps preserve them.

As a consequence, there is an acute need for accurate and rapid translation, not only between English and French but also into other languages. Human translation is slow and expensive, and it requires highly skilled experts. Computer programs that translate automatically, known as machine translation systems, have the potential to fill this gap. Unfortunately, the current technology is far from perfect. The quality of translations involving smaller languages is often poor, and even between major languages it is sometimes inadequate for technical applications.

Two reasons for the low quality of machine translation are the scarcity of bilingual texts for low-resource languages and the prevalence of infrequent words, such as certain verb inflections in French. The dominant statistical machine translation approach, which is used in web services such as Google Translate, struggles to translate words that occur only rarely in bilingual texts.
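To make the sparsity problem concrete, here is a minimal sketch that counts word types occurring at most `threshold` times in a corpus. The three French sentences are a hypothetical toy example, not data from this project. Even in this tiny sample, each inflected form of *parler* (`parlons`, `parlions`, `parlez`) appears only once, so a system that learns from frequencies sees very little evidence for any single form.

```python
from collections import Counter

def rare_word_types(sentences, threshold=1):
    """Return, sorted, the word types whose corpus frequency is at most `threshold`."""
    # Whitespace tokenization and lowercasing are simplifying assumptions.
    counts = Counter(tok for sent in sentences for tok in sent.lower().split())
    return sorted(w for w, c in counts.items() if c <= threshold)

# Hypothetical toy French sentences, for illustration only.
corpus = [
    "nous parlons français",
    "nous parlions anglais",
    "vous parlez français",
]

print(rare_word_types(corpus))
# ['anglais', 'parlez', 'parlions', 'parlons', 'vous']
```

Five of the seven word types occur only once, and three of those five are forms of the same verb.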

The objective of this proposal is to improve the quality of machine translation through better handling of infrequent words. The principal research directions are incorporating state-of-the-art morphological techniques into the translation process, developing lexicon induction methods, and translating out-of-vocabulary words with cutting-edge algorithms for cognate identification, name transliteration, and decipherment.
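As an illustration of what cognate identification involves, here is a minimal sketch of one simple orthographic similarity measure, the longest common subsequence ratio (LCSR): the length of the longest common subsequence of two words divided by the length of the longer word. This is not presented as the method of the proposal. The `likely_cognates` helper and its 0.6 cutoff are hypothetical choices for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)             # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))   # best of skipping either char
        prev = curr
    return prev[-1]

def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio: LCS length over the longer word's length."""
    if not a and not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))

def likely_cognates(pairs, threshold=0.6):
    """Keep word pairs whose LCSR meets a (hypothetical) similarity threshold."""
    return [(a, b) for a, b in pairs if lcsr(a.lower(), b.lower()) >= threshold]

pairs = [("government", "gouvernement"), ("house", "maison")]
print(likely_cognates(pairs))
# [('government', 'gouvernement')]
```

The English and French words for "government" share a 10-letter subsequence (LCSR 10/12, about 0.83), whereas "house" and "maison" share only one letter (LCSR 1/6). A purely orthographic score like this ignores sound correspondences and script differences. Those limitations are part of why more sophisticated cognate and transliteration models are needed.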

In the current global economy, the enormous demand for fast and freely available translation can only be satisfied by machine translation programs. The solutions outlined in this proposal will not only improve the quality of machine translation but also influence research on other areas of natural language processing, thus accelerating progress toward the goal of making computers understand human language.