Grants and contributions:
Grant or award applying to more than one fiscal year (2017-2018 to 2022-2023).
In the same way that we use the letters of the alphabet to write text, and the bits 0 and 1 to write computer machine code, Nature uses the four basic DNA units (A - adenine, C - cytosine, G - guanine, T - thymine) to write genetic information as DNA strands. The possibility of encoding symbolic information on DNA, together with the fact that biochemical processes such as cut-and-paste of DNA strands have been shown to perform arithmetic and logic operations, led to the development of the field of DNA computing and molecular programming. Bioinformation and biocomputation differ from their electronic counterparts in several respects. First, biodata is not associated with a memory location but consists of tiny DNA strands floating freely in solution. Second, in contrast to data in an electronic computer, which is passive, data-encoding DNA strands can interact with each other in programmable ways because of their Watson-Crick complementarity. Third, each data-encoding DNA strand is usually present in millions of identical copies, and the bio-operations act according to statistical laws. I aim to develop and investigate mathematical models of bioinformation and biocomputation that take these specific characteristics into account, and to explore the mathematical properties of naturally occurring DNA sequences and their applications.
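As a small illustration of the Watson-Crick complementarity mentioned above, the following Python sketch computes the complement of a strand and checks whether two strands could fully bind. The function names `wk_complement` and `can_fully_bind` are hypothetical and chosen only for this example; the sketch assumes strands are written 5' to 3' over the alphabet {A, C, G, T} and ignores partial binding and thermodynamics.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def wk_complement(strand: str) -> str:
    """Return the Watson-Crick complement of a strand.

    Both the input and the result are read in the 5'->3' direction,
    so the strand is reversed before each base is complemented.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def can_fully_bind(x: str, y: str) -> bool:
    """Return True if y is exactly the Watson-Crick complement of x (idealized model)."""
    return y == wk_complement(x)


if __name__ == "__main__":
    print(wk_complement("AACG"))
    print(can_fully_bind("AACG", "CGTT"))
```

In this idealized model, applying the complement twice returns the original strand, which is one of the properties that distinguishes DNA-encoded words from ordinary strings.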
To this end, I first approach the issue of data encoding on DNA by proposing to develop a formal-language-based "theory of bioinformation and biocomputation". This includes defining and investigating new concepts that capture the biological reality of DNA- and RNA-encoded information, as well as investigating the properties of bio-operations and their relationships to traditional models of information and computation. Besides its potential significance for the design of programmable DNA-based computational devices, this research creates a mutually enriching link between theoretical computer science and molecular biology. Secondly, I propose an investigation into DNA self-assembly as a computational tool, the results of which could have implications for experimental DNA nanocomputations and for the molecular programming of complex nanostructures. Lastly, I propose to gain insights into the mathematical properties of naturally occurring bioinformation by investigating the connection between the syntactical structure of genomic sequences and species classification. This includes investigating Chaos Game Representations of DNA sequences as genomic signatures, as well as applications of this method to HIV-1 subtyping and to the classification of marine microbial eukaryotes based on their RNA transcriptomes. The potential impact of such an alignment-free universal classification method could be significant, given that 86% of existing species on Earth and 91% of species in the oceans still await classification.
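The Chaos Game Representation mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the proposed research: the corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) is one common convention, and the function names `cgr_points` and `cgr_grid` are hypothetical. Each nucleotide moves the current point halfway toward its corner of the unit square, and counting the points that fall in each cell of a 2^k x 2^k grid gives a simple numerical signature of the sequence.

```python
# Assumed corner convention for the unit square (other conventions exist).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """Return the CGR point generated by each nucleotide of seq."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid(seq: str, k: int) -> list[list[int]]:
    """Count the CGR points of seq in each cell of a 2^k x 2^k grid."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        # Clamp to the last cell in case rounding pushes a coordinate to 1.0.
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    for row in cgr_grid("ACGTTGCAACGT", 2):
        print(row)
```

Two sequences can then be compared by a distance between their grids, which requires no alignment of the sequences themselves; the choice of distance is left open here.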