Grants and contributions:
Grant or award applying to more than one fiscal year (2017-2018 to 2022-2023).
Background (The needs)
1. Tremendous volumes of genomics and epigenomics data in public resources have been generated through substantial effort by the research community, e.g. the Encyclopedia of DNA Elements (ENCODE) project, The Cancer Genome Atlas (TCGA) project, and the Gene Expression Omnibus (GEO) data repository. How best to use such valuable information has become critical to conducting biological research in an era of information explosion. Integrating this information into new research requires the development of statistical methodologies and software.
2. Ongoing developments in next-generation sequencing (NGS) technologies enable biological experiments to be conducted on much larger scales. This has created new opportunities in biological and medical research, while posing new challenges in analyzing these data. Various NGS data of new and complex structures are continuously generated in different applications of NGS technologies. To extract useful information from such data, new statistical methodologies and software are urgently needed.
3. Precision medicine is a young but rapidly advancing field of healthcare that involves tailoring medical treatments to individual patients based on a patient’s biomarkers. Thus, discovery of biomarkers is the first and most important step for the success of precision medicine. High-throughput technologies (e.g. NGS) enable researchers to cost-efficiently generate data on large numbers of candidate biomarkers. This potentially improves discovery by widening the range of candidates, but requires paying more attention to multiple testing problems. When a biomarker discovery study is conducted, similar studies might be under way in other teams or already published in the data repositories mentioned above. Integrating such information also requires the development of statistical methodologies.
Proposed research
My proposed research focuses on developing methods and software in:
(1) integrated analysis of time-course multi-type NGS data;
(2) biomarker discovery from integrated analysis of multiple studies.
Topic (1) addresses needs 1 and 2 described above. It will provide a software tool for integrated analysis of temporal multi-type NGS data, which can come either from a single study or from multiple studies available in public data repositories. It will also provide an approach to integrate temporal data generated with different time systems, and to make them comparable via appropriate standardization and adjustments.
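To illustrate what making temporal data from different time systems comparable could look like, the following is a minimal sketch only. It assumes min-max rescaling of each study's time axis to [0, 1] followed by linear interpolation onto a shared grid; the proposal does not specify its standardization method, and the function names and example values are hypothetical.

```python
def rescale_times(times):
    """Map a study's time points onto [0, 1] (assumed min-max rescaling)."""
    t0, t1 = min(times), max(times)
    return [(t - t0) / (t1 - t0) for t in times]


def interpolate(times, values, grid):
    """Linearly interpolate (times, values) onto grid; times must be increasing."""
    out = []
    for g in grid:
        # Locate the segment [times[j], times[j + 1]] that contains g.
        j = 0
        while j < len(times) - 2 and times[j + 1] < g:
            j += 1
        w = (g - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


# Hypothetical example: one study sampled in hours, another in days.
grid = [0.0, 0.5, 1.0]
study_hours = interpolate(rescale_times([0, 6, 12, 24]), [1.0, 2.0, 4.0, 3.0], grid)
study_days = interpolate(rescale_times([0, 1, 2]), [10.0, 20.0, 15.0], grid)
```

After this step both studies are expressed on the same relative time grid, so their profiles can be compared point by point. Whether relative time is the right common scale depends on the biology of the experiments.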
Topic (2) addresses needs 1 and 3 described above. It will allow researchers to conduct biomarker discovery in individual studies and to make an integrated decision based on all results. Our goal is to maximize the power of biomarker discovery while keeping the false positive rate under control.
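As a rough illustration of an integrated decision across studies, the sketch below combines per-study p-values for each candidate biomarker with Fisher's method and then applies the Benjamini-Hochberg adjustment. Both techniques are standard stand-ins chosen here for illustration, not the methods proposed in this research. Fisher's method assumes independent studies, and Benjamini-Hochberg controls the false discovery rate rather than a per-test false positive rate. All marker names and p-values are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values (each > 0) for one candidate with Fisher's method."""
    k = len(pvalues)
    # -2 * sum(log p) follows a chi-square distribution with 2k degrees of
    # freedom under the null; we work with half of that statistic.
    half_stat = -sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even degrees of freedom.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for three candidate biomarkers in three studies.
study_pvalues = {
    "marker_A": [0.01, 0.02, 0.03],
    "marker_B": [0.40, 0.60, 0.50],
    "marker_C": [0.04, 0.20, 0.01],
}
names = list(study_pvalues)
combined = [fisher_combine(study_pvalues[n]) for n in names]
adjusted = benjamini_hochberg(combined)
selected = [n for n, q in zip(names, adjusted) if q <= 0.05]
```

Here `selected` holds the candidates whose adjusted combined p-value falls at or below the chosen 0.05 threshold. The threshold itself is an arbitrary choice for the example.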
The methods will be implemented as R/Bioconductor software that all researchers can use in their own work.