Grants and contributions:
Grant or award spanning more than one fiscal year (2017-2018 to 2020-2021).
Cloud storage systems serve as an important infrastructure for emerging applications, including Big Data Analytics and the Internet of Things (IoT). A key challenge is how to handle massive amounts of data in real time in a cost-efficient way, and explosive growth in the volume and complexity of data exacerbates this challenge. Furthermore, many cloud computing systems are networked and distributed, which makes storage system management more complex and costly because of limited bandwidth. Data deduplication is an efficient data reduction approach that not only saves storage space by eliminating duplicate data but also minimizes the transmission of redundant data, even in low-bandwidth environments. However, conventional deduplication schemes suffer from high computational complexity in chunking and from large storage overhead for block indices, and therefore fail to offer real-time, cost-efficient storage services. This research program aims to address the most important challenges in optimizing the performance of cloud storage systems for big data applications. We will conduct innovative research to overcome the current limitations of deduplication-based methodologies. To this end, we will investigate adaptive multi-granularity deduplication schemes that significantly reduce the amount of data to be processed and improve overall system performance, and we plan to propose a new methodology for selecting deduplication granularities to meet the needs of different data scales. Implementation techniques, including locality-aware hashing, synergized deduplication and compression, and pipeline scheduling, will be investigated, evaluated, and validated on a real cloud storage platform.
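To make the deduplication pipeline described above concrete, the following is a minimal, illustrative sketch (not the program's actual design) of the two steps a conventional scheme performs: content-defined chunking with a simple rolling hash, and duplicate elimination via a fingerprint index. All function names, the toy rolling-hash parameters (`min_size`, `avg_mask`, `max_size`), and the in-memory index are assumptions for illustration; a real system would use a stronger rolling hash and a persistent index, which is exactly the overhead the abstract identifies.

```python
import hashlib

def chunk_data(data: bytes, min_size=64, avg_mask=0x3F, max_size=1024):
    """Content-defined chunking with a toy rolling value.
    A cut is made when the low bits of the rolling value match a mask,
    so chunk boundaries depend on content, not fixed offsets."""
    chunks = []
    start = 0
    rolling = 0
    for i, b in enumerate(data):
        rolling = ((rolling << 1) + b) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (rolling & avg_mask) == avg_mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks

def deduplicate(chunks):
    """Keep one copy of each unique chunk, keyed by its SHA-256
    fingerprint; the original stream becomes a list of fingerprints
    (the 'recipe'). The fingerprint index is the storage overhead
    that conventional schemes pay per block."""
    store, recipe = {}, []
    for c in chunks:
        fp = hashlib.sha256(c).hexdigest()
        store.setdefault(fp, c)
        recipe.append(fp)
    return store, recipe

def restore(store, recipe):
    """Reassemble the original byte stream from the recipe."""
    return b"".join(store[fp] for fp in recipe)
```

A multi-granularity scheme, as proposed in the program, would additionally vary the chunk-size parameters per data scale rather than fixing them globally.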