Grants and contributions:

Title:
New Frontiers for Differentially Private Data-Analysis
Agreement number:
RGPIN
Agreement value:
$140,000.00
Agreement date:
May 10, 2017 -
Organization:
Conseil de recherches en sciences naturelles et en génie du Canada
Location:
Alberta, Other, CA
Reference number:
GC-2017-Q1-03439
Agreement type:
grant
Report type:
Grants and contributions
Additional information:

Grant or award applying to more than one fiscal year. (2017-2018 to 2022-2023)

Legal name of recipient:
Sheffet, Or (University of Alberta)
Program:
Discovery Grants Program - Individual
Program purpose:

In recent years, as privacy has become a matter of increasing urgency and public concern, differential privacy has emerged as the gold standard for privacy-preserving data analysis. This is because differential privacy provides a powerful, mathematically rigorous notion of privacy with clear trade-offs between privacy loss, various measures of utility, and the size of the data. Existing work has already brought differential privacy to classical machine learning and ERM algorithms, and it is high time to bring differential privacy to new fields of data analysis. This programme aims to build a theoretical framework addressing two main hurdles that, we believe, currently prevent many data-centric fields from using differential privacy. We believe these two avenues, in addition to presenting a formidable theoretical challenge, are the two main barriers inhibiting the uptake of differential privacy in fields such as medicine, economics and the social sciences.
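
To make the trade-off above concrete, here is a minimal sketch in Python of the standard Laplace mechanism (our illustration, not part of the funded proposal): the noise is calibrated to the query's sensitivity divided by the privacy loss epsilon, so for a fixed epsilon a larger dataset yields a far more accurate release. The function name, bounds and parameter values are assumptions chosen for the example.

import numpy as np

def private_mean(data, epsilon, lower=0.0, upper=1.0):
    # Clip records to a known range so that changing one record moves the
    # mean by at most (upper - lower) / n: the query's sensitivity.
    data = np.clip(np.asarray(data, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(data)
    # Laplace noise with scale sensitivity / epsilon gives epsilon-DP.
    return data.mean() + np.random.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
# Same privacy loss (epsilon = 0.5); the larger dataset suffers far less noise.
print(private_mean(rng.random(100), epsilon=0.5))      # noise scale ~ 0.02
print(private_mean(rng.random(100_000), epsilon=0.5))  # noise scale ~ 0.00002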

(i) The type of analysis. The first goal of this project is to provide a theoretical framework for differentially private hypothesis testing and longitudinal studies. In this part of the project we will study the theory behind statistical tests and its adaptation to differentially private computation. There are two major hurdles that we will work to overcome. First, statistical theory itself is more often asymptotic than based on proven sample-size bounds (or calculations). Second, standard statistical theory often assumes that the sole source of randomness in the process is data sampling, while the estimator is a deterministic function of the dataset. Differentially private estimators, which are inherently random, require us to re-analyze inference bounds to include both the randomness of the data-sampling procedure and that of the private algorithm. Special emphasis will be placed on the statistical inference used in time-series analysis.
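
A hedged sketch of the issue described above, assuming a simple one-sample z-test on a privately released mean (an illustration, not the proposal's method): the test statistic's variance must combine the classical sampling variance with the variance of the Laplace noise added for privacy, and the p-value below treats the combined noise as approximately normal. The function name, the assumed known sigma and the parameter values are illustrative choices.

import numpy as np
from scipy.stats import norm

def dp_z_test(data, mu0, sigma, epsilon, lower=0.0, upper=1.0):
    data = np.clip(np.asarray(data, dtype=float), lower, upper)
    n = len(data)
    scale = (upper - lower) / (n * epsilon)        # Laplace scale for the released mean
    noisy_mean = data.mean() + np.random.laplace(scale=scale)
    var_sampling = sigma ** 2 / n                  # classical randomness: data sampling
    var_privacy = 2 * scale ** 2                   # added randomness: the private algorithm
    z = (noisy_mean - mu0) / np.sqrt(var_sampling + var_privacy)
    return z, 2 * norm.sf(abs(z))                  # two-sided p-value (normal approximation)

data = np.random.default_rng(1).normal(0.55, 0.1, 500)
print(dp_z_test(data, mu0=0.5, sigma=0.1, epsilon=1.0))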

(ii) The volume of the data. The size of a dataset is a major factor in differential privacy, as it bounds the number and quality of the queries we may run on the data. Unfortunately, many existing datasets are only of moderate size (say, datasets kept by local clinics or hospitals), holding at best a few hundred records. Luckily, there exist hundreds of such moderate-size datasets containing similar data, which could be jointly aggregated into one large-scale dataset on which one can reap the benefits of big-data analysis (among them, differential privacy). Ironically, the different data-gathering entities often refuse to share data for privacy-related reasons. The second goal of this proposal is therefore to allow the multiple data curators to share the computation rather than the data. We will design protocols that allow multiple data curators to simulate differentially private computations over their joint datasets.
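
A minimal sketch of the kind of protocol described above, under the simplifying assumption that each curator releases only a Laplace-noised local sum (an illustration, not the proposal's protocol): each release is differentially private with respect to that curator's own records, and combining the releases is post-processing, so the curators share a computation rather than their raw data. Names, the sum query and parameter values are illustrative.

import numpy as np

def curator_release(local_data, epsilon, lower=0.0, upper=1.0):
    # Each curator publishes only a noisy sum of its own (clipped) records,
    # which is epsilon-differentially private with respect to its own dataset.
    local_data = np.clip(np.asarray(local_data, dtype=float), lower, upper)
    sensitivity = upper - lower
    return local_data.sum() + np.random.laplace(scale=sensitivity / epsilon)

def aggregate_mean(noisy_sums, total_n):
    # Combining the released values is post-processing, so the joint
    # estimate stays private without any curator sharing raw records.
    return sum(noisy_sums) / total_n

rng = np.random.default_rng(2)
clinics = [rng.random(300) for _ in range(50)]             # 50 moderate-size datasets
releases = [curator_release(c, epsilon=1.0) for c in clinics]
print(aggregate_mean(releases, total_n=sum(len(c) for c in clinics)))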