Language-Based Differential Privacy with Accuracy Estimations and Sensitivity Analyses

Abstract: This thesis focuses on the development of programming frameworks to enforce, by construction, desirable properties of software systems. Particularly, we are interested in enforcing differential privacy -- a mathematical notion of data privacy -- while statically reasoning about the accuracy of computations, along with deriving the sensitivity of arbitrary functions to further strengthen the expressiveness of these systems. To this end, we first introduce DPella, a programming framework for differentially-private queries that allows reasoning about the privacy and accuracy of data analyses. DPella provides a novel component that statically tracks the accuracy of different queries. This component leverages taint analysis to infer statistical independence of the different noises that were added to ensure the privacy of the overall computation. As a result, DPella allows analysts to implement privacy-preserving queries and adjust the privacy parameters to meet accuracy targets or vice-versa. In the context of differentially-private systems, the sensitivity of a function determines the amount of noise needed to achieve a desired level of privacy. However, establishing the sensitivity of arbitrary functions is non-trivial. Consequently, systems such as DPella provided a limited set of functions -- whose sensitivity is known -- to apply over sensitive data, thus hindering the expressiveness of the language. To overcome this limitation, we propose a new approach to derive proofs of sensitivity in programming languages with support for polymorphism. Our approach enriches base types with information about the metric relation between values and applies parametricity to derive proof of a function's sensitivity. These ideas are formalized in a sound calculus and implemented as a Haskell library called Spar, enabling programmers to prove the sensitivity of their functions through type-checking alone. Overall, this thesis contributes to the development of expressive programming frameworks for data analysis with privacy and accuracy guarantees. The proposed approaches are feasible and effective, as demonstrated through the implementation of DPella and Spar.

  CLICK HERE TO DOWNLOAD THE WHOLE DISSERTATION. (in PDF format)