Differentially Private Simple Linear Regression


Publisher: de Gruyter
Copyright: © 2022 Daniel Alabi et al., published by Sciendo
ISSN: 2299-0984
eISSN: 2299-0984
DOI: 10.2478/popets-2022-0041

Abstract

Economics and social science research often requires analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study regression algorithms that satisfy differential privacy, a constraint which guarantees that an algorithm’s output reveals little about any individual input data record, even to an attacker with side information about the dataset. Motivated by the Opportunity Atlas, a high-profile, small-area analysis tool in economics research, we perform a thorough experimental evaluation of differentially private algorithms for simple linear regression on small datasets with tens to hundreds of records, a particularly challenging regime for differential privacy. In contrast, prior work on differentially private linear regression focused on multivariate linear regression on large datasets or asymptotic analysis. Through a range of experiments, we identify key factors that affect the relative performance of the algorithms. We find that algorithms based on robust estimators, in particular the median-based estimator of Theil and Sen, perform best on small datasets (e.g., hundreds of datapoints), while algorithms based on Ordinary Least Squares or Gradient Descent perform better for large datasets. However, we also discuss regimes in which this general finding does not hold. Notably, the differentially private analogues of Theil–Sen (one of which was suggested in a theoretical work of Dwork and Lei) have not been studied in any prior experimental work on differentially private linear regression.
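To make the median-based idea concrete, here is a minimal Python sketch of one way a differentially private Theil–Sen slope could be computed. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify how the private median is computed, so the exponential mechanism is swapped in here as one standard choice. The function names (`matched_pair_slopes`, `dp_median`) and the clipping bounds `lo` and `hi` are hypothetical; the bounds are assumed to be chosen without looking at the data. The sketch also assumes the two points in each pair have distinct x values, and it ignores floating-point subtleties that a vetted privacy implementation would have to handle.

```python
import numpy as np


def matched_pair_slopes(x, y, rng):
    """Slopes over a random matching of the points into disjoint pairs.

    Each record lands in at most one pair, so replacing one record changes
    at most one slope. Assumes the x values within each pair differ.
    """
    idx = rng.permutation(len(x))
    half = len(x) // 2
    a, b = idx[:half], idx[half:2 * half]
    return (y[b] - y[a]) / (x[b] - x[a])


def dp_median(values, lo, hi, epsilon, rng):
    """Exponential-mechanism median on [lo, hi] (illustrative sketch).

    The sorted, clipped values cut [lo, hi] into intervals. An interval is
    scored by how far its rank is from the middle, then sampled with
    probability proportional to width * exp(epsilon * score / 2).
    """
    v = np.sort(np.clip(values, lo, hi))
    m = len(v)
    edges = np.concatenate(([lo], v, [hi]))   # m + 2 edges, m + 1 intervals
    widths = np.diff(edges)
    ranks = np.arange(m + 1)                  # interval i has i values below it
    score = -np.abs(ranks - m / 2.0)
    with np.errstate(divide="ignore"):        # zero-width intervals get weight 0
        log_w = np.log(widths) + 0.5 * epsilon * score
    log_w -= log_w.max()                      # stabilise before exponentiating
    probs = np.exp(log_w)
    probs /= probs.sum()
    i = rng.choice(m + 1, p=probs)
    return rng.uniform(edges[i], edges[i + 1])


# Usage on synthetic data (true slope 2); all constants are illustrative.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.05, size=x.size)
slopes = matched_pair_slopes(x, y, rng)
slope_estimate = dp_median(slopes, lo=-50.0, hi=50.0, epsilon=1.0, rng=rng)
```

Pairing each record with only one partner is what keeps the influence of a single record small: one changed record moves one slope, which shifts any rank by at most one. Taking the median over all pairs would use the data more fully, but each record would then touch many slopes and would need a different privacy analysis.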

Journal

Proceedings on Privacy Enhancing Technologies, de Gruyter

Published: Apr 1, 2022

Keywords: differential privacy; linear regression; robust statistics
