Abstract
With the ubiquity of data collection in today’s society, protecting each individual’s privacy is a growing concern. Differential Privacy provides an enforceable definition of privacy that allows data owners to promise each individual that their presence in the dataset will be almost undetectable. Data Mining techniques are often used to discover knowledge in data; however, these techniques are not differentially private by default. In this paper, we propose a differentially private decision forest algorithm that takes advantage of a novel theorem for the local sensitivity of the Gini Index. The Gini Index plays an important role in building a decision forest, and the sensitivity of its equation dictates how much noise needs to be added to make the forest differentially private. We prove that the Gini Index can have a substantially lower sensitivity than that used in previous work, leading to superior empirical results. We compare the prediction accuracy of our decision forest not only to previous work but also to the popular Random Forest algorithm, demonstrating how close our differentially private algorithm can come to a completely non-private forest.
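To illustrate why a lower sensitivity matters, the following is a minimal sketch of the standard Gini Index together with a generic Laplace mechanism, in which the noise scale equals sensitivity divided by the privacy budget. It is not the algorithm proposed in the paper: the function names, the `epsilon` parameter, and the sensitivity values are assumptions chosen for illustration, and the sensitivity bound proved in the paper is not reproduced here.

```python
import random


def gini_index(class_counts):
    """Standard Gini Index of a node: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def laplace_scale(sensitivity, epsilon):
    """Noise scale of the generic Laplace mechanism: sensitivity / epsilon."""
    return sensitivity / epsilon


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_gini(class_counts, sensitivity, epsilon, rng):
    """Gini Index plus Laplace noise; `sensitivity` is a caller-supplied placeholder."""
    scale = laplace_scale(sensitivity, epsilon)
    return gini_index(class_counts) + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [30, 10]  # hypothetical class counts at one tree node
    # Placeholder sensitivities, NOT values taken from the paper.
    for sensitivity in (0.5, 0.05):
        scale = laplace_scale(sensitivity, epsilon=1.0)
        value = noisy_gini(counts, sensitivity, 1.0, rng)
        print(f"sensitivity={sensitivity}: noise scale={scale}, noisy Gini={value:.3f}")
```

For a fixed privacy budget, the noise scale shrinks in direct proportion to the sensitivity, which is why a tighter sensitivity bound can translate into more accurate splits. Note that calibrating noise to a data-dependent (local) sensitivity generally needs extra care to preserve the privacy guarantee; this sketch simply treats the sensitivity as a given number.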
Original language | English |
---|---|
Title of host publication | Proceedings of the Thirteenth Australasian Data Mining Conference (AusDM 15) |
Place of Publication | Australia |
Publisher | CRPIT |
Pages | 99-108 |
Number of pages | 10 |
ISBN (Print) | 9781921770180 |
Publication status | Published - 2015 |
Event | The 13th Australasian Data Mining Conference: AusDM 2015 - University of Technology, Sydney, Australia Duration: 08 Aug 2015 → 09 Aug 2015 https://web.archive.org/web/20150820140652/http://ausdm15.ausdm.org/ |
Conference
Conference | The 13th Australasian Data Mining Conference |
---|---|
Country/Territory | Australia |
City | Sydney |
Period | 08/08/15 → 09/08/15 |