Decision Tree Classification with Differential Privacy: A Survey

Sam Fletcher, Zahid Islam

Research output: Contribution to journal › Article

Abstract

Data mining information about people is becoming increasingly important in the data-driven society of the 21st century. Unfortunately, sometimes there are real-world considerations that conflict with the goals of data mining; sometimes the privacy of the people being data mined needs to be considered. This necessitates that the output of data mining algorithms be modified to preserve privacy while simultaneously not ruining the predictive power of the outputted model. Differential privacy is a strong, enforceable definition of privacy that can be used in data mining algorithms, guaranteeing that nothing will be learned about the people in the data that could not already be discovered without their participation. In this survey, we focus on one particular data mining algorithm – decision trees – and how differential privacy interacts with each of the components that constitute decision tree algorithms. We analyze both greedy and random decision trees, and the conflicts that arise when trying to balance privacy requirements with the accuracy of the model.
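As a rough illustration of what "modifying the output" can mean for one component of a decision tree, the sketch below perturbs the class counts stored at a single leaf with the Laplace mechanism, a standard differential privacy building block. It is a minimal sketch, not the method of any particular algorithm covered by the survey. The function names (`laplace_noise`, `noisy_leaf_counts`, `predict_leaf`) are hypothetical, and the code assumes the add/remove-one definition of neighboring datasets and a publicly known set of class labels. If the tree's structure was chosen without looking at the data, as in random decision trees, every record falls in exactly one leaf, so noising each leaf with the same `epsilon` costs `epsilon` in total by parallel composition.

```python
import random
from collections import Counter
from collections.abc import Hashable, Iterable
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential samples with rate
    1/scale is Laplace-distributed with location 0 and the given scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_leaf_counts(
    labels: Iterable[Hashable],
    classes: Iterable[Hashable],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> dict[Hashable, float]:
    """Return Laplace-perturbed class counts for the records in one leaf.

    Assumes add/remove-one neighboring datasets: one person changes a
    single class count by 1, so the count vector has L1 sensitivity 1 and
    noise with scale 1/epsilon makes these counts epsilon-differentially
    private.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    counts = Counter(labels)
    scale = 1.0 / epsilon
    # Noise every class in the (assumed public) class domain, including
    # classes that never occur in this leaf, so that the set of returned
    # keys reveals nothing about the data.
    return {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in classes}


def predict_leaf(noisy_counts: dict[Hashable, float]) -> Hashable:
    """Majority vote over already-noised counts.

    This is post-processing of a private output, so it spends no extra
    privacy budget.
    """
    return max(noisy_counts, key=noisy_counts.get)


if __name__ == "__main__":
    rng = random.Random(0)
    leaf_labels = ["yes"] * 40 + ["no"] * 10
    noisy = noisy_leaf_counts(leaf_labels, ["yes", "no"], epsilon=1.0, rng=rng)
    print(noisy, predict_leaf(noisy))
```

Running the file prints the noisy counts and the predicted label; the exact numbers depend on the seed. A smaller `epsilon` gives a stronger privacy guarantee but larger noise, so leaves holding few records are the first to produce wrong predictions, which is one concrete form of the privacy/accuracy conflict described above.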
Original language: English
Journal: ACM Computing Surveys
Publication status: Accepted/In press, 21 May 2019

@article{0aee705805ef421bb883733d114d29b3,
title = "Decision Tree Classification with Differential Privacy: A Survey",
author = "Sam Fletcher and Zahid Islam",
year = "2019",
month = "5",
day = "21",
language = "English",
journal = "ACM Computing Surveys",
issn = "0360-0300",
publisher = "Association for Computing Machinery (ACM)",

}

Decision Tree Classification with Differential Privacy: A Survey. / Fletcher, Sam; Islam, Zahid.

In: ACM Computing Surveys, 21.05.2019.

TY - JOUR

T1 - Decision Tree Classification with Differential Privacy: A Survey

AU - Fletcher, Sam

AU - Islam, Zahid

PY - 2019/5/21

Y1 - 2019/5/21

UR - https://www.researchgate.net/publication/309738753_Decision_Tree_Classification_with_Differential_Privacy_A_Survey

M3 - Article

JO - ACM Computing Surveys

JF - ACM Computing Surveys

SN - 0360-0300

ER -