Data Mining and Privacy: Modeling Sensitive Data with Differential Privacy

Samuel Fletcher

Research output: Thesis, Doctoral Thesis

In the data-driven society of the 21st century, mining data to discover information about people is becoming increasingly valuable. The information can be used to learn more about society and humanity, or to build models that enable us to predict future events. Applications of data mining range from commercial endeavors to contributions to the common good through demographic and medical studies. Real-world considerations can conflict with the goals of data mining, however: the privacy of the people whose data is being mined sometimes needs to be protected. The output of data mining algorithms must then be modified to protect sensitive information without destroying the informative or predictive power of the resulting model.

Many techniques have been developed to preserve privacy over the years, but one stands out above the rest: differential privacy. Differential privacy is an enforceable definition of privacy that can be used in data mining algorithms, guaranteeing that nothing will be learned about the people in the data that could not already be discovered without their personal information.
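
To make the guarantee concrete: in its standard form, a randomized mechanism M is ε-differentially private if, for any two datasets D and D' that differ in one person's record and any set of outputs S, Pr[M(D) in S] <= exp(ε) * Pr[M(D') in S]. The sketch below is not taken from the thesis. It illustrates the definition with the widely used Laplace mechanism applied to a simple counting query, and the names `laplace_noise` and `private_count` are hypothetical ones chosen for this illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponential variables, each with mean `scale`,
    # follows a Laplace(0, scale) distribution.
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Assumption for this sketch: neighboring datasets differ by adding or
    removing one record, so a count changes by at most 1 (sensitivity 1).
    Adding Laplace noise with scale sensitivity / epsilon then satisfies
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy data: how many people are aged 40 or over?
    people = [{"age": a} for a in (23, 35, 41, 58, 62)]
    print(private_count(people, lambda p: p["age"] >= 40, epsilon=0.5))
```

A smaller ε adds more noise and gives stronger privacy at the cost of accuracy, which is one simple instance of the privacy and utility tension described in this abstract.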

In this thesis, we focus on one particular family of data mining algorithms, decision trees, and on how differential privacy interacts with each of the components that constitute decision tree algorithms. We analyze the conflicts that arise when balancing privacy requirements with the utility of a model. We view "utility" as a two-sided coin: on one side there is prediction accuracy, and on the other there is knowledge discovery. Optimal results for both sides cannot be achieved at the same time, and the importance of each side depends on the user's needs. We explore the trade-offs that need to be made when prioritizing one side over the other.
Original language: English
Qualification: Doctor of Philosophy
  • Islam, Zahid, Principal Supervisor
  • Burmeister, Oliver, Co-Supervisor
Award date: 01 Jun 2017
Publication status: Published - 2017
