DETECTIVE: A Decision Tree Based Categorical Value Clustering and Perturbation Technique in Privacy Preserving Data Mining

Md Zahidul Islam, Ljiljana Brankovic

Research output: Book chapter/Published conference paper › Conference paper › peer-review

16 Citations (Scopus)
68 Downloads (Pure)


Data Mining is a powerful tool for information discovery from huge datasets. Various sectors, including commercial, government, financial, medical, and scientific, are applying Data Mining techniques on their datasets that typically contain sensitive individual information. During this process the datasets get exposed to several parties, which can potentially lead to disclosure of sensitive information and thus to breaches of privacy. Several Data Mining privacy preserving techniques have been recently proposed. In this paper we focus on data perturbation techniques, i.e., those that add noise to the data in order to prevent exact disclosure of confidential values. Some of these techniques were designed for datasets having only numerical non-class attributes and a categorical class attribute. However, natural datasets are more likely to have both numerical and categorical non-class attributes, and occasionally they contain only categorical attributes. Noise addition techniques developed for numerical attributes are not suitable for such datasets, due to the absence of natural ordering among categorical values. In this paper we propose a new method for adding noise to categorical values, which makes use of the clusters that exist among these values. We first discuss several existing categorical clustering methods and point out the limitations they exhibit in our context. Then we present a novel approach towards clustering of categorical values and use it to perturb data while maintaining the patterns in the dataset.
Our clustering approach partitions the values of a given categorical attribute rather than the records of the datasets; additionally, our approach operates on the horizontally partitioned dataset and it is possible for two values to belong to the same cluster in one horizontal partition of the dataset, and to two distinct clusters in another partition. Finally, we provide some experimental results in order to evaluate our perturbation technique and to compare our clustering approach with an existing method, the so-called CACTUS.
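The idea of perturbing a categorical value within its cluster, with clusters defined separately for each horizontal partition, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the clusters or partitions are derived, so the sketch simply takes them as given inputs. The function name `perturb_categorical`, the parameters `partition_of`, `clusters`, and `p`, and the example attributes and values (`age_group`, `city`, `Perth`, `Sydney`) are all hypothetical.

```python
import random


def perturb_categorical(records, attribute, partition_of, clusters, p=0.3, seed=0):
    """Return a perturbed copy of `records` (a list of dicts).

    With probability `p`, the value of `attribute` is replaced by a different
    value drawn from the same cluster. Clusters are looked up per horizontal
    partition, so the same value may have different cluster mates in
    different partitions. All inputs here are assumptions for illustration.
    """
    rng = random.Random(seed)
    perturbed = []
    for record in records:
        new_record = dict(record)  # copy so the input is left untouched
        partition = partition_of(record)
        value = record[attribute]
        # Find the cluster of this value inside the record's partition.
        cluster = next(
            (c for c in clusters.get(partition, []) if value in c), None
        )
        if cluster is not None and len(cluster) > 1 and rng.random() < p:
            alternatives = sorted(cluster - {value})
            new_record[attribute] = rng.choice(alternatives)
        perturbed.append(new_record)
    return perturbed


# Hypothetical data: two horizontal partitions keyed by "age_group".
records = [
    {"age_group": "young", "city": "Perth"},
    {"age_group": "young", "city": "Sydney"},
    {"age_group": "old", "city": "Perth"},
    {"age_group": "old", "city": "Sydney"},
]

# "Perth" and "Sydney" share a cluster in the "young" partition
# but fall into two distinct clusters in the "old" partition.
clusters = {
    "young": [{"Perth", "Sydney"}],
    "old": [{"Perth"}, {"Sydney"}],
}

perturbed = perturb_categorical(
    records, "city", lambda r: r["age_group"], clusters, p=1.0
)
```

With `p=1.0`, every value that has at least one cluster mate is swapped, while values in singleton clusters are left unchanged. A smaller `p` would leave most values intact and add noise to only a fraction of the records.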
Original language: English
Title of host publication: IEEE International Conference on Industrial Informatics (INDIN)
Place of Publication: USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (Electronic): 0780390946
Publication status: Published - 2005
Event: 3rd International Conference - Perth, WA, Australia
Duration: 10 Aug 2005 to 12 Aug 2005



