Abstract

The ability to extract knowledge from data has been the driving force of Data Mining since its inception, and of statistical modeling long before even that. Actionable knowledge often takes the form of patterns, where a set of antecedents can be used to infer a consequent. In this paper we offer a solution to the problem of comparing different sets of patterns. Our solution allows comparisons between sets of patterns that were derived from different techniques (such as different classification algorithms), or made from different samples of data (such as temporal data or data perturbed for privacy reasons). We propose using the Jaccard index to measure the similarity between sets of patterns by converting each pattern into a single element within the set. Our measure focuses on providing conceptual simplicity, computational simplicity, interpretability, and wide applicability. The results of this measure are compared to prediction accuracy in the context of a real-world data mining scenario.
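The idea of treating each pattern as one set element can be illustrated with a minimal sketch. The abstract does not specify how a pattern is encoded, so the representation below (an unordered set of antecedent strings paired with a consequent string) is an assumption, as are the names `make_pattern` and `jaccard_index` and the toy rules. The Jaccard index itself is the standard one: the size of the intersection divided by the size of the union.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical encoding: a pattern is (antecedents, consequent).
# A frozenset makes the antecedents order-independent and hashable,
# so the whole pattern can act as a single element of a Python set.
Pattern = Tuple[FrozenSet[str], str]


def make_pattern(antecedents: Iterable[str], consequent: str) -> Pattern:
    """Turn one rule into a single hashable set element."""
    return (frozenset(antecedents), consequent)


def jaccard_index(a: Set[Pattern], b: Set[Pattern]) -> float:
    """Return |a & b| / |a | b| for two sets of patterns."""
    if not a and not b:
        # Assumed convention: two empty pattern sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Toy pattern sets, e.g. from two classifiers or two data samples.
rules_a = {
    make_pattern(["age>30", "income=high"], "buys=yes"),
    make_pattern(["age<=30"], "buys=no"),
    make_pattern(["student=yes"], "buys=yes"),
}
rules_b = {
    make_pattern(["income=high", "age>30"], "buys=yes"),  # same rule, reordered
    make_pattern(["age<=30"], "buys=no"),
    make_pattern(["income=low"], "buys=no"),
}

similarity = jaccard_index(rules_a, rules_b)
print(similarity)  # 2 shared patterns out of 4 distinct ones -> 0.5
```

Under this encoding two patterns match only if their antecedents and consequent are exactly equal; any looser notion of matching would need a different conversion step.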
Original language: English
Pages (from-to): 1-17
Number of pages: 17
Journal: Australasian Journal of Information Systems
Volume: 22
Publication status: Published - 2018

Title: Comparing sets of patterns with the Jaccard index