Abstract
The ability to extract knowledge from data has been the driving force of Data Mining since its inception, and of statistical modeling long before that. Actionable knowledge often takes the form of patterns, also known as rules, in which a set of antecedents is used to infer a consequent. In this paper we offer a solution to the problem of comparing different sets of rules. Our solution allows comparisons between rule lists that were derived using different techniques (such as different classification algorithms) or from different samples of data (such as temporal data or data perturbed for privacy reasons). We propose using the Jaccard Index to measure the similarity between rule lists, by converting each rule into a single element of a set of rules. Our measure emphasizes conceptual simplicity, computational simplicity, interpretability, and wide applicability. The results of this measure are compared to Prediction Accuracy in the context of a real-world data mining scenario.
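As a rough illustration of the idea described in the abstract, the sketch below computes the Jaccard Index, |A ∩ B| / |A ∪ B|, over two rule lists. The rule representation (a frozen set of antecedent strings paired with a consequent string), the helper names `make_rule` and `jaccard_rule_similarity`, and the convention of returning 1.0 for two empty lists are all assumptions made for this example; the abstract does not specify how the paper encodes rules.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumed representation: a rule is (set of antecedents, consequent).
# Using a frozenset makes the rule hashable and ignores antecedent order.
Rule = Tuple[FrozenSet[str], str]


def make_rule(antecedents: Iterable[str], consequent: str) -> Rule:
    """Build a hashable rule so it can act as a single set element."""
    return (frozenset(antecedents), consequent)


def jaccard_rule_similarity(rules_a: Iterable[Rule], rules_b: Iterable[Rule]) -> float:
    """Jaccard Index between two rule lists: |A & B| / |A | B|."""
    set_a, set_b = set(rules_a), set(rules_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty rule lists are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical rule lists, e.g. produced by two different classifiers.
r1 = make_rule(["age>30", "income=high"], "buys=yes")
r2 = make_rule(["student=yes"], "buys=yes")
r3 = make_rule(["age<=30", "student=no"], "buys=no")
r4 = make_rule(["income=low"], "buys=no")

rules_from_model_a = [r1, r2, r3]
rules_from_model_b = [r2, r3, r4]

# Two shared rules out of four distinct rules overall.
similarity = jaccard_rule_similarity(rules_from_model_a, rules_from_model_b)
```

Under this representation two rules match only when their antecedents and consequent are identical, so the score lies between 0 (no shared rules) and 1 (identical rule sets).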
Original language | English |
---|---|
Title of host publication | Proceedings of the 14th Australasian Data Mining Conference (AusDM 16) |
Place of Publication | Australia |
Publisher | CRPIT |
Pages | 1-8 |
Number of pages | 8 |
Publication status | Published - 2016 |
Event | The 14th Australasian Data Mining Conference: AusDM 2016 - Realm Hotel, Canberra, Australia. Duration: 06 Dec 2016 → 08 Dec 2016. http://ausdm16.ausdm.org/ |
Conference
Conference | The 14th Australasian Data Mining Conference |
---|---|
Country/Territory | Australia |
City | Canberra |
Period | 06/12/16 → 08/12/16 |