Towards a parallel data mining toolbox

P. Christen, M. Hegland, O. M. Nielsen, Stephen Roberts, P. Strazdins, T. Semenova, Irfan Altas, Timothy Hancock

Research output: Book chapter/Published conference paper > Conference paper > peer-review

2 Citations (Scopus)

Abstract

This paper presents research projects tackling two aspects of data mining. First, it discusses a toolbox that allows flexible and interactive data exploration, analysis and presentation using the scripting language Python. The toolbox can process multiple SQL queries in parallel and speeds up data retrieval with a supervised caching mechanism for commonly used queries. Together, these two features give fast, efficient data access, reducing the time spent on data exploration, preparation and analysis. Second, it presents an approach to predictive modelling that leads to scalable parallel algorithms for high-dimensional data collections. Scalability is an essential requirement for data mining algorithms, as those that do not scale linearly with the data size are infeasible. The algorithms are implemented in parallel and achieve almost ideal speedup. One aim of the presented research is to combine these two aspects of data mining into an efficient but flexible data mining toolbox that allows the experienced data miner to attack large-scale problems interactively or with batch processing.
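The abstract does not show the toolbox's interface, so the following is only a minimal sketch of the general idea of running several SQL queries in parallel and reusing cached results for repeated queries. The class name `QueryRunner`, its methods, the `visits` table and the use of SQLite with a thread pool are all assumptions made for illustration; the plain dictionary cache stands in for, and is not, the paper's supervised caching mechanism.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


class QueryRunner:
    """Hypothetical helper (not the paper's API): runs SQL queries in
    parallel and keeps results of earlier queries in a dictionary."""

    def __init__(self, db_path, max_workers=4):
        self.db_path = db_path
        self.max_workers = max_workers
        self.cache = {}  # query text -> list of result rows
        self.hits = 0    # number of queries answered from the cache

    def _execute(self, query):
        # Each worker opens its own connection, because a sqlite3
        # connection is by default bound to the thread that created it.
        conn = sqlite3.connect(self.db_path)
        try:
            return conn.execute(query).fetchall()
        finally:
            conn.close()

    def run_many(self, queries):
        # Count queries that are already cached before doing any work.
        self.hits += sum(1 for q in queries if q in self.cache)
        # Send only uncached, de-duplicated queries to the database.
        missing = [q for q in dict.fromkeys(queries) if q not in self.cache]
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            for query, rows in zip(missing, pool.map(self._execute, missing)):
                self.cache[query] = rows
        # Return results in the order the queries were given.
        return [self.cache[q] for q in queries]


def build_demo_db(path):
    # Small made-up table, used only to exercise the sketch.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE visits (patient INTEGER, cost REAL)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )
    conn.commit()
    conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_demo_db(db_path)
        runner = QueryRunner(db_path)
        queries = [
            "SELECT COUNT(*) FROM visits",
            "SELECT SUM(cost) FROM visits WHERE patient = 1",
        ]
        print(runner.run_many(queries))  # first call queries the database
        print(runner.run_many(queries))  # second call is served from the cache
        print("cache hits:", runner.hits)
```

In this sketch the cache is keyed on the exact query text, so two differently formatted but equivalent queries would both reach the database. A real toolbox would also need a policy for invalidating cached results when the underlying data changes.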

Original language: English
Title of host publication: Proceedings - 15th International Parallel and Distributed Processing Symposium, IPDPS 2001
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 1563-1570
Number of pages: 8
ISBN (Electronic): 0769509908, 9780769509907
Publication status: Published - 2001
Event: 15th International Parallel and Distributed Processing Symposium, IPDPS 2001 - San Francisco, United States
Duration: 23 Apr 2001 to 27 Apr 2001

Publication series

Name: Proceedings - 15th International Parallel and Distributed Processing Symposium, IPDPS 2001

Conference

Conference: 15th International Parallel and Distributed Processing Symposium, IPDPS 2001
Country: United States
City: San Francisco
Period: 23/04/01 to 27/04/01
