Novel data mining techniques for incomplete clinical data in Diabetes management

Herbert F. Jelinek, Andrew Yatsko, Andrew Stranieri, Sitalakshmi Venkatraman

Research output: Contribution to journal › Article › peer-review

An important part of health care involves the upkeep and interpretation of medical databases containing patient records for clinical decision making, diagnosis and follow-up treatment. Missing clinical entries make it difficult to apply data mining algorithms for clinical decision support. This study demonstrates that higher predictive accuracy is possible using conventional data mining algorithms if missing values are dealt with appropriately. We propose a novel algorithm that uses a convolution of sub-problems to stage a super problem, where classes are defined by the Cartesian product of the class values of the underlying problems, and where Incomplete Information Dismissal and Data Completion techniques are applied to reduce features and impute missing values. The predictive accuracies of Decision Branch, Nearest Neighborhood and Naïve Bayesian classifiers were compared for predicting diabetes, cardiovascular disease and hypertension. Data are derived from the Diabetes Screening Complications Research Initiative (DiScRi), conducted at a regional Australian university, involving more than 2400 patient records with more than one hundred clinical risk factors (attributes). The results show substantial improvements in the accuracy achieved with each classifier for the diagnosis of diabetes, cardiovascular disease and hypertension, compared with the accuracy achieved without substituting missing values. The improvement is 7% for diabetes, 21% for cardiovascular disease and 24% for hypertension, and our integrated approach achieves more than 90% accuracy for the diagnosis of any of the three conditions. This work advances data mining research towards an integrated and holistic management of diabetes.
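The super-problem construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the binary "no"/"yes" labels, the condition keys and the helper names `build_super_classes`, `encode` and `decode` are all assumptions made for illustration. It only shows how the class values of three sub-problems combine, through a Cartesian product, into the classes of one composite problem.

```python
from itertools import product

# Hypothetical class values for each sub-problem (assumed binary here).
sub_problems = {
    "diabetes": ["no", "yes"],
    "cardiovascular_disease": ["no", "yes"],
    "hypertension": ["no", "yes"],
}


def build_super_classes(class_values_per_problem):
    """List every composite class: the Cartesian product of the sub-problem class values."""
    return list(product(*class_values_per_problem))


super_classes = build_super_classes(sub_problems.values())
class_index = {combo: i for i, combo in enumerate(super_classes)}


def encode(record):
    """Map one record's per-condition labels to a single super-problem class id."""
    key = tuple(record[name] for name in sub_problems)
    return class_index[key]


def decode(class_id):
    """Recover the per-condition labels from a super-problem class id."""
    return dict(zip(sub_problems, super_classes[class_id]))


patient = {"diabetes": "yes", "cardiovascular_disease": "no", "hypertension": "yes"}
class_id = encode(patient)
print(len(super_classes), class_id, decode(class_id))
```

With three binary sub-problems the composite problem has eight classes, so a single classifier trained on the encoded label predicts all three conditions at once, and `decode` maps its prediction back to the individual diagnoses.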
Original language: English
Pages (from-to): 4591-4606
Number of pages: 16
Journal: British Journal of Applied Science and Technology
Issue number: 33
Publication status: Published - 2014

