Diagnostic with incomplete nominal/discrete data

Herbert F. Jelinek, Andrew Yatsko, Andrew Stranieri, Sitalakshmi Venkatraman, Adil Bagirov

Research output: Contribution to journal › Article › peer-review

Abstract

Missing values may be present in data without undermining its use for diagnostic or classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they are intended to cause the least hindrance to classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking the Cartesian product of class values of the underlying sub-problems allows narrowing down the selection of missing value substitutes. Real-world data examples, some publicly available, are used for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.
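
As a rough illustration of the ideas named in the abstract, and not the authors' algorithm, the sketch below discretizes a continuous attribute into nominal bins and then fills gaps by a nearest-neighbour vote. Donors are restricted to records that share the same combined class label, where each label is a tuple drawn from the Cartesian product of the sub-problem classes. The function names (`discretize`, `mismatch`, `impute`), the mismatch-count distance, and the parameter `k` are all assumptions made for this example.

```python
from collections import Counter

MISSING = None  # assumed marker for an absent nominal value


def discretize(value, cut_points):
    """Map a continuous value to a bin index; missing stays missing."""
    if value is MISSING:
        return MISSING
    return sum(value > c for c in cut_points)


def mismatch(a, b):
    """Count attributes where both records are observed and disagree."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not MISSING and y is not MISSING and x != y
    )


def impute(records, labels, k=3):
    """Fill each gap by majority vote among the k closest same-class donors.

    `labels` holds one tuple per record, e.g. (condition_a, condition_b),
    i.e. an element of the Cartesian product of the sub-problem classes.
    A gap with no eligible donor is left missing.
    """
    completed = []
    for i, rec in enumerate(records):
        filled = list(rec)
        for j, value in enumerate(rec):
            if value is not MISSING:
                continue
            # Candidate donors: same combined class, attribute j observed.
            donors = sorted(
                (mismatch(rec, other), idx)
                for idx, other in enumerate(records)
                if idx != i
                and labels[idx] == labels[i]
                and other[j] is not MISSING
            )
            votes = Counter(records[idx][j] for _, idx in donors[:k])
            if votes:
                filled[j] = votes.most_common(1)[0][0]
        completed.append(tuple(filled))
    return completed


records = [
    ("low", "yes", MISSING),
    ("low", "yes", "a"),
    ("low", "no", "a"),
    ("high", "yes", "b"),
    ("low", "yes", "b"),
]
labels = [(1, 0), (1, 0), (1, 0), (1, 0), (0, 1)]
completed = impute(records, labels, k=3)
```

In this toy data the last record matches the incomplete one on every observed attribute, but it belongs to a different combined class and is therefore excluded as a donor. That is the narrowing effect the abstract attributes to the Cartesian product of class values.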
Original language: English
Pages (from-to): 22-35
Number of pages: 14
Journal: Artificial Intelligence Research
Volume: 4
Issue number: 1
Publication status: Published - Jan 2015
