Diagnostic with incomplete nominal/discrete data

Herbert Jelinek, Andrew Yatsko, Andrew Stranieri, Sitalakshmi Venkatraman, Adil Bagirov

Research output: Contribution to journal › Article


Abstract

Missing values may be present in data without undermining its use for diagnostic / classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they are intended to cause the least hindrance for classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking the Cartesian product of class values of the underlying sub-problems allows narrowing down of the selection of missing value substitutes. Real-world data examples, some publicly available, are enlisted for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.
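The abstract does not spell out the algorithms, so the following is only a rough illustration, not the authors' published method. It is a minimal Python sketch in which every function name is hypothetical, and equal-width binning, an overlap (mismatch-fraction) distance and class-conditional donor selection are all assumptions. It shows three ideas from the abstract: discretization makes every attribute nominal, a nearest-neighbour classifier is reused to fill gaps, and a Cartesian product of sub-problem labels narrows the pool of candidate substitutes.

```python
from collections import Counter
from typing import Optional, Sequence

# A nominal record; None marks a missing value (hypothetical representation).
Row = list[Optional[str]]


def discretize(values: Sequence[Optional[float]], bins: int = 3) -> list[Optional[str]]:
    """Equal-width binning (an assumed scheme); missing entries stay None."""
    observed = [v for v in values if v is not None]
    if not observed:
        return [None] * len(values)
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for a constant column
    out: list[Optional[str]] = []
    for v in values:
        if v is None:
            out.append(None)
        else:
            # Clamp so that the maximum value falls into the last bin.
            out.append(f"bin{min(int((v - lo) / width), bins - 1)}")
    return out


def combine_labels(*label_columns: Sequence[str]) -> list[str]:
    """Cartesian-product class: one joint label per record, e.g. ('yes', 'no') -> 'yes|no'."""
    return ["|".join(parts) for parts in zip(*label_columns)]


def overlap_distance(a: Row, b: Row) -> Optional[float]:
    """Fraction of mismatching values over attributes observed in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None  # no common evidence, so no distance
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(rows: list[Row], labels: Sequence[str], k: int = 3) -> list[Row]:
    """Fill each gap with the majority value among the k nearest donors of the same class."""
    filled = [list(r) for r in rows]  # copy so the input is not modified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for m, other in enumerate(rows):
                # A donor must share the (joint) class label and have attribute j observed.
                if m == i or labels[m] != labels[i] or other[j] is None:
                    continue
                d = overlap_distance(row, other)
                if d is not None:
                    donors.append((d, m, other[j]))
            donors.sort()  # nearest first; ties broken by record index
            votes = Counter(v for _, _, v in donors[:k])
            if votes:  # leave the gap if no donor exists
                filled[i][j] = votes.most_common(1)[0][0]
    return filled


if __name__ == "__main__":
    age = [34.0, 61.0, None, 58.0, 29.0]           # continuous, one value missing
    bp = [None, "high", "high", "high", "normal"]  # nominal, one value missing
    rows = [[a, b] for a, b in zip(discretize(age, bins=2), bp)]
    labels = combine_labels(["no", "yes", "yes", "yes", "no"],  # sub-problem 1
                            ["no", "yes", "yes", "no", "no"])   # sub-problem 2
    print(knn_impute(rows, labels, k=1))
    # [['bin0', 'normal'], ['bin1', 'high'], ['bin1', 'high'], ['bin1', 'high'], ['bin0', 'normal']]
```

Donors are restricted to records with the same joint label. An imputed value is therefore drawn only from records with the same combination of conditions, which is one possible reading of the abstract's aim that substitutes hinder classification least. In the toy data, record 3 (`yes|no`) is excluded as a donor for record 2 (`yes|yes`). Decision Tree or Naive Bayes imputers could be substituted by predicting the missing attribute from the remaining ones in the same way.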
Original language: English
Pages (from-to): 22-35
Number of pages: 14
Journal: Artificial Intelligence Research
Volume: 4
Issue number: 1
DOI: https://doi.org/10.5430/air.v4n1p22
Publication status: Published - Jan 2015

Cite this

Jelinek, H., Yatsko, A., Stranieri, A., Venkatraman, S., & Bagirov, A. (2015). Diagnostic with incomplete nominal/discrete data. Artificial Intelligence Research, 4(1), 22-35. https://doi.org/10.5430/air.v4n1p22
Jelinek, Herbert ; Yatsko, Andrew ; Stranieri, Andrew ; Venkatraman, Sitalakshmi ; Bagirov, Adil. / Diagnostic with incomplete nominal/discrete data. In: Artificial Intelligence Research. 2015 ; Vol. 4, No. 1. pp. 22-35.
@article{be62f6f5acc14aa5a11824f8e180c7c5,
title = "Diagnostic with incomplete nominal/discrete data",
abstract = "Missing values may be present in data without undermining its use for diagnostic / classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they are intended to cause the least hindrance for classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking the Cartesian product of class values of the underlying sub-problems allows narrowing down of the selection of missing value substitutes. Real-world data examples, some publicly available, are enlisted for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.",
keywords = "Open access version available, Categorical data, Classification, Continuous features, Discretization, Missing values",
author = "Herbert Jelinek and Andrew Yatsko and Andrew Stranieri and Sitalakshmi Venkatraman and Adil Bagirov",
note = "Imported on 12 Apr 2017 - DigiTool details were: month (773h) = January; Journal title (773t) = Aritificial Intelligence Research. ISSNs: 1927-6974;",
year = "2015",
month = "1",
doi = "10.5430/air.v4n1p22",
language = "English",
volume = "4",
pages = "22--35",
journal = "Artificial Intelligence Research",
issn = "1927-6974",
publisher = "Sciedu Press",
number = "1",

}

Jelinek, H, Yatsko, A, Stranieri, A, Venkatraman, S & Bagirov, A 2015, 'Diagnostic with incomplete nominal/discrete data', Artificial Intelligence Research, vol. 4, no. 1, pp. 22-35. https://doi.org/10.5430/air.v4n1p22

Diagnostic with incomplete nominal/discrete data. / Jelinek, Herbert; Yatsko, Andrew; Stranieri, Andrew; Venkatraman, Sitalakshmi; Bagirov, Adil.

In: Artificial Intelligence Research, Vol. 4, No. 1, 01.2015, p. 22-35.

TY - JOUR

T1 - Diagnostic with incomplete nominal/discrete data

AU - Jelinek, Herbert

AU - Yatsko, Andrew

AU - Stranieri, Andrew

AU - Venkatraman, Sitalakshmi

AU - Bagirov, Adil

N1 - Imported on 12 Apr 2017 - DigiTool details were: month (773h) = January; Journal title (773t) = Aritificial Intelligence Research. ISSNs: 1927-6974;

PY - 2015/1

Y1 - 2015/1

N2 - Missing values may be present in data without undermining its use for diagnostic / classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they are intended to cause the least hindrance for classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking the Cartesian product of class values of the underlying sub-problems allows narrowing down of the selection of missing value substitutes. Real-world data examples, some publicly available, are enlisted for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.

AB - Missing values may be present in data without undermining its use for diagnostic / classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they are intended to cause the least hindrance for classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking the Cartesian product of class values of the underlying sub-problems allows narrowing down of the selection of missing value substitutes. Real-world data examples, some publicly available, are enlisted for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.

KW - Open access version available

KW - Categorical data

KW - Classification

KW - Continuous features

KW - Discretization

KW - Missing values

U2 - 10.5430/air.v4n1p22

DO - 10.5430/air.v4n1p22

M3 - Article

VL - 4

SP - 22

EP - 35

JO - Artificial Intelligence Research

JF - Artificial Intelligence Research

SN - 1927-6974

IS - 1

ER -