Gene classification and pattern extraction from gene sequence data are essential to understanding the features of different gene sequences. Over the past few years, gene expression data analysis has grown from a purely data-centric field into an integrative one that complements microarray analysis with data and knowledge from diverse available sources. It has been applied in various scientific fields, including the discovery of new drugs, the identification of protein-coding genes by analyzing and separating exons from the main sequence, and phenotype prediction based on gene expression. This paper presents an application of gene classification from gene sequence data using data mining and machine learning techniques. The main goal of our research is to compare different machine learning approaches in terms of execution time and overall efficiency by testing them on different microarray gene sequence datasets, and to determine the best approach for gene classification. Eight machine learning techniques were tested on eleven gene expression datasets. We also apply a feature selection method before applying the classification techniques to the gene expression datasets. The experimental results show that feature selection improves the performance of the techniques on the gene expression datasets. Moreover, we perform pattern analysis on some of the gene expression datasets using the output of the J48 decision tree.
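The workflow summarized in the abstract (feature selection first, then classification, compared on accuracy and execution time) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the feature selection method or most of the eight classifiers, so the sketch substitutes a simple between-class/within-class variance score and a nearest-centroid classifier, and the tiny expression matrix and the labels `tumor` and `normal` are invented toy data, not values from the paper.

```python
import time
from statistics import mean, pvariance


def select_top_features(X, y, k):
    """Return the indices of the k features with the highest
    between-class / within-class variance ratio (a hypothetical
    stand-in for the paper's unnamed feature selection method)."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall = mean(column)
        between = 0.0
        within = 0.0
        for label in labels:
            values = [row[j] for row, lab in zip(X, y) if lab == label]
            between += len(values) * (mean(values) - overall) ** 2
            within += len(values) * pvariance(values)
        scores.append(between / (within + 1e-12))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, indices):
    """Keep only the selected feature columns."""
    return [[row[j] for j in indices] for row in X]


def fit_centroids(X, y):
    """Compute one mean expression profile per class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))


def evaluate(X_train, y_train, X_test, y_test):
    """Return (accuracy, elapsed seconds) for training plus prediction."""
    start = time.perf_counter()
    model = fit_centroids(X_train, y_train)
    predictions = [predict(model, row) for row in X_test]
    elapsed = time.perf_counter() - start
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return accuracy, elapsed


# Toy expression matrix (rows = samples, columns = genes); invented values.
X_train = [
    [5.1, 0.2, 9.0, 3.3],
    [4.9, 0.9, 8.7, 1.1],
    [5.3, 0.5, 9.2, 2.0],
    [1.0, 0.4, 2.1, 3.0],
    [1.2, 0.8, 1.9, 1.4],
    [0.8, 0.1, 2.3, 2.2],
]
y_train = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
X_test = [[5.0, 0.7, 8.9, 1.0], [1.1, 0.3, 2.0, 3.1]]
y_test = ["tumor", "normal"]

selected = select_top_features(X_train, y_train, k=2)
acc_all, time_all = evaluate(X_train, y_train, X_test, y_test)
acc_sel, time_sel = evaluate(
    project(X_train, selected), y_train, project(X_test, selected), y_test
)
print("selected genes:", selected)
print("all features:      accuracy", acc_all, "time", time_all)
print("selected features: accuracy", acc_sel, "time", time_sel)
```

In a comparison like the one the abstract describes, the same `evaluate` call would be repeated for each classifier and dataset, with and without the selection step, so that accuracy and timing can be tabulated side by side.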
|Title of host publication||2021 IEEE International Conference on Machine Learning and Cybernetics (ICMLC)|
|Publication status||Accepted/In press - 2022|