TY - JOUR
T1 - Enhanced Clustering of Biomedical Documents Using Ensemble Non-negative Matrix Factorization
AU - Huang, Xiaodi
AU - Zheng, Xiaodong
AU - Yuan, Wei
AU - Wang, Fei
AU - Zhu, Shanfeng
PB - Elsevier
PY - 2011/6
Y1 - 2011/6
N2 - Searching and mining biomedical literature databases are common ways for biomedical researchers to generate scientific hypotheses. Clustering can help researchers form hypotheses by letting them find valuable information in grouped documents effectively. A large number of clustering algorithms are available, and this paper attempts to answer the question of which algorithm is best suited to clustering biomedical documents accurately. Non-negative matrix factorization (NMF) has been widely applied to clustering general text documents. However, its clustering results are sensitive to the initial values of the NMF parameters. To overcome this drawback, we present ensemble NMF for clustering biomedical documents. The performance of ensemble NMF was evaluated on numerous datasets generated from the TREC Genomics track dataset. On most datasets, the experimental results demonstrate that ensemble NMF significantly outperforms the classical clustering algorithms bisecting K-means and hierarchical clustering. We compared four different methods for constructing an ensemble NMF. For clustering biomedical documents, this research is the first to compare ensemble NMF with typical classical clustering algorithms and to validate ensemble NMF constructed from different graph-based ensemble algorithms. This is also the first work on ensemble NMF with Hybrid Bipartite Graph Formulation for clustering biomedical documents.
AB - Searching and mining biomedical literature databases are common ways for biomedical researchers to generate scientific hypotheses. Clustering can help researchers form hypotheses by letting them find valuable information in grouped documents effectively. A large number of clustering algorithms are available, and this paper attempts to answer the question of which algorithm is best suited to clustering biomedical documents accurately. Non-negative matrix factorization (NMF) has been widely applied to clustering general text documents. However, its clustering results are sensitive to the initial values of the NMF parameters. To overcome this drawback, we present ensemble NMF for clustering biomedical documents. The performance of ensemble NMF was evaluated on numerous datasets generated from the TREC Genomics track dataset. On most datasets, the experimental results demonstrate that ensemble NMF significantly outperforms the classical clustering algorithms bisecting K-means and hierarchical clustering. We compared four different methods for constructing an ensemble NMF. For clustering biomedical documents, this research is the first to compare ensemble NMF with typical classical clustering algorithms and to validate ensemble NMF constructed from different graph-based ensemble algorithms. This is also the first work on ensemble NMF with Hybrid Bipartite Graph Formulation for clustering biomedical documents.
KW - Biomedical document clustering
KW - Ensemble clustering
KW - Non-negative matrix factorization
U2 - 10.1016/j.ins.2011.01.029
DO - 10.1016/j.ins.2011.01.029
M3 - Article
SN - 0020-0255
VL - 181
SP - 2293
EP - 2302
JO - Information Sciences
JF - Information Sciences
IS - 11
ER -