TY - JOUR
T1 - PredNTS
T2 - Improved and robust prediction of nitrotyrosine sites by integrating multiple sequence features
AU - Nilamyani, Andi Nur
AU - Auliah, Firda Nurul
AU - Moni, Mohammad Ali
AU - Shoombuatong, Watshara
AU - Hasan, Md Mehedi
AU - Kurata, Hiroyuki
N1 - Publisher Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/3/1
Y1 - 2021/3/1
N2 - Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identification of site-specific nitration modification on tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to the progress of machine learning, computational prediction can play a vital role before the biological experimentation. Herein, we developed a computational predictor PredNTS by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by the recursive feature elimination approach using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different, single encoding-employing RF models. The resultant PredNTS predictor achieved an area under a curve (AUC) of 0.910 using five-fold cross validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. The PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web-application with the curated datasets of the PredNTS is publicly available.
AB - Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identification of site-specific nitration modification on tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to the progress of machine learning, computational prediction can play a vital role before the biological experimentation. Herein, we developed a computational predictor PredNTS by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by the recursive feature elimination approach using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different, single encoding-employing RF models. The resultant PredNTS predictor achieved an area under a curve (AUC) of 0.910 using five-fold cross validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. The PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web-application with the curated datasets of the PredNTS is publicly available.
KW - Feature encoding
KW - Machine learning
KW - Nitrotyrosine
KW - Post-translational modification
KW - RFE feature selection
UR - http://www.scopus.com/inward/record.url?scp=85102048754&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102048754&partnerID=8YFLogxK
U2 - 10.3390/ijms22052704
DO - 10.3390/ijms22052704
M3 - Article
C2 - 33800121
AN - SCOPUS:85102048754
SN - 1422-0067
VL - 22
SP - 1
EP - 11
JO - International Journal of Molecular Sciences
JF - International Journal of Molecular Sciences
IS - 5
M1 - 2704
ER -