TY - JOUR
T1 - Assessing optimization techniques for improving water quality model
AU - Uddin, Md Galal
AU - Nash, Stephen
AU - Rahman, Azizur
AU - Olbert, Agnieszka I.
N1 - Funding Information:
Eighteen FS techniques were utilized in this research: (i) univariate filter methods (unsupervised): F-test, chi-square (χ2), mutual information (MI), Pearson correlation (PCOR), K_best, feature-transformation-based principal component analysis (PCA), and the multivariate-based minimum redundancy maximum relevance (MRMR); (ii) filter methods (supervised): Relief and Laplacian score (LAP); (iii) wrapper methods (supervised): recursive feature elimination with a support vector machine (RFE_SVM) and recursive feature elimination with random forest (RFE_RF); (iv) embedded methods (supervised): extreme gradient boosting (XGBoost), random forest (RF), linear regression (LR), Boruta, least absolute shrinkage and selection operator (LASSO), the intrinsic-based extra tree algorithm (ExT), and the step-wise generalized linear regression model (SWGLM); and (v) the maximum voting approach (MAXVOT). The details of each FS method and their functions can be found in the supplemental materials as a continuation of 2.3.1.
In this study, the NSE and MEF were computed to evaluate the model efficiency. Fig. 8 shows the results of NSE and MEF for various subsets of water quality indicators. Based on the results, larger NSE and smaller MEF values were calculated for the subset S13 (Fig. 8), although the other statistical metrics do not support this subset. Most metrics were found to be larger during the training period (RMSE = 1.37, MSE = 5.34, MAE = 1.88, MAPE = 2.76, AIC = 59.61, and BIC = 64), whereas model performance improved significantly during the testing period (RMSE = 0.43, MSE = 0.05, MAE = 0.18, AIC = −88, and BIC = −83.6) due to the underfitting problem (Table 4). Like subset S13, a larger NSE (0.84) and a smaller MEF (0.40) were also calculated for subset S4 (Fig. 8). Across the metrics, all the statistical evidence shows that subset S4 is better for predicting the WQI values than the other subsets (Table 4; Table 6).
The authors gratefully acknowledge the editor's and anonymous reviewers' contributions to the improvement of this paper. This research was funded by the Hardiman Research Scholarship of the University of Galway, which funded the first author as part of his PhD program. The authors would like to acknowledge support from MaREI, the SFI Research Centre for Energy, Climate, and Marine research. The authors would like to thank the Environmental Protection Agency of Ireland for providing water quality data. The authors also sincerely acknowledge Charles Sturt University for providing all necessary support to this PhD project through the international co-supervision. The first author would like to sincerely thank Professor Azizur Rahman for his outstanding supervision support and methodological contributions to the PhD project. Moreover, the authors also sincerely acknowledge the Eco HydroInformatics Research Group (EHIRG), School of Engineering, College of Science and Engineering, University of Galway, Ireland, for providing computational laboratory facilities to complete this research.
Publisher Copyright:
© 2022 The Authors
PY - 2023/1/20
Y1 - 2023/1/20
N2 - To maintain the “good” status of coastal water quality, it is essential to monitor and assess it frequently. The water quality index (WQI) model is one of the most widely used techniques for the assessment of water quality. It consists of five components, with the indicator selection technique being one of the more crucial components. Several recent studies have shown that the existing techniques produce a significant amount of uncertainty in the final assessment due to inappropriate indicator selection. The present study carried out a comprehensive assessment of various feature selection (FS) techniques for selecting crucial coastal water quality indicators in order to develop an efficient WQI model. This study aims to analyse the effects of eighteen different FS techniques, including (i) nine filter methods, (ii) two wrapper methods, and (iii) seven embedded methods, in order to compare the performance of the WQI model. In total, fifteen combinations (subsets) of water quality indicators were constructed, and WQI values were calculated for each combination using the improvement methodology for coastal water quality. The WQI model's performance was tested using nine machine-learning algorithms and validated using various metrics. The results indicated that the tree-based random forest algorithm could be effective for selecting crucial water quality indicators for assessing coastal water. The deep neural network algorithm predicted coastal water quality more accurately when it incorporated the subset selected by the random forest.
AB - To maintain the “good” status of coastal water quality, it is essential to monitor and assess it frequently. The water quality index (WQI) model is one of the most widely used techniques for the assessment of water quality. It consists of five components, with the indicator selection technique being one of the more crucial components. Several recent studies have shown that the existing techniques produce a significant amount of uncertainty in the final assessment due to inappropriate indicator selection. The present study carried out a comprehensive assessment of various feature selection (FS) techniques for selecting crucial coastal water quality indicators in order to develop an efficient WQI model. This study aims to analyse the effects of eighteen different FS techniques, including (i) nine filter methods, (ii) two wrapper methods, and (iii) seven embedded methods, in order to compare the performance of the WQI model. In total, fifteen combinations (subsets) of water quality indicators were constructed, and WQI values were calculated for each combination using the improvement methodology for coastal water quality. The WQI model's performance was tested using nine machine-learning algorithms and validated using various metrics. The results indicated that the tree-based random forest algorithm could be effective for selecting crucial water quality indicators for assessing coastal water. The deep neural network algorithm predicted coastal water quality more accurately when it incorporated the subset selected by the random forest.
KW - Artificial intelligence
KW - Coastal water quality
KW - Feature selection algorithms
KW - Machine-learning techniques
KW - Water quality index model
UR - http://www.scopus.com/inward/record.url?scp=85144577155&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144577155&partnerID=8YFLogxK
U2 - 10.1016/j.jclepro.2022.135671
DO - 10.1016/j.jclepro.2022.135671
M3 - Article
AN - SCOPUS:85144577155
SN - 0959-6526
VL - 385
JO - Journal of Cleaner Production
JF - Journal of Cleaner Production
M1 - 135671
ER -