TY - JOUR
T1 - Analysis of Data Cleansing Methods for Improving Meteorological Data Quality: A Case Study
AU - Rahman, Md Geaur
AU - Khan, Md Akram Hossain
PY - 2025/1
Y1 - 2025/1
AB - Quality in meteorological data is one of the main issues for many real applications including weather forecasting and for developing irrigation models. The integrity of meteorological data may be compromised for several reasons including the presence of corrupted and missing data which can be added due to interference and equipment malfunctioning. A decrease in data quality can significantly affect the efficiency of weather forecasting systems and irrigation models. Therefore, it is imperative to address the corrupt and missing data prior to their utilisation. In this study, we introduce a Data Cleansing Scheme (DCS) for handling the corrupt and missing values in a real meteorological dataset. DCS utilises a cutting-edge corrupt data identification method and a cutting-edge missing data imputation method to cleanse the meteorological data. The finalised dataset, free from any corrupt or missing values, is subsequently employed for data mining endeavours such as classification and knowledge discovery. Despite the negative impact of corrupt and missing values on the quality of data analysis results, this study demonstrates an enhancement when corrupt data is identified, and missing values are imputed using DCS. We also evaluate DCS on two publicly available datasets. Our extensive empirical and statistical analyses indicate the effectiveness of DCS for improving meteorological data quality.
UR - http://www.scopus.com/inward/record.url?scp=85211324082&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85211324082&partnerID=8YFLogxK
U2 - 10.1007/s12145-024-01608-9
DO - 10.1007/s12145-024-01608-9
M3 - Article
SN - 1865-0481
VL - 18
JO - Earth Science Informatics
JF - Earth Science Informatics
IS - 1
M1 - 8
ER -