Abstract
The rapid spread of misinformation, particularly myths about child development, poses significant risks to parental decision-making and child well-being. Parents often seek information online, where facts are mixed with misleading myths. This thesis addresses the problem by developing data science methods that differentiate facts from myths and by creating a systematic framework to classify and analyse online child development information, demonstrating the potential of technology in combating misinformation.
The research has three primary objectives: (1) to review the motivating literature and identify the research gap related to misinformation, (2) to establish a dataset that categorises child development information into myths and facts, and (3) to design and evaluate robust machine learning (ML) and Bayesian models that accurately classify the facts and myths in that dataset. The study contributes to the domain by addressing gaps in misinformation research on child development and by introducing novel categorisation and classification methodologies.
This research analyses child development facts and myths rigorously for the first time, employing a range of data science techniques. The data were gathered from multiple websites and grouped into categories. They were then preprocessed with text mining techniques, including cleaning, tokenisation, stemming, and lemmatisation, to prepare them for subsequent analysis. Sentiment and cluster analyses uncovered critical distinctions between myths and facts, identifying patterns and providing deeper insight into the emotional and thematic structure of the data.
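The preprocessing steps named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's pipeline: the stop-word list, the suffix list, and the function names are hypothetical, and the suffix stripper is a crude stand-in for a real stemmer or lemmatiser.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption, not from the thesis).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for"}
# Hypothetical suffixes for a crude stemmer; checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text):
    """Lowercase the text and replace every non-letter character with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenise(text):
    """Split on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOPWORDS]


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, tokenise, and stem one statement."""
    return [crude_stem(tok) for tok in tokenise(clean(text))]


print(preprocess("Babies are walking early!"))
```

A production pipeline would more likely rely on an established stemmer or lemmatiser; the sketch only shows the order of the steps.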
The research evaluated six ML models and one deep learning model using feature extraction methods such as Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). Logistic Regression with BoW achieved the highest accuracy (90%), whereas k-Nearest Neighbours (KNN) achieved the lowest (70%).
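The two feature extraction methods mentioned above can be illustrated with a short sketch. It assumes the textbook definitions (term frequency as count divided by document length, inverse document frequency as the natural log of N over document frequency); libraries often apply smoothed or normalised variants, and the thesis does not state which variant it used. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return a sorted vocabulary and one raw count vector per tokenised document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]
    return vocab, [[c[tok] for tok in vocab] for c in counts]


def tf_idf(docs):
    """Return the vocabulary and unsmoothed TF-IDF weights for each document."""
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term (always >= 1).
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for row in bow:
        total = sum(row)
        weights.append(
            [(count / total) * idf[j] if total else 0.0 for j, count in enumerate(row)]
        )
    return vocab, weights


# Invented toy documents, already tokenised (not taken from the thesis dataset).
toy_docs = [
    ["sugar", "causes", "hyperactivity"],
    ["sleep", "supports", "growth"],
    ["sugar", "supports", "energy"],
]
vocab, bow = bag_of_words(toy_docs)
_, weights = tf_idf(toy_docs)
print(vocab)
print(bow[0])
print(weights[0])
```

BoW keeps raw counts, while TF-IDF down-weights terms that occur in many documents, which is why the two representations can favour different classifiers.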
A comprehensive analysis of 42 Bayesian models, built with various vectorisation and embedding techniques, was then conducted to determine the most effective approach for text classification. This investigation identified the Multinomial Distribution-Based Bayesian Model (MDBM) as a robust classifier that delivered a substantial improvement, with an accuracy of 98.21%.
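To make the idea of a multinomial Bayesian text classifier concrete, here is a generic multinomial Naive Bayes with Laplace smoothing. It is a sketch of the general technique only and is not the MDBM proposed in the thesis, whose details the abstract does not give. The class name, the `alpha` parameter, and the toy statements and labels are all hypothetical.

```python
import math
from collections import Counter


class ToyMultinomialNB:
    """Generic multinomial Naive Bayes over tokenised documents (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        class_counts = Counter(labels)
        self.log_prior = {
            c: math.log(class_counts[c] / len(labels)) for c in self.classes
        }
        self.totals = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        """Log prior plus the smoothed log likelihood of each known token."""
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in doc:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Invented toy training data (placeholders, not from the thesis dataset).
train_docs = [
    ["sugar", "causes", "hyperactivity"],
    ["teething", "causes", "fever"],
    ["reading", "supports", "language"],
    ["sleep", "supports", "growth"],
]
train_labels = ["myth", "myth", "fact", "fact"]

model = ToyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["sugar", "causes", "fever"]))
print(model.predict(["sleep", "supports", "language"]))
```

Working in log space avoids numerical underflow when many small token probabilities are multiplied together.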
A further 74 Bayesian models, together with a hybrid model combining Naive Bayes (NB) and XGBoost, were analysed using various vectorisation and embedding techniques to identify the optimal approach for text classification. The proposed Multinomial Distribution-Based Bayesian Model Extended (MDBM-X) emerged as the most effective classifier, attaining 98.57% accuracy when fine-tuned with BoW and hyperparameter optimisation and outperforming previous studies.
The model's performance was rigorously evaluated through cross-validation. The mean accuracy was 0.9569, with a 95% confidence interval of (0.9422, 0.9716). Bootstrapping yielded a comparable estimate of 0.9569 (0.9461 to 0.9683), reinforcing the robustness of the results. Furthermore, the model was computationally efficient, requiring only 1.92 μs per statement during testing.
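The two interval estimates reported above can be reproduced in outline as follows. The abstract does not say how either interval was computed, so this sketch assumes a Student t interval over per-fold accuracies and a percentile bootstrap over the same scores. The fold scores are invented, ten folds is an assumption, and 2.262 is the two-sided 95% t critical value for 9 degrees of freedom.

```python
import random
import statistics


def t_interval(scores, t_crit):
    """Mean and t-based confidence interval; t_crit must match len(scores) - 1."""
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / len(scores) ** 0.5  # standard error of the mean
    return mean, (mean - t_crit * sem, mean + t_crit * sem)


def bootstrap_interval(scores, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of the scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.mean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lo_idx = round((1 - level) / 2 * n_resamples)
    hi_idx = round((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Invented per-fold accuracies for a hypothetical 10-fold run (not thesis results).
fold_scores = [0.94, 0.96, 0.95, 0.97, 0.96, 0.95, 0.96, 0.94, 0.97, 0.96]

mean, (t_lo, t_hi) = t_interval(fold_scores, t_crit=2.262)
b_lo, b_hi = bootstrap_interval(fold_scores)
print(f"mean={mean:.4f}  t-interval=({t_lo:.4f}, {t_hi:.4f})")
print(f"bootstrap interval=({b_lo:.4f}, {b_hi:.4f})")
```

When the two intervals roughly agree, as they do in the results reported above, the accuracy estimate is less likely to be an artefact of one particular data split.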
The study concludes that combining text mining with ML and Bayesian models effectively combats misinformation, enhances public understanding of child development, supports evidence-based parenting, and advances text classification research. The research contributes to the academic understanding of misinformation through contemporary methods and provides practical tools for addressing real-world challenges, making it a valuable addition to both child development and data science.
| Original language | English |
|---|---|
| Qualification | Doctor of Philosophy |
| Awarding Institution | |
| Supervisors/Advisors | |
| Place of Publication | Australia |
| Publisher | |
| Publication status | Published - 2025 |