Review on analysis of the application areas and algorithms used in data wrangling in big data

Chiranjivi Bashya, Malka N. Halgamuge, Azeem Mohammad

Research output: Book chapter/Published conference paper, Chapter (peer-reviewed)


This study performed a content analysis of data retrieved from 30 peer-reviewed scientific publications (1996–2016) that describe the algorithm models applied to data wrangling in Big Data. The analysis explores and evaluates these algorithm models across the data applications in which data wrangling methods are used. Data wrangling unifies messy and complex data through a planned procedure that involves clustering and grouping untidy and intricate data sets, so that they are easy to access and can reveal trending themes useful for business or company planning. Data wrangling is not only for business use: it also serves individuals, business users who consume data directly in reports, and schemes that further process data by streaming it into targets such as data warehouses or data lakes. The method sets up easy access to, and analysis of, all untidy data. Data streaming procedures are exceptionally useful for planning in small and large businesses around the world that use data continuously to identify emerging trends, structures, and schemes. Streaming data into warehouses, or in other words data storage pools, in turn makes a difference when sustaining and customising a business. This study found that the major data applications rely on commonly used statistical figures and algorithms, although the information technology area faces security challenges. Data wrangling algorithms used in different data applications, such as medical data, textual data, financial data, topological data, governmental data, educational science, and galaxy data, could use clustering methods, as these are more effective than others. The study also found significant comparisons and contrasts between algorithms and data applications, and evaluated them to identify methods that are superior to others.
Moreover, the analysis shows that medical data is used significantly in the big data research area. Our results show that data wrangling with clustering algorithms can solve medical data storage issues. Similarly, clustering algorithms are frequently used to group data sets in order to extract information from raw data. Fifty percent of the reviewed literature found clustering algorithms beneficial for data wrangling across the different data applications. Following the analysis of clustering algorithms, suggestions are made for applications that use medical data for data wrangling purposes.
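To illustrate the kind of clustering the abstract refers to, here is a minimal k-means sketch in Python. It is not taken from the chapter. The function name `kmeans`, the sample records, and their interpretation as (age, systolic blood pressure) pairs are all hypothetical, chosen only to show how untidy records might be grouped.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical records (assumption): (age, systolic blood pressure).
records = [(25, 118), (31, 121), (28, 115), (67, 150), (72, 158), (70, 149)]

# Seed with one record from each apparent group so the run is deterministic.
labels, centroids = kmeans(records, centroids=[records[0], records[3]])
print(labels)
print(centroids)
```

In practice a library implementation with a principled initialisation would likely be preferable. The fixed seeds here only keep the example reproducible.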
Original language: English
Title of host publication: Cognitive computing for big data systems over IoT
Subtitle of host publication: Frameworks, tools and applications
Editors: Arun Kumar Sangaiah, Arunkumar Thangavelu, Venkatesan Meenakshi Sundaram
Place of Publication: Cham
Number of pages: 17
ISBN (Electronic): 9783319706887
ISBN (Print): 9783319706870
Publication status: Published - 2018

Publication series

Name: Lecture Notes on Data Engineering and Communications Technologies
ISSN (Print): 2367-4512
ISSN (Electronic): 2367-4520

