TY - CHAP
T1 - Review on analysis of the application areas and algorithms used in data wrangling in big data
AU - Bashya, Chiranjivi
AU - Halgamuge, Malka N.
AU - Mohammad, Azeem
N1 - Includes bibliographical references at the end of each chapter and index.
PY - 2018
Y1 - 2018
N2 - This study performed a content analysis of data retrieved from 30 peer-reviewed scientific publications (1996–2016) that describe the algorithm models applied to data wrangling in Big Data. The analysis explores and evaluates the algorithm models used by data applications in the area of data wrangling. Data wrangling unifies messy and complex data through a planned procedure that involves clustering and grouping untidy and intricate data sets so that they are easy to access and can reveal trending themes useful for business planning. Data wrangling is not only for business use; it also serves individuals, business users who consume data directly in reports, and systems that process data further by streaming it into targets such as data warehouses or data lakes. This method provides easy access to, and analysis of, all untidy data. Data streaming procedures are especially useful for planning in small and large businesses around the world that use data continuously to identify emerging trends, structures, and schemes, which in turn make a difference in sustaining and customising a business, simply by streaming data into warehouses, or in other words, data storage pools. This study found that major data applications use common statistical measures and algorithms, although the information technology area faces security challenges. Data wrangling algorithms used in different data applications, such as medical data, textual data, financial data, topological data, governmental data, educational science, and galaxy data, could use clustering methods, as these are more effective than others. The study also found significant similarities and contrasts between algorithms across data applications and evaluated them to identify methods that are superior to others.
Moreover, it shows that medical data is used extensively in big data research. Our results show that data wrangling with clustering algorithms can solve medical data storage issues. Similarly, clustering algorithms are frequently used to group data sets in order to extract information from raw data. Fifty percent of the reviewed literature found that clustering algorithms are beneficial for data wrangling in different data applications, allowing their importance to be thoroughly analyzed and evaluated. Following the analysis of clustering algorithms, suggestions are made for applications that use medical data for data wrangling purposes.
KW - Algorithms
KW - Big data
KW - Clustering
KW - Data application
KW - Data wrangling
KW - Decision tree
KW - Financial data
KW - Medical data
UR - http://www.scopus.com/inward/record.url?scp=85090370835&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090370835&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-70688-7_14
DO - 10.1007/978-3-319-70688-7_14
M3 - Chapter (peer-reviewed)
AN - SCOPUS:85090370835
SN - 9783319706870
T3 - Lecture Notes on Data Engineering and Communications Technologies
SP - 337
EP - 353
BT - Cognitive computing for big data systems over IoT
A2 - Sangaiah, Arun Kumar
A2 - Thangavelu, Arunkumar
A2 - Sundaram, Venkatesan Meenakshi
PB - Springer
CY - Cham
ER -