Document Details

Document Type : Thesis 
Document Title :
BIG DATA MANAGEMENT AND ANALYSIS FRAMEWORK
اطار لإدارة وتحليل قواعد البيانات العملاقة
 
Subject : Faculty of Computing and Information Technology 
Document Language : Arabic 
Abstract : As a result of digitization, individuals and businesses around the world generate a tremendous amount of data every day. Digitization, the conversion of analogue data to digital data, has enabled many large-scale projects, such as the Google Books Library Project, in which millions of books were scanned and stored as an electronic library. Billions of mobile phones, tablets and laptops, equipped with sensors such as cameras and running social media applications while connected to the Internet, generate a huge amount of data. Moreover, businesses and organizations generate huge amounts of transactional data and collect millions of megabytes of data about their customers, suppliers, and products. All these data sources, and many others, make up the big data phenomenon. Big data is characterized by huge volume, diverse data structures and rapid change; existing data management systems, such as parallel databases, fail to cope with these properties. These characteristics confront computer and information technology specialists in both academia and business with many challenges, forcing them to develop new technologies to overcome them. Companies and institutions benefit from the collected data by developing algorithms and systems for better data analysis and exploration. The larger the data and the more diverse its sources, the more important and useful the information obtained from its analysis. In this work we propose a novel framework for big data management and analysis. When first installed, the proposed framework sends metadata extractors to all data nodes. These extractors are designed to suit the structure of the data stored at each data node. The extracted metadata is then used to classify each data set instance using topic modeling algorithms.
Then all topics in the data set are organized as a tree to facilitate mapping related data across the different sources. When an analysis job is received, the mapping tree is used to locate the relevant data, and a copy of the analysis task is sent to the data nodes that contain this data. To evaluate the performance of the proposed framework, we carried out a number of experiments in which we executed several data analysis tasks using the proposed model. In each experiment, three criteria were used to measure performance: processing time, intermediate data and data preparation time. We also ran the same analysis tasks with MapReduce in the same environment. The experiments showed an improvement in the performance of the proposed system.
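The mapping step described in the abstract can be pictured with a small sketch. This is not the thesis's implementation: the `TopicNode` class, the `dispatch` function, the topic labels and the node names are all hypothetical, and the sketch assumes that topic modeling has already assigned each data node's data to a topic. It only shows the general idea of using a topic tree to find the data nodes relevant to a job and sending the task to those nodes alone.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """One topic in the mapping tree (hypothetical structure, not from the thesis)."""
    name: str
    data_nodes: set = field(default_factory=set)   # data nodes holding data on this topic
    children: list = field(default_factory=list)   # more specific sub-topics

    def add_child(self, child):
        self.children.append(child)
        return child

    def find(self, topic):
        """Depth-first search for the subtree rooted at `topic`."""
        if self.name == topic:
            return self
        for child in self.children:
            hit = child.find(topic)
            if hit is not None:
                return hit
        return None

    def collect_data_nodes(self):
        """All data nodes holding data for this topic or any of its sub-topics."""
        nodes = set(self.data_nodes)
        for child in self.children:
            nodes |= child.collect_data_nodes()
        return nodes


def dispatch(tree, topic, task):
    """Run a copy of `task` on every data node relevant to `topic`."""
    subtree = tree.find(topic)
    if subtree is None:
        return {}
    return {node: task(node) for node in sorted(subtree.collect_data_nodes())}


# Toy tree: in practice the topic labels would come from topic modeling
# over the extracted metadata (assumed here, not computed).
root = TopicNode("all")
sales = root.add_child(TopicNode("sales", {"node-1"}))
sales.add_child(TopicNode("online-sales", {"node-2", "node-3"}))
root.add_child(TopicNode("sensors", {"node-4"}))

# Only the three nodes under "sales" receive the task; "node-4" is skipped.
results = dispatch(root, "sales", lambda node: f"ran on {node}")
```

In this sketch a job on a broad topic reaches every data node under that topic's subtree, while nodes holding unrelated data receive no work.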
Supervisor : Dr. Fathi Issa 
Thesis Type : Doctorate Thesis 
Publishing Year : 1439 AH / 2018 AD
 
Added Date : Tuesday, June 5, 2018 

Researchers

Researcher Name (Arabic) : ابراهيم محمد الحداد
Researcher Name (English) : ALHADDAD, EBRAHEEM Mohammed
Researcher Type : Researcher
Dr Grade : Doctorate

Files

File Name : 43479.pdf
Type : pdf
