Design and Development of Real-Time Big Data Analytics Frameworks

Title: Design and Development of Real-Time Big Data Analytics Frameworks
Author(s): Solaimani, M. (ORCID: 0000-0001-7049-581X)
Advisor: Khan, Latifur
Date Created: 2017-12
Format: Dissertation
Abstract: Today, sophisticated technologies such as the Internet of Things (IoT), autonomous driving, the cloud, and data center consolidation demand smarter IT infrastructure and real-time operations. They continuously generate large volumes of data, called “Big Data,” to report their operational activities. In response, we need advanced analytics frameworks to capture, filter, and analyze this data and make quick decisions in real time. The high volume, velocity, and variety of the data make this an overwhelming task for humans to perform in real time. Current state-of-the-art techniques such as advanced analytics, machine learning (ML), and natural language processing (NLP) can be used to handle heterogeneous Big Data. However, most of these algorithms suffer from scalability issues and cannot meet real-time constraints. In this dissertation, we focus on two areas: anomaly detection on structured VMware performance data (e.g., CPU and memory usage metrics) and text mining for politics in unstructured text data. We have developed real-time distributed frameworks with ML and NLP techniques. For anomaly detection, we have implemented an adaptive clustering technique to identify individual anomalies and a Chi-square-based statistical technique to detect group anomalies in real time. For text mining, we have developed a real-time framework, SPEC, that captures online news articles in different languages from the web and annotates them using CoreNLP, PETRARCH, and the CAMEO dictionary to generate structured political events in a ‘who-did-what-to-whom’ format. We then extend this framework to code atrocity events: machine-coded structured data containing perpetrators, actions, victims, etc. Finally, we have developed a novel, distributed, window-based political actor recommendation framework to discover and recommend new political actors along with their possible roles.
We have implemented these scalable distributed streaming frameworks using the Kafka message broker, Spark, and unsupervised and supervised machine learning techniques.
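The abstract names a Chi-square-based technique for group anomalies but does not spell out its details. As a rough illustration only, the following minimal sketch applies a Pearson chi-square goodness-of-fit test to one window of binned metric values. The function names, the five usage bins, the sample counts, and the critical value (9.488, the standard 0.05 threshold for 4 degrees of freedom) are assumptions made for this example and are not taken from the dissertation.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], baseline: Sequence[float]) -> float:
    """Pearson chi-square statistic of one window's bin counts against a baseline.

    `baseline` holds bin counts from a reference (presumed normal) period; it is
    rescaled to the size of the observed window to obtain expected counts.
    """
    if len(observed) != len(baseline):
        raise ValueError("observed and baseline must have the same number of bins")
    if any(b <= 0 for b in baseline):
        raise ValueError("every baseline bin needs a positive count")
    n_observed = sum(observed)
    n_baseline = sum(baseline)
    if n_observed == 0:
        raise ValueError("observed window is empty")

    statistic = 0.0
    for obs, base in zip(observed, baseline):
        expected = n_observed * base / n_baseline
        statistic += (obs - expected) ** 2 / expected
    return statistic


def is_group_anomaly(observed: Sequence[float], baseline: Sequence[float],
                     critical_value: float) -> bool:
    """Flag the window when the statistic exceeds the chosen critical value."""
    return chi_square_statistic(observed, baseline) > critical_value


# Hypothetical CPU-usage histogram with five bins (0-20%, 20-40%, ..., 80-100%).
baseline_counts = [50, 30, 15, 4, 1]
normal_window = [100, 60, 30, 8, 2]      # same proportions as the baseline
shifted_window = [10, 20, 30, 80, 60]    # mass moved into the high-usage bins

CRITICAL_VALUE = 9.488  # chi-square, 4 degrees of freedom, significance level 0.05

normal_flag = is_group_anomaly(normal_window, baseline_counts, CRITICAL_VALUE)
shifted_flag = is_group_anomaly(shifted_window, baseline_counts, CRITICAL_VALUE)
```

In a streaming setting, a check of this kind would presumably run once per window on counts aggregated from the incoming stream; how the dissertation defines windows, bins, and thresholds is not stated in this record.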
Degree Name: PHD
Degree Level: Doctoral
Persistent Link: http://hdl.handle.net/10735.1/5684
Terms of Use: Copyright ©2017 is held by the author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Type: text
Degree Program: Computer Science

Files in this item

ETD-5608-7474.73.pdf (1.934Mb, PDF)
