IncRDD: Incremental Updates for RDD in Apache Spark


Title: IncRDD: Incremental Updates for RDD in Apache Spark
Author(s): Dodabelle Prakash, Prathish (ORCID: 0000-0003-3516-8316)
Advisor: Wu, Weili
Date Created: 2017-05
Format: Thesis
Abstract: Data changes constantly, and existing datasets routinely receive incremental updates. As the data evolves, the results of big data applications gradually become stale, so they must be refreshed efficiently after every update. Apache Spark is used to process multiple petabytes of data on clusters with thousands of nodes. Its core abstraction is the RDD (Resilient Distributed Dataset), an immutable collection of elements. Because RDDs are immutable, Spark can process data in parallel, reuse data, and handle failures and stragglers efficiently. However, Spark lacks flexible and efficient support for incrementally processing small updates. This thesis proposes the IncRDD framework for incremental processing of updates to existing data. IncRDD retains all the powerful features of Spark, including parallel processing, data reusability, and fault tolerance. New RDD operations are implemented to add new records and to update and delete existing ones. We introduce a new variant of Cuckoo hashing, Dual-CH Fast-Simple. Dual Cuckoo hashing uses two cuckoo hash tables. The first table stores the records in every partition of a node. The second table implements structural sharing, which adds persistence, utilizes previous versions, and avoids expensive re-computation. We evaluate IncRDD using incremental algorithms and provide experimental results that show a significant improvement in the performance of Incremental RDD.
Degree Name: MSCS
Degree Level: Masters
Persistent Link: http://hdl.handle.net/10735.1/5417
Type: text
Degree Program: Computer Science
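
Illustrative note: the abstract describes keeping each partition's records in a cuckoo hash table that supports adding, updating, and deleting records. The sketch below shows only plain textbook two-choice cuckoo hashing, as a rough picture of that idea. It is not the thesis's Dual-CH Fast-Simple variant, whose details (including the second, structural-sharing table) are not given in this record. The class name CuckooTable, its methods, and the parameters capacity and max_kicks are all hypothetical, chosen only for this example.

```python
class CuckooTable:
    """Textbook cuckoo hash table (assumed sketch, not the thesis's variant)."""

    def __init__(self, capacity=8, max_kicks=32):
        self.capacity = capacity      # slots per side (hypothetical default)
        self.max_kicks = max_kicks    # eviction limit before growing
        self.slots = [[None] * capacity, [None] * capacity]
        self.size = 0

    def __len__(self):
        return self.size

    def _index(self, which, key):
        # Two hash functions, derived by salting Python's built-in hash().
        return hash((which, key)) % self.capacity

    def get(self, key, default=None):
        # A key can only live in one of its two candidate slots.
        for which in (0, 1):
            entry = self.slots[which][self._index(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key, value):
        # Update in place if the key is already stored.
        for which in (0, 1):
            i = self._index(which, key)
            entry = self.slots[which][i]
            if entry is not None and entry[0] == key:
                self.slots[which][i] = (key, value)
                return
        # Otherwise insert, evicting ("kicking") occupants to their other slot.
        entry = (key, value)
        which = 0
        for _ in range(self.max_kicks):
            i = self._index(which, entry[0])
            entry, self.slots[which][i] = self.slots[which][i], entry
            if entry is None:
                self.size += 1
                return
            which = 1 - which
        # Too many evictions: double the capacity and reinsert everything.
        self._grow(entry)

    def _grow(self, pending):
        entries = [e for side in self.slots for e in side if e is not None]
        entries.append(pending)
        self.capacity *= 2
        self.slots = [[None] * self.capacity, [None] * self.capacity]
        self.size = 0
        for k, v in entries:
            self.put(k, v)

    def delete(self, key):
        for which in (0, 1):
            i = self._index(which, key)
            entry = self.slots[which][i]
            if entry is not None and entry[0] == key:
                self.slots[which][i] = None
                self.size -= 1
                return True
        return False
```

In this sketch, lookups and deletes inspect at most two slots, which is the property that makes cuckoo hashing attractive for frequent small updates. How IncRDD combines such a table with versioning is described in the thesis itself, not here.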

Files in this item

File: DODABELLEPRAKASH-THESIS-2017.pdf
Size: 925.8 KB
Format: PDF