Scalable and Secure Learning with Limited Supervision over Data Streams

dc.contributor.ORCID: 0000-0001-5366-8694 (Chandra, SR)
dc.contributor.advisor: Khan, Latifur
dc.contributor.advisor: Lin, Zhiqiang
dc.creator: Chandra, Swarup Rama
dc.date.accessioned: 2018-10-17T01:26:45Z
dc.date.available: 2018-10-17T01:26:45Z
dc.date.created: 2018-08
dc.date.issued: 2018-07-17
dc.date.submitted: August 2018
dc.date.updated: 2018-10-17T01:26:46Z
dc.description.abstract: Applications that employ machine learning over a stream of data provide the knowledge their users need to make informed decisions at the right time. With the advantages of cloud computing infrastructure, these applications can potentially reach a myriad of users. However, challenges arising from the evolving statistical properties of data that arrive continuously over time, together with concerns about data security, have severely limited their adoption in the real world. This dissertation contributes new results that address critical challenges in both of these complementary research areas, particularly when applications are deployed on a third-party resource. The first part of this dissertation introduces a novel framework for data classification over a non-stationary data stream, where the goal is to learn from limited labeled data over time. It considers a scenario in which multiple data-generating processes continuously generate data, under the constraint that labeled data is generated only by a small set of processes whose data distribution is biased relative to the population. The effect of learning with such sampling bias in a concept-drifting data stream is explored: changes in the data distribution over time, combined with biased labeled data, degrade classifier performance. By representing instances along the stream as two independent streams, one containing labeled instances (called the source stream) and the other containing unlabeled instances (called the target stream), methodologies that uniquely combine transfer learning mechanisms with drift detection are presented. While this framework can adapt existing batch-wise bias correction techniques, these techniques are computationally expensive and do not scale over a data stream. The next part of this dissertation explores sampling and ensemble techniques to address this challenge. The theoretical and empirical results show large improvements in computational time while maintaining performance similar to that of the baseline methods. The final part of this dissertation considers security concerns when deploying applications that use machine learning systems on an untrusted third-party resource, focusing on protecting the learning system against insider threats. A strong adversary can circumvent the security and privacy protections of an application that aims to protect its code and data. Using Intel SGX, a recent commercially available off-the-shelf (COTS) hardware-based cryptographic platform, a black-box system can be built to protect against such direct attacks. Unfortunately, the platform has side channels that leak information during computation. A novel defense strategy that leverages the trade-off between computational efficiency and privacy to address this challenge is presented, with results demonstrating a large gain in computational time compared to other competing strategies.
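The abstract describes representing the stream as a labeled source stream and an unlabeled target stream, then pairing this split with drift detection. The following is a minimal Python sketch of that idea under stated assumptions: instances are one-dimensional `(value, label)` pairs with `label=None` for unlabeled data, and drift is flagged by a simple mean-shift test between two fixed-size windows. The names `split_stream` and `MeanShiftDetector` and the `window`/`threshold` parameters are hypothetical. This is not the dissertation's detector, which the abstract does not specify, and it omits the transfer-learning (bias correction) step entirely.

```python
from collections import deque
from statistics import mean
from typing import Iterable, List, Optional, Tuple

# Hypothetical instance format: (feature value, label), label is None if unlabeled.
Instance = Tuple[float, Optional[int]]


def split_stream(stream: Iterable[Instance]) -> Tuple[List[Instance], List[float]]:
    """Route labeled instances to the source stream, unlabeled ones to the target stream."""
    source: List[Instance] = []
    target: List[float] = []
    for x, y in stream:
        if y is None:
            target.append(x)
        else:
            source.append((x, y))
    return source, target


class MeanShiftDetector:
    """Toy drift detector (hypothetical, not the dissertation's method).

    Flags drift when the mean of the most recent `window` values differs from
    the mean of a reference window by more than `threshold`.
    """

    def __init__(self, window: int = 5, threshold: float = 1.0) -> None:
        self.window = window
        self.threshold = threshold
        self.reference: List[float] = []
        self.recent: deque = deque(maxlen=window)

    def update(self, x: float) -> bool:
        # Fill the reference window first; it stands for the current concept.
        if len(self.reference) < self.window:
            self.reference.append(x)
            return False
        # Slide the recent window over incoming values.
        self.recent.append(x)
        if len(self.recent) < self.window:
            return False
        if abs(mean(self.recent) - mean(self.reference)) > self.threshold:
            # Drift detected: the recent window becomes the new reference.
            self.reference = list(self.recent)
            self.recent.clear()
            return True
        return False


# Usage: 15 unlabeled values whose mean jumps near the end, plus 2 labeled ones.
target_values = [0.0] * 5 + [0.1] * 5 + [3.0] * 5
stream = [(v, None) for v in target_values] + [(0.5, 1), (0.4, 0)]
source, target = split_stream(stream)
detector = MeanShiftDetector(window=5, threshold=1.0)
drift_points = [i for i, x in enumerate(target) if detector.update(x)]
print(drift_points)  # [11]
```

In this sketch drift is only monitored on the unlabeled target stream, since that stream arrives continuously while labels are scarce; a real system would also need to re-correct for the bias between the source and target distributions after each detected change.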
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10735.1/6196
dc.language.iso: en
dc.rights: ©2018 The Author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.subject: Supervised learning (Machine learning)
dc.subject: Computer multitasking
dc.subject: Streaming technology (Telecommunications)
dc.subject: Data mining
dc.subject: Data encryption (Computer science)
dc.title: Scalable and Secure Learning with Limited Supervision over Data Streams
dc.type: Dissertation
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.level: Doctoral
thesis.degree.name: PHD

Files

Original bundle
Name: ETD-5608-011-CHANDRA-8457.95.pdf
Size: 1.02 MB
Format: Adobe Portable Document Format

License bundle
Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text

Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text