Data-driven Control and Planning for Uncertain Complex Systems
Abstract
Reinforcement learning (RL) has emerged as a front-runner in the race to imbue machines with artificial intelligence, driven by advances in parallel computing hardware, deep neural network architectures, and large data sets. Simultaneously, advances in actuator and sensor technology have brought about platforms with enormous potential to automate menial, dangerous, and expensive tasks such as autonomous driving. As RL leaps from virtual to physical environments, a new challenge appears: safety becomes far more critical. Uncertainty about the effects of decisions is always present due to noise, errors, and changing environments, yet control policies must still direct the evolution of the system towards the goal and away from danger. The fundamental goal of this dissertation is to realize the potential of RL in modern complex systems, contending with challenges posed by dynamics and feedback in the face of uncertainty, by establishing performance and safety guarantees. Despite rapid recent progress, there is still a large gap between theory and practice for optimal control of even relatively simple dynamical systems affected by randomness whose dynamics and statistics are unknown. As a stepping stone towards analysis of more complicated systems, this dissertation focuses on linear systems with multiplicative noise, a stochastic system representation with several practical applications, and on linear quadratic control tasks for such systems, which serve as useful baselines because their optimal policies can be efficiently computed. This dissertation begins by establishing fundamental connections between stochastic stability and robust stability, formally demonstrating the utility of multiplicative noise as a tool for inducing robustness to parametric uncertainty in dynamic models.
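To make the notion of stochastic stability for a multiplicative-noise system concrete, the following is a minimal sketch and is not taken from the dissertation. It assumes an autonomous system x_{t+1} = (A + sum_i gamma_i A_i) x_t in which the scalars gamma_i are zero-mean, mutually independent, and independent across time, with variances alpha_i. Under those assumptions the second moment X_t = E[x_t x_t^T] evolves linearly, and mean-square stability is equivalent to the spectral radius of A⊗A + sum_i alpha_i A_i⊗A_i being below one. The function name and the numerical values are hypothetical.

```python
import numpy as np

def mean_square_spectral_radius(A, A_list, alphas):
    """Spectral radius of the second-moment operator (illustrative helper).

    Assumes x_{t+1} = (A + sum_i gamma_i * A_i) x_t with zero-mean,
    independent gamma_i of variance alphas[i]. Then
    vec(X_{t+1}) = (A kron A + sum_i alphas[i] * A_i kron A_i) vec(X_t).
    """
    T = np.kron(A, A)
    for A_i, alpha_i in zip(A_list, alphas):
        T = T + alpha_i * np.kron(A_i, A_i)
    return float(np.max(np.abs(np.linalg.eigvals(T))))

# Hypothetical example: a nominally stable A with noise acting along the identity.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
A_1 = np.eye(2)

rho_nominal = mean_square_spectral_radius(A, [], [])
rho_small = mean_square_spectral_radius(A, [A_1], [0.1])
rho_large = mean_square_spectral_radius(A, [A_1], [0.3])

for name, rho in [("no noise", rho_nominal),
                  ("variance 0.1", rho_small),
                  ("variance 0.3", rho_large)]:
    print(f"{name}: spectral radius {rho:.3f}, mean-square stable: {rho < 1.0}")
```

In this example the nominal matrix A is stable, yet a large enough noise variance makes the system mean-square unstable. This illustrates why mean-square stability is a stricter requirement than stability of the nominal dynamics alone.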
Building on this motivating result, RL algorithms across the model-based/model-free continuum that learn optimal policies only from observed data, without explicit access to dynamics or noise statistics governing the system, are developed and analyzed. First, model-free policy optimization methods, which directly tune policy parameters, are proved to converge at a linear rate with quantified sample complexity polynomial in problem-dependent quantities, and can, with appropriate regularization, achieve actuator and sensor sparsity for economical control of networks without sacrificing stability or performance. Second, approximate dynamic programming methods, which have mixed model-free/model-based character and learn intermediate value functions, are shown to achieve a fast cubic rate of convergence with a novel midpoint formulation, and obtain policies with enhanced robustness properties when applied to stochastic dynamic games. Third, model-based system identification methods, which estimate a dynamic model, are proved to learn the dynamics and noise statistics of multiplicative noise systems with quantified error that scales inversely with the square root of the amount of observed data, and are incorporated into adaptive control schemes that obtain robustness against actual uncertainty accrued from model estimation errors. To tractably generate safe trajectories, which are tracked by these data-driven policies, motion planning techniques are developed that actively avoid collisions and account for disturbance profiles with unknown distributions via distributionally robust checks. Throughout, examples are provided to clarify concepts, simulations and experiments are performed to validate theoretical results, and computer code is made available to promote reproducibility and future discoveries.
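For orientation, the known-model baseline that such data-driven methods aim to match can be sketched as a value iteration on a generalized Riccati equation. This is a minimal illustration under stated assumptions, not the dissertation's algorithms: it assumes the dynamics x_{t+1} = (A + sum_i gamma_i A_i) x_t + (B + sum_j delta_j B_j) u_t with zero-mean, mutually independent noises of variances alpha_i and beta_j, a quadratic stage cost x^T Q x + u^T R u, and full knowledge of every matrix and variance. It does not implement the model-free, midpoint, or identification-based schemes summarized above. All names and numbers are hypothetical.

```python
import numpy as np

def generalized_riccati_iteration(A, B, Q, R, A_list, alphas, B_list, betas,
                                  iters=500):
    """Value iteration for LQR with multiplicative noise (known-model sketch).

    Iterates
      P <- Q + A'PA + sum_i alpha_i A_i'PA_i
             - A'PB (R + B'PB + sum_j beta_j B_j'PB_j)^{-1} B'PA
    and returns the final P together with the gain K (u = K x) computed
    from the second-to-last iterate.
    """
    P = Q.copy()
    K = None
    for _ in range(iters):
        G_xx = Q + A.T @ P @ A
        for A_i, alpha_i in zip(A_list, alphas):
            G_xx = G_xx + alpha_i * A_i.T @ P @ A_i
        G_uu = R + B.T @ P @ B
        for B_j, beta_j in zip(B_list, betas):
            G_uu = G_uu + beta_j * B_j.T @ P @ B_j
        G_ux = B.T @ P @ A
        K = -np.linalg.solve(G_uu, G_ux)   # greedy gain for the current P
        P = G_xx + G_ux.T @ K              # equals G_xx - G_ux' G_uu^{-1} G_ux
    return P, K

# Hypothetical scalar example: a = b = q = r = 1.
A = np.array([[1.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])

# Without multiplicative noise this reduces to the standard Riccati recursion.
P_nominal, K_nominal = generalized_riccati_iteration(A, B, Q, R, [], [], [], [])

# With state-multiplicative noise of variance 0.2 acting through A_1 = 1.
P_noisy, K_noisy = generalized_riccati_iteration(
    A, B, Q, R, [np.array([[1.0]])], [0.2], [], [])

print("nominal cost matrix:", P_nominal, "gain:", K_nominal)
print("noisy cost matrix:  ", P_noisy, "gain:", K_noisy)
```

In the scalar example, adding multiplicative noise raises the optimal cost and yields a more aggressive gain. That is consistent with the idea that designing for multiplicative noise produces more cautious, robustness-oriented policies.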