Enhancing Localization, Selection, and Processing of Data in Vehicular Cyber-Physical Systems

Abstract: Connected devices at the edge of the Edge-to-Cloud (E2C) continuum produce growing amounts of data that hold the key to valuable use cases across a wide range of applications. In the vehicular domain, large fleets of connected vehicles (called Vehicular Cyber-Physical Systems, or VCPSs) sense and collect terabytes of data such as time series and video, enabling everything from predictive maintenance to autonomous driving. In VCPSs, the computing devices onboard vehicles are not dimensioned to process all the data produced onboard. At the same time, communication with the cloud, where computing resources are more readily available, relies on bandwidth-limited and costly carrier-operated cellular connectivity. Because transmitting all raw data to the cloud for analysis incurs growing costs and processing latencies, and the edge devices lack the capability to perform all required data analyses, the questions of "Where" and "How" to process "Which" data become paramount and form the foundation of this thesis. The first part of this thesis outlines my work by introducing relevant background topics, motivating the research questions, and describing the contributions. These contributions are presented in the five chapters that make up the second part. In Chapter A, I present the DRIVEN framework, which consists of a novel lossy online time-series compression algorithm with tuneable bounded error for the edge, as part of a pipeline from edge to cloud that includes online data clustering, and I evaluate the trade-off between data savings and the reduced analysis accuracy caused by lossy compression. In Chapter B, I show how our work on Data Localization helps to quickly and efficiently discover those vehicles in a connected fleet that hold data relevant to a user-defined analysis task.
Chapter C proposes Ananke, the first forward provenance framework for Stream Processing, which opens a route to selecting relevant data within the streaming sources that are ubiquitous in VCPSs. In Chapter D, I present the Nona framework, which solves the problem of forward provenance for evolving sets of Stream Processing queries and thus allows data selection in modern analysis flows where queries are constantly altered and redeployed. Finally, in Chapter E, I introduce a comprehensive list of requirements for, and an implementation of, a VCPS learning simulator that enables the efficient evaluation of distributed data analysis algorithms for connected vehicular networks. This thesis takes significant steps toward utilizing edge resources more efficiently, while also laying the basis for further development of novel distributed data analysis algorithms in VCPSs.
