Statistical Data Analysis for Internet-of-Things: Scalability, Reliability, and Robustness

Abstract: The Internet-of-Things (IoT) is a set of sensing, communication, and computation technologies that connect physical objects, such as wearable devices, vehicles, and buildings. These connected “Things” generate a large amount of data. Data analysis plays a central role in the automated and intelligent decision-making processes that manage and optimize IoT systems. In this thesis, we focus on the challenges of analyzing large, incomplete, and corrupt IoT data. The thesis consists of three topics. In the first topic, we study scalable Gaussian process (GP) regression for big IoT data. We propose a novel scalable GP model for urban air quality modeling and prediction. Compared with existing scalable GP models, the proposed model enables tractable analysis of approximation errors. The second topic is the missing data problem. In the case of missing labels in training data, we investigate different missing data mechanisms and propose a reliable semi-supervised learning approach that provides an accurate predictive error probability. In the case of missing features in testing data, we design a robust predictor. The predictor significantly reduces the prediction error caused by rare values of missing features, while incurring only a small loss in overall performance. The third topic is information fusion for IoT systems under false data injection attacks. We propose a robust and distributed information fusion method that only requires exchanging the latest local posterior distributions, instead of synchronizing the full historical measurements. Furthermore, we design a false data detector based on clustering the local posterior distributions. Together, the distributed information fusion method and the false data detector enable secure state estimation for mobile IoT networks with probabilistic communication links. Altogether, this thesis is a step toward scalable, reliable, and robust IoT data analysis.
