Optimal Subsampling Designs Under Measurement Constraints

Abstract: This thesis addresses a problem that arises in large and expensive experiments where, owing to practical, economic, or computational constraints, statistical analyses are restricted to a subset of a large initial dataset. Choosing this subset so that it captures as much information as possible is essential. Using finite population sampling methodology, we address the problem of optimal subsample selection. We derive optimal sampling schemes for a general class of estimators, optimality criteria, and subsampling designs. We present an adaptive subsampling framework in which the information needed for the optimisation is acquired gradually during the sampling process, and we study the asymptotic properties of the resulting estimators. The proposed methods are illustrated and evaluated in three problem areas: subsample selection for optimal prediction in active machine learning (Paper I), optimal control sampling in the analysis of safety-critical events in naturalistic driving studies (Paper II), and optimal subsampling in a scenario generation context for virtual safety assessment of an advanced driver assistance system (Paper III). In Paper IV we present a unified theory that encompasses and generalises the methods of Papers I–III, with examples from parametric density estimation, regression modelling, and finite population inference. In Papers I–III we demonstrate that the proposed methods reduce the required sample size by 10–50% compared with simple random sampling and traditional importance sampling methods, while achieving the same level of performance. We also propose a novel class of invariant linear optimality criteria, which in Paper IV are shown to reach 92–99% D-efficiency with 88–96% lower computational demand.
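D-efficiency measures how close a design comes to a D-optimal reference: for information matrices M and M_ref with p parameters it is (det M / det M_ref)^(1/p), so a value of 0.95 means the design retains 95% of the reference's information per parameter. The following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes a linear model with unit error variance, so the information matrix is the (optionally weighted) X^T W X. The function names `information_matrix` and `d_efficiency` are hypothetical and are used here only for illustration.

```python
import numpy as np


def information_matrix(X, weights=None):
    """Weighted information matrix X^T W X for a linear model.

    Assumption (illustrative only): unit error variance, W = diag(weights).
    """
    X = np.asarray(X, dtype=float)
    if weights is None:
        weights = np.ones(X.shape[0])
    w = np.asarray(weights, dtype=float)
    # Scale each row of X by its weight, then form X^T W X.
    return X.T @ (w[:, None] * X)


def d_efficiency(M, M_ref):
    """D-efficiency of M relative to M_ref: (det M / det M_ref)^(1/p)."""
    p = M.shape[0]
    # Compare log-determinants to avoid overflow/underflow for larger p.
    sign, logdet = np.linalg.slogdet(M)
    sign_ref, logdet_ref = np.linalg.slogdet(M_ref)
    if sign <= 0 or sign_ref <= 0:
        raise ValueError("information matrices must be positive definite")
    return float(np.exp((logdet - logdet_ref) / p))


if __name__ == "__main__":
    n = 10
    # Reference: D-optimal design for y = b0 + b1*x on [-1, 1],
    # with half of the points at each endpoint.
    x_opt = np.array([-1.0] * (n // 2) + [1.0] * (n // 2))
    # Comparison design: n equally spaced points on [-1, 1].
    x_even = np.linspace(-1.0, 1.0, n)
    X_opt = np.column_stack([np.ones(n), x_opt])
    X_even = np.column_stack([np.ones(n), x_even])
    eff = d_efficiency(information_matrix(X_even), information_matrix(X_opt))
    print(f"D-efficiency of equally spaced design: {eff:.3f}")  # prints 0.638
```

In this toy case the equally spaced design retains about 64% of the D-optimal design's information per parameter. Working on the log-determinant scale through `numpy.linalg.slogdet` keeps the ratio numerically stable when the number of parameters grows.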
