Survey estimation for highly skewed data

University dissertation from Stockholm : Stockholm University

Abstract: Estimation of the population total of a highly skewed survey variable from a small sample is problematic if straightforward methods are used, since (i) when there are no extreme values in the sample, the estimates will be too small, and (ii) if extreme values are sampled, the estimates will become grotesquely large. Traditional methods for outlier treatment usually compensate for outliers in the sample, thereby avoiding (ii), whereas the small negative bias of (i) persists. Here, a lognormal superpopulation model is proposed. A particular strength of the lognormal model estimator is that, even in the absence of extremely large values in the sample, the assumed lognormal structure of the survey variable is used for estimating the population total.

Two estimators based on a lognormal superpopulation distribution are proposed: (i) one applicable if the shape parameter of the assumed lognormal superpopulation distribution is known, and (ii) one applicable if the shape parameter is unknown. For both estimators, any number of auxiliary variables can be used. Estimator (i) is of little practical importance, but it has the advantage of being model-unbiased, and a model-unbiased estimator of its estimation error variance can also easily be derived. Estimator (ii), although only approximately model-unbiased, is of greater practical use because it rests on the more realistic assumption of an unknown shape parameter.

Both estimators (i) and (ii) are applicable only to variables that are strictly positive.
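The abstract does not state the estimator formulas, so the following is only a minimal sketch of how a lognormal-model predictor of a population total can work, assuming the regression form log y_k = x_k'β + e_k with independent e_k ~ N(0, σ²) and a sampling design that is non-informative given the auxiliary variables. For known σ² it applies a standard bias correction, exp(x_k'β̂ + σ²(1 − h_k)/2) with h_k = x_k'(X'X)⁻¹x_k, which is model-unbiased for E[y_k] = exp(x_k'β + σ²/2). For unknown σ² it simply plugs in the residual variance s², which makes it only approximately unbiased. This plug-in is an illustrative choice and not necessarily the dissertation's estimator (ii); the function name `lognormal_total` is hypothetical.

```python
import numpy as np


def lognormal_total(y_s, X_s, X_r, sigma2=None):
    """Hypothetical sketch: predict the population total of a strictly
    positive survey variable under the ASSUMED superpopulation model
        log y_k = x_k' beta + e_k,  e_k ~ N(0, sigma2) independently.

    y_s    : values of y for the n sampled units (must all be > 0)
    X_s    : n x p auxiliary matrix for the sample (include a column of ones)
    X_r    : m x p auxiliary matrix for the non-sampled units
    sigma2 : known shape parameter sigma^2 (known-shape case); if None, the
             residual variance s^2 is plugged in (only approximately unbiased)
    """
    y_s = np.asarray(y_s, dtype=float)
    if np.any(y_s <= 0):
        raise ValueError("the lognormal model requires strictly positive y")
    z = np.log(y_s)
    n, p = X_s.shape
    XtX_inv = np.linalg.inv(X_s.T @ X_s)
    beta_hat = XtX_inv @ (X_s.T @ z)  # OLS fit on the log scale
    if sigma2 is None:
        resid = z - X_s @ beta_hat
        sigma2 = resid @ resid / (n - p)
    # h_k = x_k' (X'X)^{-1} x_k, so that Var(x_k' beta_hat) = sigma2 * h_k
    h = np.einsum("ij,jk,ik->i", X_r, XtX_inv, X_r)
    # Target: E[y_k] = exp(x_k' beta + sigma2 / 2).
    # Since E[exp(x_k' beta_hat)] = exp(x_k' beta + sigma2 * h_k / 2),
    # adding sigma2 * (1 - h_k) / 2 gives an unbiased prediction of E[y_k]
    # when sigma2 is known.
    y_pred = np.exp(X_r @ beta_hat + sigma2 * (1.0 - h) / 2.0)
    # Observed part of the total plus the predicted non-sampled part
    return y_s.sum() + y_pred.sum()


if __name__ == "__main__":
    # Illustrative simulated population (not data from the dissertation)
    rng = np.random.default_rng(1)
    N, n = 2000, 25
    x = rng.uniform(1.0, 10.0, N)
    X = np.column_stack([np.ones(N), np.log(x)])
    y = np.exp(X @ np.array([0.5, 1.0]) + rng.normal(0.0, 1.2, N))
    s = rng.choice(N, size=n, replace=False)
    r = np.setdiff1d(np.arange(N), s)
    print("true total         ", y.sum())
    print("expansion N * ybar ", N * y[s].mean())
    print("lognormal, s^2     ", lognormal_total(y[s], X[s], X[r]))
    print("lognormal, sigma^2 ", lognormal_total(y[s], X[s], X[r], sigma2=1.2**2))
```

Repeating the simulated draw with different seeds shows the behavior described above: the expansion estimator N·ȳ swings widely depending on whether a few extreme units are sampled, while the model-based prediction uses the assumed lognormal shape even when no extreme values are observed. Note that the sketch rejects zeros, matching the strict-positivity restriction.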
A third estimator, based on a combined lognormal-logistic superpopulation model, is therefore proposed; this estimator can be applied when the survey variable, while highly skewed, may take the value zero for a number of units.

The three model-based estimators are compared with a number of alternative estimators (design-based estimators as well as estimators specifically constructed for outlier treatment) in a simulation study, using randomly generated populations as well as real survey populations. The simulation results indicate that the model-based estimators constitute a sensible alternative to the competing estimators, in particular when the sample size is small and when the distribution of the survey variable is close to the assumed superpopulation distribution.
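For the third, lognormal-logistic estimator, the abstract does not say how the two model parts are combined. A minimal sketch, assuming a two-part model in which P(y_k > 0) follows a logistic regression on the auxiliary variables and log y_k given y_k > 0 follows a normal linear model, predicts each non-sampled unit by P(y_k > 0) · E[y_k | y_k > 0]. For brevity it uses the plug-in s² and omits the small-sample correction for estimating β, so it is approximate. The names `fit_logistic` and `lognormal_logistic_total` are hypothetical, and the logistic fit needs both zero and positive values in the sample.

```python
import numpy as np


def fit_logistic(X, d, n_iter=100, tol=1e-10):
    """Logistic regression P(d_k = 1) = 1 / (1 + exp(-x_k' gamma)),
    fitted by Newton-Raphson (equivalently, iteratively reweighted least squares)."""
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ gamma)))
        score = X.T @ (d - p)                          # gradient of log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # Fisher information
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def lognormal_logistic_total(y_s, X_s, X_r):
    """Hypothetical two-part sketch for y >= 0 with some zeros (ASSUMED model):
    logistic model for P(y_k > 0), lognormal regression for y_k given y_k > 0."""
    y_s = np.asarray(y_s, dtype=float)
    if np.any(y_s < 0):
        raise ValueError("y must be non-negative")
    pos = y_s > 0
    # Part 1: probability that a non-sampled unit has a positive value
    gamma = fit_logistic(X_s, pos.astype(float))
    p_r = 1.0 / (1.0 + np.exp(-(X_r @ gamma)))
    # Part 2: lognormal regression fitted to the positive sampled values only
    X_pos, z = X_s[pos], np.log(y_s[pos])
    beta, *_ = np.linalg.lstsq(X_pos, z, rcond=None)
    resid = z - X_pos @ beta
    s2 = resid @ resid / (len(z) - X_pos.shape[1])
    mu_r = np.exp(X_r @ beta + s2 / 2.0)  # plug-in estimate of E[y_k | y_k > 0]
    # E[y_k] = P(y_k > 0) * E[y_k | y_k > 0] for each non-sampled unit
    return y_s.sum() + np.sum(p_r * mu_r)
```

Splitting the model this way lets the zeros inform only the logistic part, so the lognormal part is still fitted to strictly positive values, which is the restriction that rules out the purely lognormal estimators for such variables.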
