Learning local predictive accuracy for expert evaluation and forecast combination

Abstract: This thesis consists of four papers that study several topics related to expert evaluation and aggregation.

Paper I explores the properties of Bayes factors. Bayes factors, which are used both for Bayesian hypothesis testing and for aggregating models through Bayesian model averaging, are sometimes observed to behave erratically. We analyze some of the sources of this erratic behavior, which we call overconfidence, by deriving the sampling distribution of Bayes factors for a class of linear models. We show that overconfidence is most likely to occur when comparing models that are complex and that approximate the data-generating process in widely different ways.

Paper II proposes a general framework for creating linear aggregate density forecasts based on local predictive ability, where we define local predictive ability as the conditional expected log predictive density given an arbitrary set of pooling variables. We call the space spanned by the variables in this set the pooling space and propose the caliper method as a way to estimate local predictive ability. We further introduce a local version of linear optimal pools, which optimizes the historical performance of a linear pool using only past observations made at points in the pooling space close to the new point at which we want to make a prediction. Both methods are illustrated in two applications: macroeconomic forecasting and predicting bike sharing usage in Washington D.C.

Paper III builds on Paper II by introducing a Gaussian process (GP) as a model for estimating local predictive ability. When both the predictive distribution of an expert and the data-generating process are normal, the log scores follow a scaled and translated noncentral chi-squared distribution with one degree of freedom. We show that, after a power transform, the log scores can be modeled using a Gaussian process with Gaussian noise.
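
The chi-squared statement for Paper III can be checked numerically. The sketch below is a minimal Python illustration, assuming a normal data-generating process N(mu, sigma^2) and a single expert forecasting N(m, s^2); the function names and parameter values are hypothetical. Writing Z = (y - m)/sigma, which is distributed N((mu - m)/sigma, 1), the log score equals a - b*Z^2 with a = -log(2*pi*s^2)/2 and b = sigma^2/(2*s^2), and Z^2 is noncentral chi-squared with one degree of freedom and noncentrality lambda = ((mu - m)/sigma)^2. The code compares the mean and variance implied by this representation with a Monte Carlo estimate. It does not implement the power transform, whose exponent is not stated here.

```python
import math
import random


def normal_log_score(y, m, s):
    """Log predictive density of observation y under an expert forecast N(m, s^2)."""
    return -0.5 * math.log(2.0 * math.pi * s * s) - (y - m) ** 2 / (2.0 * s * s)


def log_score_moments(mu, sigma, m, s):
    """Mean and variance of the log score when y ~ N(mu, sigma^2) (assumed DGP).

    Uses log score = a - b * Z^2, where Z = (y - m) / sigma ~ N((mu - m) / sigma, 1),
    so Z^2 is noncentral chi-squared with 1 degree of freedom.
    """
    a = -0.5 * math.log(2.0 * math.pi * s * s)  # translation
    b = sigma ** 2 / (2.0 * s * s)              # scale (enters with a minus sign)
    lam = ((mu - m) / sigma) ** 2               # noncentrality parameter
    mean = a - b * (1.0 + lam)                  # E[Z^2] = 1 + lam
    var = b ** 2 * 2.0 * (1.0 + 2.0 * lam)      # Var[Z^2] = 2 (1 + 2 lam)
    return mean, var


def simulate_log_scores(mu, sigma, m, s, n, seed=0):
    """Monte Carlo log scores of the expert under the assumed normal DGP."""
    rng = random.Random(seed)
    return [normal_log_score(rng.gauss(mu, sigma), m, s) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical DGP N(1, 2^2) and expert forecast N(0, 1.5^2).
    mu, sigma, m, s = 1.0, 2.0, 0.0, 1.5
    scores = simulate_log_scores(mu, sigma, m, s, n=100_000)
    emp_mean = sum(scores) / len(scores)
    emp_var = sum((x - emp_mean) ** 2 for x in scores) / (len(scores) - 1)
    print("Monte Carlo:", emp_mean, emp_var)
    print("Implied by noncentral chi-squared:", log_score_moments(mu, sigma, m, s))
```

Because b is positive, the log score is a reflected (negatively scaled) noncentral chi-squared variable, which is why it is strongly skewed to the left and a transform is needed before a Gaussian noise model is reasonable.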
The proposed model has the advantage that the latent Gaussian process surface can be marginalized out to quickly obtain the marginal posteriors of the GP hyperparameters, which is important because the computational cost of the unmarginalized model is often prohibitive. The paper demonstrates the GP approach to modeling local predictive ability with a simulation study and an application to the bike sharing data from Paper II, and develops new methods for pooling predictive distributions conditional on full posterior distributions of local predictive ability.

Paper IV further extends Paper III by estimating the local predictive ability of a set of experts jointly using a multi-output Gaussian process. In Paper III, the posterior distribution of the local predictive ability of each expert is obtained separately. By instead estimating a joint posterior, we can exploit correlations between the predictive abilities of the experts to create better aggregate predictions. The joint posterior can also be used for inference, for example to learn about the relationships between the experts. The method is illustrated using a simulation study and the same bike sharing data as in Paper III.
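
The marginalization step described for Paper III rests on a standard Gaussian-process identity: with Gaussian noise, integrating out the latent surface gives y ~ N(mean*1, K + noise_var*I), so the log marginal likelihood of the hyperparameters has a closed form. The following is a minimal, dependency-free sketch of that identity, assuming a constant mean and a squared-exponential kernel over the pooling space; the kernel, the function names and the toy data are hypothetical choices, not the thesis's implementation. Adding a log prior over (signal_var, length_scale, noise_var) would give the unnormalized log posterior of the hyperparameters that a sampler or optimizer could then explore.

```python
import math


def sq_exp_kernel(x1, x2, signal_var, length_scale):
    """Squared-exponential covariance between two points in the pooling space (assumed kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return signal_var * math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_substitution(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def gp_log_marginal_likelihood(X, y, mean, signal_var, length_scale, noise_var):
    """log N(y | mean * 1, K + noise_var * I); the GP surface is integrated out analytically."""
    n = len(y)
    K = [[sq_exp_kernel(X[i], X[j], signal_var, length_scale)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    z = forward_substitution(L, [yi - mean for yi in y])  # r^T K^{-1} r = z^T z
    quad = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


if __name__ == "__main__":
    # Hypothetical 1-D pooling variable and power-transformed log scores.
    X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
    y = [-1.2, -1.0, -0.7, -0.9, -1.4]
    for ell in (0.25, 0.5, 1.0, 2.0):
        print(ell, gp_log_marginal_likelihood(X, y, mean=-1.0, signal_var=0.1,
                                              length_scale=ell, noise_var=0.05))
```

The Cholesky factorization costs O(n^3) in the number of past observations, so for long histories a numerical linear-algebra library would be preferable to this pure-Python version.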

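The caliper method and the local linear optimal pool of Paper II can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance in the pooling space, a fixed caliper radius, and the experts' past predictive densities (evaluated at the realized outcomes) supplied as a table. Local predictive ability at a new point is estimated as each expert's mean past log score among observations within the caliper, and the pool weights maximize the in-caliper log score of the linear pool. The weights are found with the EM fixed-point update for mixture weights, which is a swapped-in optimizer; the names `within_caliper`, `local_predictive_ability` and `local_optimal_pool_weights` are hypothetical.

```python
import math


def within_caliper(X, x_new, caliper):
    """Indices t of past observations with ||X[t] - x_new|| <= caliper.

    Euclidean distance in the pooling space is an assumption of this sketch.
    """
    return [t for t, x in enumerate(X) if math.dist(x, x_new) <= caliper]


def local_predictive_ability(log_scores, X, x_new, caliper):
    """Caliper estimate of each expert's local predictive ability at x_new.

    log_scores[t][k] is expert k's log predictive density at past observation t;
    the estimate is the average over observations inside the caliper.
    """
    idx = within_caliper(X, x_new, caliper)
    if not idx:
        raise ValueError("no past observations within the caliper")
    n_experts = len(log_scores[0])
    return [sum(log_scores[t][k] for t in idx) / len(idx) for k in range(n_experts)]


def local_optimal_pool_weights(dens, X, x_new, caliper, n_iter=500):
    """Weights maximizing the sum over in-caliper t of log(sum_k w_k * dens[t][k]).

    dens[t][k] (assumed strictly positive) is expert k's predictive density at y_t.
    Optimized with the EM fixed-point update for mixture weights (a swapped-in choice).
    """
    idx = within_caliper(X, x_new, caliper)
    if not idx:
        raise ValueError("no past observations within the caliper")
    n_experts = len(dens[0])
    w = [1.0 / n_experts] * n_experts  # start from equal weights
    for _ in range(n_iter):
        new_w = [0.0] * n_experts
        for t in idx:
            pooled = sum(w[k] * dens[t][k] for k in range(n_experts))
            for k in range(n_experts):
                # Responsibility of expert k for observation t.
                new_w[k] += w[k] * dens[t][k] / pooled
        w = [v / len(idx) for v in new_w]
    return w


if __name__ == "__main__":
    # Hypothetical history: one pooling variable, two experts.
    X = [[0.0], [0.1], [0.2], [5.0], [5.1]]
    dens = [[2.0, 0.5], [2.0, 0.5], [2.0, 0.5], [0.1, 1.0], [0.1, 1.0]]
    log_scores = [[math.log(d) for d in row] for row in dens]
    print(local_predictive_ability(log_scores, X, [0.05], caliper=0.5))
    print(local_optimal_pool_weights(dens, X, [0.05], caliper=0.5))
```

The EM update keeps the weights on the simplex automatically and never decreases the in-caliper log score, which is why it is used here in place of a general constrained optimizer.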