Learning from Complex Medical Data Sources

Abstract: Large, varied, and time-evolving data sources are found across many domains and present a unique challenge for classification problems, in which traditional machine learning approaches must be adapted to accommodate the complex nature of such data. Across most domains, there is also a need for machine learning models that are both well-performing and interpretable, providing explanations of a model's decisions that stakeholders can trust and act upon. In the medical domain, complex Electronic Health Record (EHR) data consists of longitudinal records of patient histories spanning structured and unstructured data types. Exploiting such complex medical data is vital for gaining useful medical insights and predictions, and establishing stakeholder trust through useful explanations is critical. This thesis has focused, first, on producing state-of-the-art classification methods for exploiting the heterogeneity and temporality of complex data; second, on developing novel interpretability methods to aid in the understanding of model predictions from such complex data; and finally, on ensuring the medical applicability of the developed methods and other novel methods, particularly for the medical problem of adverse drug event (ADE) prediction.

In the first part of this thesis, several state-of-the-art classification frameworks for exploiting complex medical data are outlined, and their utility is demonstrated through comparative empirical evaluations against competing frameworks. In the second part, novel interpretability methods are developed and demonstrated to be applicable across domains. In the third part, the applicability of interpretability and explainability methods for complex medical data is investigated, refined, and assessed for validity in connection with the use case of ADE prediction.
Main contributions of this thesis include: two novel classification frameworks, including SMILE, demonstrating significantly improved AUC performance over the main competing frameworks and other selected competitor approaches; novel generalised ‘time-series tweaking’ methods delivering optimised counterfactual explanations in the time series domain; and findings that attention-based explanations from interpretable deep learning models and post-hoc SHAP techniques can be leveraged for medical insight and explanations of ADE predictions.