High Confidence Network Predictions from Big Biological Data

Abstract: Biology functions in a most intriguing fashion, with human cells being regulated by multiplex networks of proteins and their dependent systems that control everything from proliferation to cell death. Notably, there are cases when these networks fail to function properly. In some diseases, multiple small perturbations push otherwise healthy cells into a state of malfunction. These maladies are referred to as complex diseases and include common disorders such as allergy, type II diabetes, and multiple sclerosis. Due to their complexity, there is no universally defined approach to fully understand their pathogenesis or pathophysiology. While these perturbations can be measured using high-throughput technologies, their interplay is generally too complex to understand without structured mathematical analysis. Today there are numerous such methods that relate the small perturbations of complex diseases to one another through their interactions. However, the methods have historically struggled with notable uncertainty in their predictions.

This uncertainty can be addressed by at least two different approaches. First, mechanistically realistic mathematical modelling has the capacity to accurately describe almost any biological system, but such models can to date only describe small systems and networks. Secondly, large-scale mathematical modelling approaches exist, but the faithfulness of the models to the underlying biology has been compromised to achieve algorithms that are computationally efficient.

In this Ph.D. thesis, I suggest how high confidence predictions of network interactions can be extracted from big biological data. First, I show how large-scale data can be used when building high-quality ordinary differential equation (ODE) models (Paper I). Secondly, by developing the software LASSIM, I show how ODE models can be expanded to the size of entire cell systems (Paper II).
However, while LASSIM showed that powerful non-linear ODE modelling can be applied to understand big biological data, it still remained a machine learning-based approach, in contrast to hypothesis-driven model development.

Instead, two more studies revolving around large-scale modelling approaches were initiated. The third study suggested that ambiguities in model selection and interaction identification greatly compromise the accuracy of available tools, and that the novel software of Paper III, LiPLike, can be used to remove predictions affected by such ambiguities. Intriguingly, while LiPLike was able to effectively discard false identifications, the accuracy of predictions remained relatively low. This low accuracy was thought to arise from model simplifications, and therefore the next study aimed at finding methods that come closer to the true biological system (Paper IV). In particular, the study aimed at predicting the abundance of proteins (the true mediators of biological functionality) from the much more easily accessible mRNA levels, and found that such models could be used to gain several new insights into protein mechanisms, which was exemplified by the identification of important biomarkers of autoimmune diseases.

The analysis of big biological data and the underlying networks is a centrepiece of understanding both diseases and how cell functionality is orchestrated. The work presented in this Ph.D. thesis represents a journey between fields with different views on how these networks should be inferred. In particular, it aimed to combine the accuracy of small-scale mechanistic modelling with the system-spanning potential of large-scale linear system modelling. This thesis thus provides a toolbox of methods and insights on how knowledge can be extracted from big biological data, and by extension it is a small step towards a new understanding of biological systems and complex diseases.
