Statistical Modeling and Learning of the Environmental and Genetic Drivers of Variation in Human Immunity

University dissertation from Department of Automatic Control, Lund Institute of Technology, Lund University

Abstract: During the last decade, the variation in the human genome has been mapped in fine detail. Next-generation sequencing has made it possible to cheaply and rapidly acquire vast amounts of biomolecular information on large cohorts of people. This has enabled large-scale epidemiological studies to investigate the relationships between environmental and genetic factors and human biomolecular traits. It is now possible to map variation in the genomic blueprint for human biology to variation in levels of epigenomic marks, gene expression levels and protein expression levels. This development has opened up the possibility of a "phenomic science": the data-driven study of the interactions between all levels of the relationship between the genotype, the environment, and the phenotype. The Milieu Intérieur study of Institut Pasteur, Paris, aims at bringing the technological developments of modern biology to bear on the study of the human immune system in homeostasis. Deep phenotyping has been performed on 1,000 healthy, unrelated people of Western European ancestry. The cohort is evenly stratified across sex, and across five decades of life, between 20 and 70 years of age. In this thesis, we combine the standardised flow cytometry of 173 parameters of innate and adaptive immune cells, genome-wide DNA genotyping, detailed information on lifestyle and environmental factors, and MethylationEPIC array data of the Milieu Intérieur cohort to identify the genetic and environmental drivers of variation in the human immune system. The increasing complexity of biological data requires the development of new statistical tools. In this work, we aim to integrate developments in machine learning, convex optimization, causal inference, and statistical methodology to build robust and reliable tools for analysing the high-dimensional and highly complex biomolecular data of the Milieu Intérieur study.
We construct a pipeline to perform genome-wide association studies on phenotypes with heterogeneous distributions, while controlling for arbitrarily many environmental factors. The pipeline is applied to study the genetics of human immune system variation in homeostasis and the genetics of the function of the human thymus. Our pipeline identifies 15 loci that influence immunophenotypes. We show that these loci are enriched in disease-associated variants. We also report a common genetic variant, situated within the T cell receptor locus, that increases the production of naive T cells within the human thymus. In addition, we find four key non-genetic factors that drive variation in the healthy human immune system: age, sex, latent cytomegalovirus infection and smoking. Age, sex, and smoking have a broad impact on the innate and the adaptive immune subsystems, while cytomegalovirus infection primarily seems to skew the T cell compartment of the adaptive immune subsystem towards inflammatory subsets. We also show that age and sex influence the function of the human thymus. Immunophenotypes are intimately connected to epigenetic markers in whole blood. We leverage the >850,000 methylation sites probed in the MethylationEPIC array to build high-dimensional predictive models of 70 immune cell subsets and other traits such as age and smoking status. We employ elastic net regression and stability selection to build sparse, regularized models, and show that they are capable of estimating blood cell composition more accurately and cost-effectively than previous methods. The properties of elastic net regression and stability selection also enable us to investigate the relationship between DNA methylation and immune blood cell composition. This thesis develops methods for, and performs, the analysis of parts of the rich and multifaceted data of the Milieu Intérieur study.
With the construction and analysis of this rich observational data, we contribute to the young fields of population immunology and human phenomic science. We discover novel associations that will help in understanding the differences between people in vaccination efficacy and susceptibility to common autoimmune and infectious diseases. Finally, we present predictive models that will facilitate the application of immunological markers in the clinic.
