Omics Data Analysis of Complex Diseases and Traits

Abstract: The advent of high-throughput techniques for producing massive omics data has opened new possibilities and raised new challenges in different fields of biology and medicine. Dealing with such data at different scales and with different scopes, such as genomics, transcriptomics, proteomics and metabolomics, demands appropriate data collection, preprocessing, statistical analysis, interpretation and visualization. The overall goal of this thesis was to pose omics-related questions in the context of four research papers and to apply a rational choice of these methods to design and conduct the studies that answer them. Paper I asks whether potentially implicated genes in psoriasis can be proposed, and addresses this using microarray transcriptomics data of psoriasis. First, quality control was performed on the microarray dataset; the Differentially Expressed Genes (DEGs) were then mapped to a protein-protein interaction (PPI) database to create a subnetwork of the respective PPI network. Using network analysis, genes with higher scores were proposed as potentially relevant to psoriasis, and finally the results were evaluated against a gene-disease association database. Paper II asks whether the knockout of two genes followed by a transformation in E. coli could lead to an increase in bacterial growth in two different media, and addresses this through in vitro experiments followed by an in silico analysis of E. coli RNA-seq data. Here, we calculated the pairwise correlations between each target (knockout) gene and the rest of the genes in the RNA-seq dataset. The significantly anti-correlated genes were then shown to belong mainly to protein biosynthesis pathways compared to all other background pathways, which might indicate an increase in the transcription levels of protein biosynthesis-related genes when there is an absolute decrease (knockout) in each of the target genes.
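The correlation step described for Paper II can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the choice of Pearson correlation, the Benjamini-Hochberg adjustment, the 0.05 cutoff, the function name and the toy gene names are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import pearsonr


def anticorrelated_genes(expr, gene_names, target, alpha=0.05):
    """Return (gene, r, adjusted p) for genes significantly anti-correlated with `target`.

    expr: 2-D array, one row per gene, one column per sample (hypothetical layout).
    """
    t_idx = gene_names.index(target)
    others = [i for i in range(len(gene_names)) if i != t_idx]

    r_vals, p_vals = [], []
    for i in others:
        r, p = pearsonr(expr[t_idx], expr[i])
        r_vals.append(r)
        p_vals.append(p)
    r_vals, p_vals = np.array(r_vals), np.array(p_vals)

    # Benjamini-Hochberg adjustment (assumed multiple-testing correction).
    m = len(p_vals)
    order = np.argsort(p_vals)
    ranked = p_vals[order] * m / np.arange(1, m + 1)
    adj_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)

    return [
        (gene_names[others[k]], float(r_vals[k]), float(adj[k]))
        for k in range(m)
        if r_vals[k] < 0 and adj[k] < alpha
    ]


# Toy data: geneA mirrors the target with opposite sign, geneB follows it,
# geneC is unrelated noise. All names and values are synthetic.
rng = np.random.default_rng(0)
target_profile = rng.normal(size=30)
expr = np.vstack([
    target_profile,
    -target_profile + 0.1 * rng.normal(size=30),
    target_profile + 0.1 * rng.normal(size=30),
    rng.normal(size=30),
])
gene_names = ["target", "geneA", "geneB", "geneC"]
hits = anticorrelated_genes(expr, gene_names, "target")
hit_names = [name for name, _, _ in hits]
```

In a real analysis, the resulting gene list would then be passed to a pathway enrichment step against a background set, which this sketch leaves out.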
Paper III asks whether an anti-bone-resorption drug called Denosumab significantly affects the abundance of metabolites extracted from blood samples during a two-year longitudinal placebo-controlled clinical trial, and addresses this by running statistical hypothesis tests for each metabolite in the quantification data from Liquid Chromatography-Mass Spectrometry (LC-MS). Afterwards, the patterns of metabolite variation with respect to Denosumab administration and visit times were studied using Principal Component Analysis (PCA), association studies and hierarchical clustering. The results of this study proposed some identified metabolites for further clinical investigation. Based on our analyses, the patterns of abundance variation in some of the identified metabolites could be considered for improving the corresponding clinical studies and treatment with Denosumab. Paper IV proposes potentially relevant genes in lung adenocarcinoma by constructing a genome-scale co-expression network followed by clustering. The genes in each cluster were studied using knowledge from the literature. One of the most frequently reported genes in lung adenocarcinoma was EGFR. We reported all the first-neighborhood genes connected to EGFR in its corresponding module as potentially relevant to lung adenocarcinoma. The repertoire of the above choices, workflows and evaluations could be applicable to follow-up studies at different levels, including omics data integration, personalized omics data analysis, studies at different scales such as cellular or tissue, the use of other methodologies for the same questions, and benchmarking. Although four different omics-related questions were posed in this thesis, they all involved selecting or preparing the respective omics data, choosing preprocessing strategies, choosing statistical analyses and hypothesis testing methods and, finally, evaluating the results and interpretations.
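The neighborhood extraction described for Paper IV could be sketched roughly as follows. This is only an illustration under stated assumptions: the hard correlation threshold of 0.8, average-linkage hierarchical clustering as the module definition, the function name, and the placeholder genes G1 to G4 are hypothetical and are not taken from the thesis.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def first_neighbors_in_module(expr, genes, hub, r_cut=0.8, n_modules=2):
    """List genes directly connected to `hub` that share its module.

    expr: 2-D array, one row per gene, one column per sample (hypothetical layout).
    """
    corr = np.corrcoef(expr)                 # gene-by-gene Pearson correlation
    adj = np.abs(corr) >= r_cut              # hard-thresholded co-expression edges
    np.fill_diagonal(adj, False)             # no self-loops

    # Modules from hierarchical clustering on 1 - |r| (an assumed clustering choice).
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    modules = fcluster(tree, t=n_modules, criterion="maxclust")

    h = genes.index(hub)
    return [
        g for i, g in enumerate(genes)
        if adj[h, i] and modules[i] == modules[h]
    ]


# Toy data: G1 and G2 track the hub gene, G3 and G4 form an unrelated group.
rng = np.random.default_rng(1)
base_a, base_b = rng.normal(size=40), rng.normal(size=40)
expr = np.vstack([
    base_a,
    base_a + 0.2 * rng.normal(size=40),
    base_a + 0.2 * rng.normal(size=40),
    base_b,
    base_b + 0.2 * rng.normal(size=40),
])
genes = ["EGFR", "G1", "G2", "G3", "G4"]
neighbors = first_neighbors_in_module(expr, genes, "EGFR")
```

Restricting the neighbors to the hub's own module mirrors the abstract's wording; with a different clustering method or threshold, the neighbor set would generally change.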
