Data integration for robust network-based disease gene prediction

University dissertation from Stockholm : Department of Biochemistry and Biophysics, Stockholm University

Abstract: For many complex diseases the cause or mechanism cannot be tied to a single gene, and a systems-wide approach is needed to cope with the complexity. By combining evidence indicative of functional association it is possible to infer networks of protein functional coupling. The reliability of these networks depends on having sufficient data and on the data being informative.

By combining evidence from multiple species, functional coupling networks can reach higher coverage and accuracy. Genes in different species derived from the same gene by a speciation event are orthologous and likely to have a conserved function. To enable the transfer of information across species, we inferred orthology with the InParanoid algorithm and made the inferences available to the public in the associated database.

Identification of genes involved in diseases is an important biomedical goal. Based on the "guilt by association" principle, we implemented an approach, Maxlink, for identifying and prioritizing novel disease genes. By searching the FunCoup network for genes functionally coupled to cancer genes, we identified some 1800 novel cancer gene candidates showing characteristics of cancer genes.

While proteins are the active components, mRNA is often used as a proxy due to the difficulty of measuring protein abundance. We examined the relationship between mRNA and protein, using properties of expression profiles to identify subsets of genes with higher mRNA-protein concordance.

If technical and biological differences between patient/control studies of gene expression have a large impact, the results of studies of the same disease might be inconsistent. To determine this impact we examined the consistency in differential (co)expression between different studies of cancer, as well as non-cancer studies. Such consistency could generally be found, even between studies of different diseases, but only when common pitfalls of gene expression analysis were avoided.
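The "guilt by association" principle behind the disease gene prioritization described in the abstract can be illustrated with a minimal sketch. This is not the Maxlink implementation from the dissertation; it only assumes that a candidate gene is scored by the number of distinct known disease genes (seeds) it is directly linked to in a functional coupling network. The function name `rank_candidates`, the `min_links` parameter, and the toy gene identifiers are all hypothetical.

```python
from collections import defaultdict


def rank_candidates(edges, seeds, min_links=1):
    """Rank non-seed genes by how many distinct seed genes they link to.

    edges: iterable of (gene_a, gene_b) undirected links (hypothetical input format).
    seeds: known disease genes.
    min_links: assumed cutoff on the number of seed links required.
    """
    seeds = set(seeds)
    linked_seeds = defaultdict(set)
    for a, b in edges:
        # A link counts only when exactly one endpoint is a seed gene.
        if a in seeds and b not in seeds:
            linked_seeds[b].add(a)
        elif b in seeds and a not in seeds:
            linked_seeds[a].add(b)
    scored = [
        (gene, len(partners))
        for gene, partners in linked_seeds.items()
        if len(partners) >= min_links
    ]
    # Highest link count first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy network with made-up identifiers: S1 and S2 are seeds, G1 to G3 are candidates.
toy_edges = [("G1", "S1"), ("G1", "S2"), ("G2", "S1"), ("G3", "G2"), ("S1", "S2")]
toy_seeds = ["S1", "S2"]
ranking = rank_candidates(toy_edges, toy_seeds)
```

In this toy network, G1 is linked to both seeds and ranks first, G2 is linked to one seed, and G3 is excluded because its only link is to another non-seed gene. A real analysis would also need to account for link confidence and for highly connected genes, which this sketch ignores.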
