Analyses of genomic and gene expression signatures

University dissertation from Stockholm : Karolinska Institutet, Microbiology and Tumor Biology Center (MTC)

Abstract: Biology has entered a challenging, information-intense period where computational experiments are complementing traditional experiments. A plethora of new techniques have allowed biological processes to be investigated on a global scale. Data analysis has become non-trivial, but it is crucial for drawing the appropriate conclusions from these experiments. This thesis combines molecular biology and computational techniques, with a focus on the latter, to investigate gene expression profiles and genome signatures. The technological breakthrough with high-density oligonucleotide arrays and cDNA microarrays has enabled the parallel monitoring of the expression levels of thousands of mRNA transcripts. Using high-density oligonucleotide arrays of mRNA from different regions of the adult mouse brain, we identified both region-specific and strain-specific (129SvEv and C57bl/6) gene expression differences. The genes with strain-specific differential expression are candidates for involvement in the behavioural differences between 129SvEv and C57bl/6. In two reference gene expression studies, we first compared the gene expression profiles of cell lines with those of their corresponding normal and tumor tissues. We estimated the degree of differential expression between cell lines and tissues, and the expression of tissue-specific genes in cell lines. Second, we developed a method to measure tumor- and tissue-characteristic gene expression in individual cell lines. In pharmaceutical screening programs and in experimental research, when cell lines are used as model systems, the proposed Tissue Similarity Index can be an important tool in the selection of the most appropriate cell lines. Each prokaryote genome has a species-specific bias in the occurrence of short nucleotide motifs, known as its genomic signature. 
We demonstrated that genomic signatures were detectable in short DNA sequences and designed a naive Bayesian classifier that identified the correct species of origin of DNA sequences based on their genomic signature representation. This classification of DNA sequences was applied to the identification of horizontal gene transfer events. Further, the species-specificity of other sequence biases, such as codon bias, G+C content, and amino acid bias, was quantified in relation to the genomic signatures.
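
To illustrate the general idea of classifying a DNA fragment by its genomic signature, the following is a minimal sketch of a naive Bayesian classifier over short nucleotide motifs. It is not the implementation used in the thesis: the motif length (`K = 2`), the add-one pseudocount, the uniform prior over species, the function names, and the toy "genomes" are all assumptions chosen only to keep the example small.

```python
import math
from collections import Counter
from itertools import product

K = 2  # motif length; an illustrative assumption, not a value from the thesis


def kmer_counts(seq, k=K):
    """Count overlapping k-mers, skipping any that contain non-ACGT symbols."""
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(m for m in kmers if set(m) <= set("ACGT"))


def train(genomes, k=K, pseudocount=1.0):
    """Estimate per-species log-probabilities of every k-mer (the 'signature')."""
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    model = {}
    for species, genome in genomes.items():
        counts = kmer_counts(genome, k)
        total = sum(counts.values()) + pseudocount * len(motifs)
        model[species] = {
            m: math.log((counts[m] + pseudocount) / total) for m in motifs
        }
    return model


def classify(model, fragment, k=K):
    """Return the species with the highest log-likelihood, assuming a uniform prior."""
    counts = kmer_counts(fragment, k)
    scores = {
        species: sum(n * logp[m] for m, n in counts.items())
        for species, logp in model.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: an AT-rich and a GC-rich "genome".
toy_genomes = {
    "species_A": "ATATTAAATTTA" * 50,
    "species_B": "GCGGCCGCGCCG" * 50,
}
model = train(toy_genomes)
best, scores = classify(model, "ATTTAATATA")
print(best)
```

Under the same assumptions, a gene whose best-scoring species differs from the genome it was found in would be flagged as a candidate for horizontal transfer; real genomes would need longer motifs and far more training sequence than this toy example.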
