On diverse biophysical aspects of genetics from the action of regulators to the characterization of transcripts

University dissertation from Stockholm : KTH Royal Institute of Technology

Abstract: Genetics is among the most rewarding fields of biology for the theoretically inclined, offering both room and need for modeling approaches in light of an abundance of experimental data of different kinds. Many aspects of the field are today understood in terms of physical and chemical models, joined by information-theoretical descriptions. This thesis discusses different mechanisms and phenomena related to genetics, employing tools from statistical physics along with experimental biomolecular methods. Five articles support this work.

Two articles deal with interactions between proteins and DNA. The first one reports on the properties of non-specific binding of transcription factor proteins in the yeast Saccharomyces cerevisiae, which arises from an effective background free energy describing the affinity of a single protein for random locations on DNA. We argue that a background pool of non-specific binding sites is filled up before specific binding sites can be occupied with high probability, thus presenting a natural filter for genetic responses to spurious transcription factor production. The second article describes an algorithm for the inference of transcription factor binding sites using a realistic physical model. The functionality of the method is verified on a set of known binding sequences for Escherichia coli transcription factors.

The third article describes a possible genetic feedback mechanism between human cells and the ubiquitous Epstein-Barr virus (EBV). Forty binding regions for the major EBV transcription factor EBNA1 are identified in human DNA. Several of these are located near genes of particular relevance in the context of EBV infection, and the most interesting ones are discussed.

The fourth article describes results obtained from a positional autocorrelation analysis of the human genome, a simple technique to visualize and classify sequence repeats, which constitute large parts of eukaryotic genomes. Applying this analysis to genome sequences from which previously known repeats have been removed gives rise to signals corroborating the existence of as yet unclassified repeats of surprisingly long periods.

The fifth article combines computational predictions with a novel molecular biological method based on the rapid amplification of cDNA ends (RACE), coined 5’tagRACE. The first search for non-coding RNAs (ncRNAs) encoded in the genome of the opportunistic bacterium Enterococcus faecalis is performed here. Applying 5’tagRACE allows us to discover and map 29 novel ncRNAs, 10 putative novel mRNAs and 16 antisense transcriptional organizations.

Further studies that are not included as articles are outlined in the central chapters. They concern the monitoring of secondary structure formation of nucleic acids during thermal renaturation, and the inference of genetic couplings of various kinds from massive gene expression data and computational predictions.
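
As an illustration of the positional autocorrelation idea mentioned in the abstract, the following is a minimal sketch only. The abstract does not specify the exact statistic used in the thesis, so the match-fraction definition, the function name positional_autocorrelation, and the toy sequence are all assumptions made for this example. For each lag, it computes the fraction of positions at which a base equals the base that many positions downstream; a repeat of period p then shows up as peaks at lags p, 2p, and so on.

```python
def positional_autocorrelation(seq, max_lag):
    """Return {lag: fraction of positions i with seq[i] == seq[i + lag]}.

    Hypothetical helper for illustration; not the statistic from the thesis.
    """
    seq = seq.upper()
    profile = {}
    for lag in range(1, max_lag + 1):
        pairs = len(seq) - lag  # number of comparable position pairs
        if pairs <= 0:
            break  # lag exceeds the sequence length
        matches = sum(1 for i in range(pairs) if seq[i] == seq[i + lag])
        profile[lag] = matches / pairs
    return profile


# Toy input (an assumption): a perfect tandem repeat with period 5.
toy = "ACGTT" * 20
profile = positional_autocorrelation(toy, 12)

# Lags that are multiples of the period match at every position.
peaks = [lag for lag, value in profile.items() if value == 1.0]
print(peaks)
```

On real genomic sequence the peaks would sit on top of a background level set by base composition, so in practice one would compare each lag against that background instead of looking for exact matches.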
