Observing the darkest matter of the genome: Expression of human endogenous retrovirus W elements

University dissertation from Stockholm: Karolinska Institutet, Department of Neuroscience

Abstract: The human genome is composed of coding genes and vast stretches of sequence largely considered junk. Researchers are, however, uncovering widespread and extensive transcription not only of the coding sequences but also of the non-coding sequences in the genomes of many species. Transcripts that do not code for any protein are thought to carry out their potential functions by interacting directly with other sequences and proteins through their base-pairing capabilities or secondary structures. Because little is known about non-coding DNA and its RNA transcripts, they have been called the dark matter of the genome. Half of the human genome is composed of repetitive sequences, and about eight percent consists of ancient remnants of retroviral infections called human endogenous retroviruses (HERVs). These repetitive elements are usually excluded from studies of expressed sequences because they are methodologically difficult to identify unambiguously. The dogma has been that degenerate viral sequences are junk and are, for the most part, transcriptionally silent. This view is being revised in light of observed transcription of these elements in human tissues and of expression variations associated with human diseases. These repetitive regions could be called the darkest matter of the genome. This thesis includes observations of HERV expression patterns, as well as increased expression and alterations associated with exogenous virus infections. It also presents an evaluation of the currently available sequence-specific assays and a novel melting temperature (Tm) analysis method for studying expression patterns of highly repetitive and homologous sequences.
The Tm analysis method was further developed with: i) a temperature probe to normalize for temperature deviations in the thermocycler instrument, ii) a curve-fit algorithm to interpolate exact temperatures from multiple data points, and iii) a new approach to analyzing the obtained Tm values with mixture models, for an impartial and objective statistical analysis. Using these methods, we studied the expression patterns of individual elements within one HERV family in human tissues. We found significant differences in HERV expression patterns both between human tissues and between individuals, to an extent similar to what would be expected for coding transcripts. The observations and methods developed in the course of this thesis will hopefully help cast some light on the expression, regulation and functions of these RNAs containing highly repetitive sequences.
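
The abstract does not specify which curve fit was used, how the probe reading is applied, or which mixture model was chosen. The sketch below illustrates one plausible reading of each of the three steps, under stated assumptions: Tm is taken as the peak of the negative first derivative of the melting curve (-dF/dT), refined by fitting a parabola through the three points around the sampled maximum; the temperature probe is treated as a reference with a known expected Tm, whose measured deviation is subtracted as a constant offset; and the mixture model is a two-component, one-dimensional Gaussian mixture fitted by expectation-maximization. All function names (`tm_from_melt_curve`, `probe_corrected_tm`, `fit_two_component_gmm`) are hypothetical and do not come from the thesis.

```python
import math
from typing import List, Sequence, Tuple


def tm_from_melt_curve(temps: Sequence[float], fluor: Sequence[float]) -> float:
    """Hypothetical helper: Tm as the peak of -dF/dT, refined by a 3-point parabola.

    Assumes `temps` is strictly increasing with a constant step.
    """
    mids = list(temps[1:-1])
    # Central-difference estimate of -dF/dT at each interior temperature.
    neg_deriv = [
        -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        for i in range(1, len(temps) - 1)
    ]
    k = max(range(len(neg_deriv)), key=neg_deriv.__getitem__)
    if k == 0 or k == len(neg_deriv) - 1:
        return mids[k]  # peak at the edge: no neighbours to fit through
    y0, y1, y2 = neg_deriv[k - 1], neg_deriv[k], neg_deriv[k + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        return mids[k]  # flat top: fall back to the sampled maximum
    step = mids[k + 1] - mids[k]
    # Vertex of the parabola through the three points around the maximum.
    return mids[k] + 0.5 * step * (y0 - y2) / denom


def probe_corrected_tm(sample_tm: float, probe_tm: float, probe_expected_tm: float) -> float:
    """Hypothetical helper: subtract the run's offset, as measured on a reference probe."""
    return sample_tm - (probe_tm - probe_expected_tm)


def _normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(
    xs: Sequence[float], iters: int = 200
) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical helper: 1-D two-component Gaussian mixture fitted by EM.

    Returns (weights, means, sds). Requires at least two values.
    """
    xs = sorted(xs)
    n = len(xs)
    half = n // 2
    # Initialise the means from the lower and upper halves of the sorted data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    sd = [max((xs[-1] - xs[0]) / 4.0, 1e-3)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability that each value belongs to component 0.
        r0 = []
        for x in xs:
            p0 = w[0] * _normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * _normal_pdf(x, mu[1], sd[1])
            total = p0 + p1
            r0.append(p0 / total if total > 0.0 else 0.5)
        # M-step: re-estimate the weight, mean and SD of each component.
        for j, resp in enumerate((r0, [1.0 - r for r in r0])):
            nk = max(sum(resp), 1e-12)
            w[j] = nk / n
            mu[j] = sum(r * x for r, x in zip(resp, xs)) / nk
            var = sum(r * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[j] = max(math.sqrt(var), 1e-3)  # floor stops a component collapsing
    return w, mu, sd
```

For example, a synthetic sigmoidal melting curve centred on 82.3 °C and sampled every 0.5 °C gives a `tm_from_melt_curve` estimate close to 82.3 °C, even though no sample falls exactly on that temperature. The three-point parabola is only one choice of interpolation; a fit over more points near the peak would use more of the data at the cost of more code.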
