Pointwise and Genomewide Significance Calculations in Gene Mapping through Nonparametric Linkage Analysis: Theory, Algorithms and Applications

University dissertation from Centre for Mathematical Sciences, Lund University

Abstract: In linkage analysis or, in a wider sense, gene mapping, one searches for disease loci along a genome. This is done by observing so-called marker genotypes (alleles) and phenotypes (affected/unaffected) of a pedigree set, i.e. a set of multigenerational families, in order to locate the loci corresponding to the underlying disease genes or, at least, to narrow down the genome regions of interest. In this context the key concept is the genetic inheritance of alleles with respect to the phenotype outcomes. A significant deviation from what is expected under random inheritance is taken as statistical evidence that genetic components exist, and these are suggested to be located at the loci giving significant results. In the thesis introduction we begin by outlining the genetic foundations of statistical genetics as well as some basic concepts, for instance the process of allelic inheritance, the genetic disease model, the pedigree set, the inheritance vector and various types of genetic information. Next, we give an introduction to one-locus nonparametric linkage analysis, focusing on significance calculations for nonparametric linkage (NPL) scores, and comment on the generalizations to two-locus procedures and on the related but contrasting approach of parametric linkage analysis. In the third section we very briefly discuss some competing and complementary subfields of statistical genetics, and finally we put the papers included in this thesis into context by summarizing their content. Performing gene mapping studies across the whole genome, or substantial parts of it, gives rise to problems of interpretation due to multiple testing. The theme of the thesis is how to calculate significance levels and power in several such settings. The first two papers consider one-locus NPL analysis, i.e. where one searches for one disease gene at a time.
In Paper A existing analytical approximations of significance levels are improved and extended. The suggested formula is based on extreme-value theory for stochastic processes and a general link function between a continuous version of an arbitrary distribution function and the standard normal distribution function. In Paper B a new variant of weighted simulation for stochastic processes is developed in order to calculate significance levels. The method can handle complete as well as incomplete marker data and is very fast compared with traditional Monte Carlo-based simulation algorithms. The last two papers are directed towards two-locus NPL analysis, i.e. where one is interested in diseases with genetic components based on two distinct (nonsyntenic) disease genes. In Paper C significance levels and power for unconditional two-locus analysis, i.e. where one searches for two disease genes simultaneously, are derived and discussed for homogeneous pedigree sets based on units of affected sib-pairs. Finally, in Paper D, a general approach for calculating significance levels and power in conditional two-locus analysis is developed. The conditional approach may be seen as a hybrid of one-locus and two-locus NPL analysis. Of central importance to this paper is the concept of the noncentrality parameter, which is essentially the expected value of the test statistic of interest, i.e. the NPL score, under a corresponding instance of the alternative hypothesis.
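
The distinction between pointwise and genomewide significance can be illustrated with a small simulation. The sketch below is not the method of any of the papers: it assumes, purely for illustration, that the standardized score along a chromosome behaves under the null hypothesis like a stationary Gaussian AR(1) sequence over equally spaced markers. The function names, the correlation `rho`, the number of markers and the threshold are all hypothetical choices. It estimates by plain Monte Carlo the probability that the score exceeds a threshold at one fixed locus (pointwise) and the probability that it exceeds the threshold anywhere along the scan (genomewide).

```python
import math
import random


def simulate_null_scores(n_markers, rho, rng):
    """One null realization of standardized scores at equally spaced markers.

    Assumption for illustration only: a stationary Gaussian AR(1) sequence
    with unit variance and lag-one correlation rho.
    """
    z = rng.gauss(0.0, 1.0)
    scores = [z]
    innovation_sd = math.sqrt(1.0 - rho * rho)  # keeps the variance at 1
    for _ in range(n_markers - 1):
        z = rho * z + innovation_sd * rng.gauss(0.0, 1.0)
        scores.append(z)
    return scores


def estimate_significance(threshold, n_markers=200, rho=0.9, n_sim=2000, seed=1):
    """Monte Carlo estimates of (pointwise, genomewide) exceedance probabilities."""
    rng = random.Random(seed)
    pointwise_hits = 0
    genomewide_hits = 0
    for _ in range(n_sim):
        scores = simulate_null_scores(n_markers, rho, rng)
        if scores[0] > threshold:    # exceedance at one fixed locus
            pointwise_hits += 1
        if max(scores) > threshold:  # exceedance anywhere along the scan
            genomewide_hits += 1
    return pointwise_hits / n_sim, genomewide_hits / n_sim


def normal_tail(t):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


if __name__ == "__main__":
    pointwise, genomewide = estimate_significance(threshold=2.0)
    print("pointwise estimate :", pointwise)
    print("normal tail        :", normal_tail(2.0))
    print("genomewide estimate:", genomewide)
```

Under these assumptions the pointwise estimate should lie close to the standard normal tail probability, while the genomewide estimate is considerably larger, which is the multiple-testing effect that the significance calculations in the thesis address. Plain Monte Carlo of this kind becomes slow for the small genomewide significance levels of practical interest, which is the situation where analytical approximations and weighted simulation are attractive.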
