Bioinformatic studies of genetic variation at known, novel and candidate blood group loci

Abstract: Access to compatible blood for transfusion is a prerequisite for modern health care. Compatibility is limited by the presence of antibodies to blood group antigens, which are polymorphic protein and carbohydrate structures on the surface of the red blood cell. Blood group antigens arise from genetic variation in the genes underlying their expression. Knowledge of these genes and their variation can facilitate the provision of compatible blood.

The overall aim of the thesis, comprising four papers, was to study the genetic variation at loci underlying human blood group systems and antigens, using bioinformatic methods. In Paper I, the genetic background of the Vel− blood group phenotype was elucidated. In Paper II, the genetic variants regulating the variable expression of the Vel blood group antigen were studied. In Paper III, whole genome sequencing (WGS) data from the 1000 Genomes project were used to create a database of all alleles in known blood group-related genes and to predict the presence of novel blood group antigens. Finally, in Paper IV, human glycosyltransferase genes expressed in erythroid tissue were identified and the potential for candidate carbohydrate-based blood group systems was predicted.

Using SNP array data from Vel-phenotyped blood donors, including members of two families, a 17-base-pair deletion in the previously uncharacterized but evolutionarily conserved gene SMIM1 was found to cause the Vel− blood group phenotype. In Vel+ blood donors from different populations, two polymorphisms in intron 1 of SMIM1, rs1175550 and, to a lesser extent, rs143702418, were found to affect the expression of the Vel blood group antigen. In WGS data from the 1000 Genomes project, a large number of previously unreported blood group gene-related alleles were found and compiled into a database, Erythrogene. Among all identified genetic variants, 357 were non-synonymous and predicted to occur in the extracellular portion of blood group-carrying proteins; these may represent novel or modified blood group antigens. In the human genome, 244 expressed glycosyltransferase genes were identified, 30 of which were predicted to have properties similar to known genes in carbohydrate-based blood group systems.

The use of bioinformatic methods in the search for genetic variation underlying blood group systems and antigens was successful. The benefits of utilizing publicly available genotyping data in studies of blood groups are highlighted.