Defining human adaptive immune gene diversity

Abstract: The lymphocytes (B and T cells) of the adaptive immune system undergo variable (V), diversity (D), and joining (J) gene recombination, resulting in highly diverse antigen receptor repertoires capable of recognizing a wide variety of invading pathogens. The B cell receptor (BCR) contains two identical heavy chains and two identical light chains. The T cell receptor (TCR) is composed of either alpha-beta (αβ) or gamma-delta (γδ) chains, used by conventional T cells and γδ T cells, respectively. The chromosomal regions that encode the BCR and TCR loci are highly polymorphic and include both allelic variation and structural variation (deletions and duplications). The diversity within the adaptive immune receptor loci has remained insufficiently explored at the individual and population levels. At the outset of this project, the main database used for BCR and TCR germline variation, IMGT (the international ImMunoGeneTics information system), was composed almost exclusively of alleles from European individuals, and it contained many alleles that had not been independently confirmed. To improve our understanding of BCR and TCR germline gene variation, and to generate more accurate databases representing this variation, we developed laboratory techniques and software programs that allow personalized genotyping of these genes, such as IgDiscover, corecount and haplotype analysis, as well as the high-throughput technique, ImmuneDiscover, used in this thesis. In Paper I, we used an expression-based germline gene inference approach to characterize 45 human volunteers belonging to four different populations: African, European, East Asian, and South Asian. Using the IgDiscover tool, we identified 175 novel V and J alleles. A substantial number of these alleles contained coding changes, and several alleles were population-restricted.
Additionally, we identified three introgressed regions from Neanderthals that were present in present-day Europeans and Asians. Through functional studies of recombinant TCRs engineered to use different TRGV4 alleles, we showed that a highly divergent archaic TRGV4 allele mediated reduced binding to the butyrophilin-like BTNL3/BTNL8 ligand, which has implications for intraepithelial γδ T cells that are known to use TRGV4. This demonstrates that allelic variation can result in functional differences. In Paper II, we introduce corecount, a novel genotyping tool developed as part of the IgDiscover software, for identifying allelic variants within the BCR and TCR loci. We applied this technique to 16 IgM libraries and verified the accuracy of the tool through targeted genomic PCR and Sanger sequencing of all 65 V alleles, 27 D alleles and seven J alleles present in one European donor. The genomic validation process determined the full length of five known V alleles and two known J alleles that are truncated at the 3′ end in the IMGT database. We showed that the corecount tool can be used for highly accurate personal TCR (in Paper I) and BCR genotyping, including for genes that are present at low frequency and that contain variation at the end of the V gene. In Paper III, we used IgDiscover and corecount to define germline gene variation within the human heavy chain V, D and J genes in 90 donors from different population groups: African, European, South Asian, and East Asian. We also developed an additional high-throughput genomic sequencing technique, ImmuneDiscover, which enables the analysis of a much greater number of samples, for example from biobanked DNA. We verified the accuracy of this technique by comparing donor genotypes identified from expressed libraries by IgDiscover with those identified from genomic libraries using ImmuneDiscover.
We further applied this method to a large sample set of 2485 donors from the 1000 Genomes Project (1KGP), belonging to 25 population groups. We identified 321 novel V alleles, seven novel D alleles and two novel J alleles. Finally, we applied the plotalleles module within the IgDiscover software to four related donors, showing that the heavy chain locus is continuously evolving. In Paper IV, we determined the allelic variation at the human kappa and lambda light chain loci from 82 donors using the IgDiscover and corecount tools. These donors belong to different population groups: African, European, South Asian, and East Asian. In total, we identified 38 novel IGKV alleles and 53 novel IGLV alleles that had not been described previously. These alleles were confirmed through several approaches: targeted genomic PCR and Sanger sequencing, identification of the same novel allele in more than one donor, and verification of the novel allele in two independent libraries from the same donor. Additionally, we show that some novel IGKV and IGLV alleles are population-restricted, and we identified two novel J alleles, each from a different African donor. Finally, we identified alleles that may have archaic ancestry. The findings reported here greatly enhance our knowledge of human TCR and BCR germline gene variation. This will facilitate future studies of adaptive immune responses in humans of diverse population ancestries.
