NGS based studies on primary immunodeficiencies (PIDs): causative gene identification, tool development and application

Abstract: Primary immunodeficiency diseases (PIDs) comprise a group of highly heterogeneous immune system diseases, of which approximately 350 forms have been described so far. The causative gene remains unknown in around 60% of patients with PIDs. In recent years, Next Generation Sequencing (NGS) has been increasingly adopted for gene identification and molecular diagnosis of rare diseases, including PIDs. Using NGS to obtain an overview of the genetic makeup that underlies PID has been suggested as a promising approach to elucidate the etiology of PIDs, which could yield diagnostic and, possibly, therapeutic advances. To approach this goal, we performed either whole exome sequencing (WES, 454 samples) or targeted region sequencing (TRS, 217 samples) on 602 samples of 500 PID pedigrees. In Paper I, we summarized practical suggestions for the interpretation of NGS data and the techniques that can be used to search for disease-causative PID genes. This work aims to improve the annotation, interpretation, and application of NGS data in PIDs, and it also facilitates the wider application of NGS data analysis in other Mendelian disorders. The genetic approach, together with immunological investigations, identified potential pathogenic variants in 86 primary antibody deficiency (PAD) patients (68.2%), and a correct diagnosis can guide or change the treatment plan in around half of the patients with PAD (Paper II). We identified potentially disease-causing variants, including variants of unknown clinical significance (VUS), in around 34% of genetically unidentified PID samples that had been subjected to TRS using a panel of 219 common PID genes. Notably, the genetic diagnosis of a specific atypical ITK deficiency case adds to the growing evidence supporting the importance of genetic investigations initiated at an early stage of the patient's disease (Paper III).
Altogether, around 60% of PID patients have a possible diagnosis via WES/TRS. Copy number variation defects were identified in 16 patients, involving 4 genes (LRBA, ATM, DOCK8 and PMS2). Beyond the identification of the monogenic causal gene based on pedigree analysis, mutation frequency analysis was used to identify genes that carry rare functional variants in a higher proportion of patients in a specific patient group than in control samples, which revealed several potential novel PID genes (TNFRSF18, PIK3CG, LILRB1, EPHB2, TXNIP, CD5 and NLRP5). Other possible models beyond the monogenic scenario were also explored, and the disease in 16 severe combined immunodeficiency (SCID) or common variable immunodeficiency (CVID) patients might be due to an accumulation of rare amino acid substitution variants in genes related to the same function or pathway (RAG1 & RAG2, RAG1 & ATM, C3 & ITGB2, PRKDC & ATM, C5 & NIPBL, LRBA & CR2, CR2 & NFKB1, UNC93B1 & NIPBL, PLCG2 & NOD2 and IGLL1 & ATM). These findings indicate that NGS, together with a large sample size, is powerful in decoding the genetic characteristics of PID and provides insight into the molecular mechanisms that cause the disease. Existing variant impact prediction software and algorithms still struggle to evaluate the pathological consequences of the prioritized variants or genes. We therefore developed a Random Forest-based discriminator, Variant Impact Predictor for PIDs (VIPPID), to refine the prediction algorithms; it uses the features of pathogenic variants and benign mutations, integrated with the output of 24 other predictive tools currently in use. Evaluation of VIPPID showed that it had superior performance (AUC=0.95) over existing tools. We also showed that the gene-specific model outperformed the non-gene-specific model and offered a way to explore the underlying molecular mechanism (Paper IV).
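As an illustration of the mutation frequency analysis described above, the comparison of carrier proportions between a patient group and controls can be sketched with a one-sided Fisher's exact test. The abstract does not state which statistical test was used, so the choice of test, the function name and the example counts below are all assumptions made for illustration only.

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test for enrichment of carriers among cases.

    Returns P(X >= case_carriers) under the hypergeometric null hypothesis
    that carriers of rare functional variants in a gene are distributed
    at random between cases and controls.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denom = comb(total, case_total)
    upper = min(carriers, case_total)
    numer = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numer / denom


# Hypothetical counts (not taken from the thesis): 6 of 100 patients and
# 2 of 500 controls carry a rare functional variant in the same gene.
p_value = fisher_one_sided(6, 100, 2, 500)
print(f"one-sided p-value: {p_value:.3g}")
```

In a real gene-level screen the test would be repeated for every gene, so the resulting p-values would need correction for multiple testing before any gene is proposed as a candidate.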
Specific mutations of PID causative genes may exert different effects on T cell receptor (TCR) repertoire diversity and composition, which ultimately lead to heterogeneous phenotypes. DNA damage response/methylation is an essential process during antigen receptor recombination. To investigate the effect of mutations in DNA repair genes on adaptive immunity, 19 patients with DNA repair/methylation defects were selected and subdivided into several groups based on their causative genes. We then performed deep immune repertoire sequencing and compared the results with those of 14 age-matched healthy controls. Patients with different molecular diagnoses exhibited distinct repertoire diversity, clonality and V-J pairing patterns. An aberrant complementarity-determining region 3 (CDR3) length distribution was observed in both unproductive and productive TCRs in all patients, suggesting that it predominantly arose before thymic selection. Shorter CDR3 lengths in ataxia-telangiectasia (AT) patients resulted from a decreased number of insertions and led to an increase in the number of shared clonotypes, whereas patients with DNMT3B and ZBTB24 mutations presented longer CDR3 lengths and reduced specificity for pathogen-associated CDR3 sequences (Paper V). This study revealed the role of the DNA repair/methylation machinery in patients with ATM, DNMT3B and ZBTB24 deficiency, and shed light on the mechanistic etiology of their T cell dysfunction.
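The repertoire measures mentioned above (CDR3 length distribution, diversity and clonality) can be sketched as follows. This is a minimal illustration, not the pipeline used in Paper V: it assumes clonotypes are given as a mapping from CDR3 amino acid sequence to read count, uses Shannon entropy as the diversity measure, and defines clonality as one minus the normalized entropy. The sequences and counts are made up.

```python
from collections import Counter
from math import log


def cdr3_length_distribution(clonotypes):
    """Fraction of reads at each CDR3 length (in amino acids)."""
    lengths = Counter()
    for cdr3, count in clonotypes.items():
        lengths[len(cdr3)] += count
    total = sum(lengths.values())
    return {length: n / total for length, n in sorted(lengths.items())}


def shannon_entropy(clonotypes):
    """Shannon entropy (natural log) of the clonotype frequencies."""
    total = sum(clonotypes.values())
    return -sum(
        (c / total) * log(c / total) for c in clonotypes.values() if c > 0
    )


def clonality(clonotypes):
    """1 - normalized entropy: 0 for an even repertoire, 1 for a single clone."""
    n = sum(1 for c in clonotypes.values() if c > 0)
    if n == 0:
        raise ValueError("empty repertoire")
    if n == 1:
        return 1.0
    return 1.0 - shannon_entropy(clonotypes) / log(n)


# Hypothetical repertoire: CDR3 sequence -> read count (illustrative only).
repertoire = {"CASSLGQETQYF": 120, "CASSPGTEAFF": 30, "CASSFGNTIYF": 10}
print(cdr3_length_distribution(repertoire))
print(round(clonality(repertoire), 3))
```

Comparing these summaries between patient groups and age-matched controls is one way to express the kind of shift in CDR3 length and clonality that the abstract reports.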
