Principles of complex trait genetics

Literature review current through: Jan 2024.
This topic last updated: Nov 15, 2023.

INTRODUCTION — Most human genetic traits can be classified as either monogenic or complex. Monogenic traits are strongly influenced by pathogenic variation within a single gene and are recognized by their classic patterns of inheritance within families. While monogenic traits formed the basis for "classic" genetics, it has become clear that conditions for which inheritance strictly conforms to Mendelian principles are relatively rare. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)" and "Genetics: Glossary of terms".)

Complex traits are believed to result from variation within multiple genes and their interaction with behavioral and environmental factors. Complex traits do not follow readily predictable patterns of inheritance.

This topic will review the challenges related to the identification of complex trait susceptibility genes, the factors that contribute to phenotypic complexity, and current understanding of the genetic architecture of complex genetic traits.

Genetic traits with monogenic inheritance, either with a Mendelian or a non-Mendelian pattern, are discussed separately. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)".)

SPECTRUM OF GENETIC VARIATION

Monogenic versus complex traits — Most monogenic diseases are caused by pathogenic variants that reduce the function or stability of a single protein by altering its expression level and/or three-dimensional structure. These pathogenic variants include point mutations (single nucleotide changes that alter the amino acid sequence) and insertions or deletions in the DNA sequence that encodes the protein, as well as changes in non-coding DNA that interfere with gene splicing. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)".)

Broad and polygenic spectrum of variation – In complex genetic traits, the spectrum of genetic variation (ie, the types and numbers of mutations, insertions, and deletions involved in a single disease) is broader and more complicated, and it is often difficult to identify a single causal genetic change. These traits tend to be polygenic.

Further, although coding variants have been identified in complex diseases, most of the genetic variants mapped for complex genetic traits are non-coding (located in intergenic, regulatory, or intronic regions) and likely act not by changing the amino acid sequence but by altering one or more of the following:

The control of gene expression, such as by interfering with a transcription factor binding site

The stability of the RNA or protein product

Posttranslational modification of the protein product

This distinction between monogenic and complex traits, while useful, can be overly simplistic. On the one hand, traits that appear to be monogenic can be influenced by variation in multiple genes ("modifier genes") [1]. Conversely, complex traits can be predominantly influenced by variation in a single gene [2]. (See 'Reasons for weak genotype-phenotype correlations' below.)

Challenges to analysis – The genetic variation in complex traits is also more difficult to analyze. Several factors contribute to difficulty in the recognition of functional regulatory variation:

Regulatory variants reside in noncoding sequences of the genome. Functional sequence annotation of noncoding sequences is frequently less comprehensive than annotation for coding regions.

Numerous genetic variants have been identified that are strongly associated with a complex disease but often cannot be definitively linked to a specific gene (eg, the 9p21 locus in coronary artery disease) [3,4]. The mechanisms by which such variants affect complex traits are more difficult to uncover [5].

Susceptibility loci for complex traits may influence the regulation of several genes within a genomic region [6]. This suggests that some regulatory regions of DNA, rather than affecting a single gene, could play a role in regulating the regional transcription patterns of multiple genes.

Reasons for weak genotype-phenotype correlations — Whereas Mendelian traits are characterized by a strong genotype-phenotype correlation, complex genetic traits typically exhibit weak correlations between genotype and phenotype. Data from genome-wide association studies (GWAS) of complex traits demonstrate that many disease susceptibility variants by themselves confer only a modestly increased risk (ie, relative risk ranging between 1.1 and 1.5). (See 'Approaches to identifying genetic determinants of complex traits' below.)

The relatively weak genotype-phenotype correlations for complex traits may be explained by the following hypotheses:

Regulatory variation – Changes in the non-coding (ie, regulatory) regions of genes may alter the quantity of a protein without changing its function. The effect on cellular phenotype may be more subtle than the effect of coding pathogenic variants that affect protein structure.

Polygenic effects – Most complex genetic traits are likely the result of variation within multiple genes. Any individual genetic variant might only confer a relatively small effect; the cumulative effect of multiple genetic variants would result in a much larger overall effect on the trait [7].

Gene-gene interactions – Some genetic variants may only have an effect that is contingent upon the presence of additional genetic variants at nearby or distant loci. Determining these interactions between genes presents an enormous mathematical challenge.

Gene-environment interactions – Environmental exposures (broadly defined as all non-genetic exposures) are likely to have a strong influence on the manifestation of most complex traits. These can include behavioral exposures and exposures of which the individual is unaware. For some conditions, the effect of a genetic determinant has been shown to depend on a particular environmental exposure (eg, the effect of an MMP12 variant on chronic obstructive pulmonary disease [COPD] risk depends on cigarette smoke exposure) [8]. Determining how these gene-environment interactions are manifested also presents a substantial mathematical challenge.

Rare (uncommon) susceptibility variants – Rare susceptibility variants are uncommon genetic changes that may play a minor role in a disease phenotype across the population as a whole yet can have a large effect on the phenotype of a single individual. These rare variants are difficult to incorporate into population-based genotype-phenotype correlations because of limited statistical power to detect their effects with typical sample sizes and the challenges of developing models to examine their joint effects. (See 'Common disease-rare variant hypothesis' below.)

Somatic mutations – While most genetic studies have focused on inherited germline mutations, there is growing evidence that somatic mutations (occurring in a person's DNA after conception, in a cell that is not a germ cell) can result in cell-specific phenotypic abnormalities that contribute to the age-related risk of complex disease. An example is clonal hematopoiesis of indeterminate potential (CHIP), in which somatic mutations in genes such as DNMT3A and TET2 increase the risk of hematologic malignancy [9]. The risk of acquiring somatic mutations may be related to inherited germline variants [10]. (See "Clonal hematopoiesis of indeterminate potential (CHIP) and related disorders of clonal hematopoiesis".)

Epigenetic mechanisms – Epigenetic mechanisms influence the expression of a gene without changing the DNA sequence. Such mechanisms may be inherited or acquired after birth and include DNA methylation, histone modification, and non-coding RNA. Environmental or behavioral exposures may lead to human disease through epigenetic changes. For example, cigarette smoking can induce increased DNA methylation and thus reduced expression of genes acting as tumor suppressors. Moreover, epigenetic mechanisms can mask causative genetic loci. As an example, failure to account for epigenetic effects on the inheritance of telomere length would have obscured identification of the causative PARN locus in a kindred with familial pulmonary fibrosis [11].
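One common (though not the only) way to formalize the gene-gene and gene-environment interactions described above is to add product terms to a logistic regression model. The sketch below is illustrative: G_1 and G_2 denote risk-allele counts (0, 1, or 2) at two hypothetical loci, E is a binary exposure (eg, smoking), D is disease status, and the β coefficients are log odds ratios; none of these symbols come from the cited studies.

```latex
\operatorname{logit} \Pr(D = 1) = \beta_0 + \beta_1 G_1 + \beta_2 G_2 + \beta_E E + \beta_{12}\, G_1 G_2 + \beta_{1E}\, G_1 E
```

Under this model, the per-allele log odds ratio for G_1 is β_1 + β_12 G_2 + β_1E E, so its effect depends on the genotype at the second locus and on exposure. Testing every pair among m variants requires m(m - 1)/2 models, roughly 5 × 10^11 for one million variants, which is one reason these analyses are computationally and statistically demanding.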

Polygenic risk scores — While most clinical genetic tests assess a single gene or a small number of genes, most complex diseases are better explained by the combined effect of large numbers of genetic variants, whose aggregate effect can approach the risk conferred by variants in monogenic syndromes [12]. These aggregate effects are summarized by weighted scoring systems called polygenic risk scores (PRSs) [13]. These scores can be used to assess individual risk in combination with other clinical information such as personal and family history, clinical findings, and disease biomarkers.
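A polygenic risk score is, in its simplest form, a weighted sum: each variant's risk-allele count is multiplied by a per-variant weight, and the products are added. The sketch below assumes the weights are log odds ratios from a GWAS and that effects combine additively on the log-odds scale; the function name and the three variants are hypothetical, and clinical PRS pipelines add steps (variant selection, ancestry adjustment, standardization against a reference population) that are not shown.

```python
import math
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of risk-allele dosages for one individual.

    dosages: risk-allele count at each variant (0, 1, or 2; imputed values may be fractional).
    weights: per-variant effect sizes, assumed here to be GWAS log odds ratios.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example: three variants with per-allele odds ratios of 1.2, 1.1, and 1.5
weights = [math.log(odds_ratio) for odds_ratio in (1.2, 1.1, 1.5)]
score = polygenic_risk_score([2, 1, 0], weights)

# Under a log-additive model, exp(score) is the odds ratio relative to a person
# carrying no risk alleles at these variants: 1.2 * 1.2 * 1.1 = 1.584
print(f"score = {score:.3f}, odds ratio vs. no risk alleles = {math.exp(score):.3f}")
```

With thousands or millions of variants the same sum applies; each variant contributes only a small amount, which is why the aggregate can matter even when single variants do not.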

COMMON VERSUS RARE ALLELES — The polygenic nature of complex traits is widely accepted, but controversy exists regarding the likely frequency of the most important susceptibility alleles for complex traits [14,15]. The relative contribution of common variants (minor allele frequency [MAF] ≥5 percent) or rare variants (MAF ≤1 percent) as critical determinants of complex traits is unclear.
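The frequency cutoffs above can be applied directly to genotype counts. This minimal sketch computes the minor allele frequency of a biallelic variant and labels it using the thresholds stated in the text; the "low-frequency" label for the 1 to 5 percent band is a common convention assumed here, and the function names and genotype counts are hypothetical.

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from biallelic genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("at least one genotype is required")
    freq_a = (2 * n_aa + n_ab) / total_alleles  # frequency of allele A
    return min(freq_a, 1 - freq_a)


def frequency_class(maf: float) -> str:
    """Label a variant using the cutoffs in the text (common >= 5%, rare <= 1%).

    The in-between "low-frequency" label is an assumed convention, not from the text.
    """
    if maf >= 0.05:
        return "common"
    if maf <= 0.01:
        return "rare"
    return "low-frequency"


# Hypothetical genotype counts from 1,000 people each
for counts in [(810, 180, 10), (940, 60, 0), (990, 10, 0)]:
    maf = minor_allele_frequency(*counts)
    print(counts, f"MAF = {maf:.3f}", frequency_class(maf))
```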

Impact on disease — The comparative impact of common or rare alleles likely depends on the complex trait of interest and whether "impact" is defined on the basis of population attributable risk (favoring common variants) or risk prediction (favoring rare variants with larger effects). Population attributable risk (PAR) refers to the proportion of subjects affected by a disease in a population that can be explained by exposure to a particular risk factor. Risk prediction relates to the likelihood that an individual with a particular risk factor will develop a disease. A classic example from epidemiology helps illustrate the differences between these two concepts:

Lung cancer risk is increased both in people who smoke and in people with an occupational history of asbestos exposure [16]. The risk of lung cancer in an individual who worked in an asbestos factory is significantly greater than the risk in the average smoker. However, cigarette smoking is considerably more common than having worked in an asbestos factory. Thus, although asbestos exposure confers a greater individual risk of lung cancer (better risk prediction) than a history of smoking, reducing cigarette smoking would have a much greater impact on the number of lung cancer cases than eliminating work in asbestos factories. Viewed through this framework, there is value in identifying both common and rare determinants of complex genetic traits, as both affect the development of human disease.
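The trade-off in this example can be made concrete with Levin's formula for population attributable risk, PAR = p(RR - 1) / [1 + p(RR - 1)], where p is the exposure prevalence and RR the relative risk. The prevalences and relative risks below are hypothetical values chosen only to mirror the qualitative pattern described (asbestos: rarer exposure, higher relative risk); they are not estimates from the cited study, and the formula assumes a single exposure with no confounding.

```python
def population_attributable_risk(prevalence: float, relative_risk: float) -> float:
    """Levin's formula: PAR = p(RR - 1) / (1 + p(RR - 1))."""
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)


# Hypothetical, illustrative inputs (not estimates from the cited study)
smoking = population_attributable_risk(prevalence=0.20, relative_risk=10)
asbestos = population_attributable_risk(prevalence=0.005, relative_risk=15)
print(f"smoking PAR:  {smoking:.1%}")
print(f"asbestos PAR: {asbestos:.1%}")
```

With these inputs, smoking accounts for roughly 64 percent of cases and asbestos for roughly 6.5 percent, even though the individual relative risk is higher for asbestos. The same arithmetic underlies the contrast between common alleles (high population impact) and rare alleles (strong individual prediction).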

Common disease-common variant hypothesis — The common disease-common variant hypothesis posits that common genetic variants underlie susceptibility to most common traits (eg, cardiovascular disease, diabetes, or asthma) [17]. This hypothesis is rooted in population genetics theory, which states that the current human population is the result of a global expansion from a smaller founding population from sub-Saharan Africa. Thus, the current human population shares common genetic variants that were present in the founding population, and common complex traits are the result of inheriting these variants [18].

To a certain extent, the identification of more than 71,000 highly replicable genetic loci from genome-wide association studies (GWAS) supports the common disease-common variant hypothesis [19]. However, rare variants may underlie some of the association signals attributed to common variants in GWAS ("synthetic associations") [20] and may explain some of the unexplained, or "missing," heritability of most complex traits. (See 'Approaches to identifying genetic determinants of complex traits' below and "Genetic association and GWAS studies: Principles and applications", section on 'Missing heritability'.)

Common disease-rare variant hypothesis — To date, more than 71,000 common susceptibility variants have been mapped. For a given complex trait (eg, asthma), however, the variants identified explain only a small fraction of the total heritability (the proportion of phenotypic variation explained by genetic factors). Moreover, they appear to be of limited value in predicting disease risk. As an example, while up to 80 percent of the variability in height is estimated to be due to genetic factors [21,22], the combined effect of 20 genetic loci identified by GWAS explained less than 3 percent of the heritability of height [23].
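Why can many well-replicated loci explain so little heritability? Under an additive model with Hardy-Weinberg equilibrium, a single biallelic variant with minor allele frequency p and per-allele effect β (in standard deviation units of the trait) explains 2p(1 - p)β² of the phenotypic variance. The sketch below applies this formula to 20 hypothetical, independent loci with modest effects; the effect size and allele frequency are assumptions chosen for illustration, not the values reported for height.

```python
def variance_explained(maf: float, beta_sd: float) -> float:
    """Fraction of phenotypic variance explained by one biallelic variant.

    Assumes an additive model, Hardy-Weinberg equilibrium, and a trait scaled to
    unit variance (beta_sd = per-allele effect in standard deviations).
    """
    return 2 * maf * (1 - maf) * beta_sd ** 2


# Hypothetical: 20 independent loci, each with MAF 0.3 and a per-allele effect of 0.05 SD
per_locus = variance_explained(0.3, 0.05)
total = 20 * per_locus  # variances add only if the loci are independent (no LD)
heritability = 0.8  # upper estimate cited in the text for height
print(f"variance explained by 20 loci: {total:.2%}")
print(f"share of heritability:         {total / heritability:.1%}")
```

Each locus explains about 0.1 percent of the variance, so even 20 of them account for only a few percent of an 80 percent heritability; closing the gap requires many more variants, larger effects, or both.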

Since the loci identified by an initial GWAS likely represent those with the greatest contribution to heritability, it has been argued that further study of common genetic variation in ever-larger samples is unlikely to uncover a substantial portion of the "missing heritability" of complex traits [14], which may be at least partly due to the effects of rare variants. Rare variants may be more valuable than common variants in predicting disease risk at the individual level [24-26].

Other hypotheses — As stated above, the causes of a significant proportion of the heritability of complex traits have yet to be identified. Plausible explanations include gene-gene and gene-environment interactions and epigenetic mechanisms [27]. Most GWAS performed to date have prioritized associations by statistical significance, thereby detecting variants with the most significant p-values but not necessarily those with the strongest genetic effects; a p-value reflects allele frequency and sample size as well as effect size. Similarly, virtually all published GWAS use a univariate modeling approach that tests for association one variant at a time, ignoring possible effects of two or more variants acting in combination.
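The gap between statistical significance and effect size can be illustrated with a univariate allelic test (a 2 × 2 chi-square test on allele counts, 1 degree of freedom). The allele counts below are hypothetical: a common variant with a modest odds ratio and a rare variant with a much larger one, in the same 2,000 cases and 2,000 controls. The chi-square approximation is unreliable for small expected counts (an exact test would normally be preferred for the rare variant); it is used here only to keep the sketch short.

```python
import math


def allelic_test(case_risk: int, case_other: int, ctrl_risk: int, ctrl_other: int):
    """Allelic 2x2 chi-square test (1 df, no continuity correction).

    Returns (odds_ratio, chi_square, p_value). Assumes every cell is > 0.
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return odds_ratio, chi2, p_value


# Hypothetical allele counts from 2,000 cases and 2,000 controls (4,000 alleles each)
common = allelic_test(1440, 2560, 1200, 2800)  # risk allele frequency ~0.30 in controls
rare = allelic_test(12, 3988, 4, 3996)         # risk allele frequency ~0.001 in controls
for name, (odds_ratio, chi2, p) in (("common", common), ("rare", rare)):
    print(f"{name}: OR={odds_ratio:.2f}  chi2={chi2:.1f}  p={p:.1e}")
```

The common variant (odds ratio about 1.3) reaches a p-value on the order of 10^-8, whereas the rare variant (odds ratio about 3) gives a p-value near 0.05, so ranking by p-value favors the weaker, more frequent variant.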

In support of the importance of this concept, studies are beginning to demonstrate that scores based on the simultaneous effect of thousands of variants may help to improve risk prediction and may account for some of the "missing heritability" [7].

The inherent complexity of common diseases suggests that consideration of these issues will be fruitful. However, the statistical methods required for such analyses, the large, well-powered studies and computational resources needed to search for complex gene-gene and gene-environment interactions, and the resources to integrate epigenetic and genomic data at a genome-wide scale are only now being developed.

APPROACHES TO IDENTIFYING GENETIC DETERMINANTS OF COMPLEX TRAITS

Linkage analysis — Linkage analysis, a statistical technique that has successfully identified causal genes for over 2000 monogenic diseases, has had limited success in mapping genes for complex traits; in diabetes, for example, its successes have largely been the monogenic forms known as maturity-onset diabetes of the young (MODY) 1, 2, and 3 [28]. Identifying pathogenic variants responsible for complex traits has proven to be very challenging. (See "Genetics: Glossary of terms".)
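Classical (parametric) linkage analysis compares the likelihood of the observed recombinations between a marker and a disease locus under linkage at recombination fraction θ with the likelihood under no linkage (θ = 0.5), summarized as a base-10 logarithm of odds (LOD) score. The minimal sketch below assumes phase-known, fully informative meioses, a simplification that real pedigree software does not require; the function name and family data are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score, assuming phase-known, fully informative meioses.

    LOD(theta) = log10[theta^R * (1 - theta)^(N - R)] - log10[0.5^N]
    """
    if not 0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    r, n = recombinants, meioses
    log_linked = r * math.log10(theta) + (n - r) * math.log10(1 - theta)
    log_unlinked = n * math.log10(0.5)  # free recombination: no linkage
    return log_linked - log_unlinked


# Hypothetical family data: 1 recombinant among 20 scored meioses
grid = [i / 1000 for i in range(1, 500)]  # candidate recombination fractions
best_theta = max(grid, key=lambda t: lod_score(1, 20, t))
best_lod = lod_score(1, 20, best_theta)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta}")
```

Here the maximum LOD is about 4.3 at θ = 0.05, above the conventional threshold of 3 (odds of about 1000:1 favoring linkage). The approach works well when a single locus segregates cleanly in families, which is exactly the assumption that complex traits violate.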

Until the mid-1990s, the two main approaches to identify susceptibility variants for complex traits were positional cloning (genome-wide linkage analysis followed by fine mapping analyses of linkage and association) and candidate-gene association studies. The candidate gene approach (ie, making an educated guess about the role of a gene in a disease based on prior knowledge of the gene's function) has occasionally been successful (eg, identification of APOE4 for Alzheimer's disease) [29]. However, many findings have been inconsistently replicated, and the approach is limited to conditions with known biology.

GWAS — Genome-wide association studies (GWAS) involve the simultaneous testing of many genetic variants spread across the human genome for association with a particular disease or trait. Whereas candidate gene approaches require a priori knowledge about the functions of a gene, GWAS scan the entire genome without prior assumptions about which genes are involved. (See "Genetic association and GWAS studies: Principles and applications".)
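Because a GWAS performs one test per variant, a correction for multiple testing is needed. A widely used convention, shown here as a worked Bonferroni calculation under the common assumption of roughly one million independent common-variant tests, gives the genome-wide significance threshold:

```latex
\alpha_{\mathrm{GW}} = \frac{\alpha}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```

The figure of one million is an approximation for the effective number of independent tests (it depends on ancestry and linkage disequilibrium), so the threshold is a convention rather than an exact value.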

GWAS became feasible with the compilation of high-resolution linkage disequilibrium maps of common genetic variants (through the International HapMap Project) [30] and with the advent of high-throughput genome-wide genotyping technologies [31]. GWAS have reported associations for over 100,000 SNPs, including more than 71,000 highly replicable genetic loci for thousands of human diseases and complex traits [32].

GWAS have identified susceptibility loci both with [33] and without [34] known function. In addition, loci identified by GWAS are revealing unexpected genetic links among seemingly unrelated complex traits. As an example, a single nucleotide polymorphism (SNP) in the transcription factor 2 gene (TCF2), also known as hepatocyte nuclear factor-1-beta (HNF1B), is associated with both type 2 diabetes and prostate cancer [35]. Although much work remains to be done on the functional characterization of well-replicated loci, GWAS have yielded novel insights into the pathogenesis of numerous complex traits.

The hallmark of a significant finding of genetic association for a complex trait is the ability to reproduce the original finding in independent populations. Demonstration of a statistically significant genotype-phenotype correlation in an independent population is considered the most rigorous proof of genetic association and has become a standard in the field. In contrast, an association reported in a single study should be considered speculative. An important caveat is that certain susceptibility loci may be relevant only to certain ethnic groups. Thus, an observed genotype-phenotype association should first be tested in a cohort belonging to the same ethnic group as the initial analysis [36].

Although GWAS have demonstrated success in mapping the associations between common variants and complex traits, newer techniques such as next-generation sequencing, which can be used to sequence the entire exome (referred to as whole exome sequencing [WES]) or the entire genome (referred to as whole genome sequencing [WGS]) have improved our ability to assess the contribution of rare variants to the heritability of complex traits [37]. These studies require novel methodologies (eg, aggregation of data from multiple genetic variants across a single gene or pathway) and sample collection strategies (eg, large sample sizes, pedigree-based studies, or assessments of population isolates) to improve the statistical power to detect true associations [38]. (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)
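One simple form of the aggregation mentioned above is a collapsing (burden) test: each person is scored as a carrier if they carry at least one rare allele anywhere in the gene, and carrier counts are then compared between cases and controls. The sketch below is a minimal version in the spirit of such collapsing methods, using a one-sided Fisher exact test built from hypergeometric probabilities; the function names and the tiny genotype matrices are hypothetical, and published burden and variance-component tests are considerably more elaborate.

```python
import math
from typing import List, Sequence


def carrier_status(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Collapse each person's rare-allele counts across a gene into 1 (carrier) or 0."""
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for an excess of carriers among cases.

    Table layout: a = case carriers, b = case non-carriers,
                  c = control carriers, d = control non-carriers.
    """
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    total = math.comb(n, n_carriers)
    # Hypergeometric probability of a or more carriers falling among the cases
    tail = sum(
        math.comb(n_cases, k) * math.comb(n - n_cases, n_carriers - k)
        for k in range(a, min(n_cases, n_carriers) + 1)
    )
    return tail / total


# Hypothetical genotypes: rows are people, columns are three rare variants in one gene
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

a = sum(carrier_status(cases))
c = sum(carrier_status(controls))
p = fisher_one_sided(a, len(cases) - a, c, len(controls) - c)
print(f"case carriers {a}/{len(cases)}, control carriers {c}/{len(controls)}, p = {p:.3f}")
```

No single variant is seen in more than two people, yet collapsing gives 4 of 6 case carriers versus 1 of 6 control carriers. With samples this small the result is far from significant, which illustrates why such studies also need large sample sizes.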

SUMMARY

Definition of complex traits – Most genetic traits can be classified as either monogenic or complex. Complex traits are those that cannot be explained by alteration of a single gene. They are typically polygenic, with additional influences from gene-gene and gene-environment interactions, epigenetic phenomena, and unidentified rare disease susceptibility variants. The spectrum of genetic variation includes variants in coding sequences and in non-coding regulatory regions. (See 'Spectrum of genetic variation' above.)

Polygenic risk scores – For complex traits, polygenic risk scores can be used to account for the contributions of different genes to the trait or phenotype. (See 'Polygenic risk scores' above.)

Contribution of different alleles – Complex genetic traits may be associated with commonly occurring allelic variants that individually have a weak effect or with rare variants that have a stronger effect. Common variants have a greater impact on population-attributable risk, while uncommon variants have a greater impact on risk prediction at the level of the individual. (See 'Common versus rare alleles' above.)

Methods for studying complex traits – Techniques to identify complex trait genes include linkage analyses, candidate gene testing, genome-wide association studies (GWAS), and next generation sequencing (NGS) of the entire exome or genome. Replication of findings in another cohort is an essential element to validate findings from an initial GWAS or NGS study. (See 'Approaches to identifying genetic determinants of complex traits' above.)

Monogenic traits – Monogenic traits are strongly influenced by pathogenic variation within a single gene and are recognized by their classic Mendelian patterns of inheritance within families. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)".)

  1. Gu Y, Harley IT, Henderson LB, et al. Identification of IFRD1 as a modifier gene for cystic fibrosis lung disease. Nature 2009; 458:1039.
  2. Stefansson H, Rye DB, Hicks A, et al. A genetic risk factor for periodic limb movements in sleep. N Engl J Med 2007; 357:639.
  3. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007; 447:661.
  4. Lo Sardo V, Chubukov P, Ferguson W, et al. Unveiling the Role of the Most Impactful Cardiovascular Risk Locus through Haplotype Editing. Cell 2018; 175:1796.
  5. Harismendy O, Notani D, Song X, et al. 9p21 DNA variants associated with coronary artery disease impair interferon-γ signalling response. Nature 2011; 470:264.
  6. Verlaan DJ, Berlivet S, Hunninghake GM, et al. Allele-specific chromatin remodeling in the ZPBP2/GSDMB/ORMDL3 locus associated with the risk of asthma and autoimmune disease. Am J Hum Genet 2009; 85:377.
  7. Moll M, Sakornsakolpat P, Shrine N, et al. Chronic obstructive pulmonary disease and related phenotypes: polygenic risk scores in population-based and case-control cohorts. Lancet Respir Med 2020; 8:696.
  8. Hunninghake GM, Cho MH, Tesfaigzi Y, et al. MMP12, lung function, and COPD in high-risk populations. N Engl J Med 2009; 361:2599.
  9. Jaiswal S, Fontanillas P, Flannick J, et al. Age-related clonal hematopoiesis associated with adverse outcomes. N Engl J Med 2014; 371:2488.
  10. Bick AG, Weinstock JS, Nandakumar SK, et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature 2020; 586:763.
  11. Xing C, Garcia CK. Epigenetic inheritance of telomere length obscures identification of causative PARN locus. J Med Genet 2016; 53:356.
  12. Khera AV, Chaffin M, Aragam KG, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 2018; 50:1219.
  13. Polygenic Risk Score Task Force of the International Common Disease Alliance. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat Med 2021; 27:1876.
  14. Goldstein DB. Common genetic variation and human traits. N Engl J Med 2009; 360:1696.
  15. Hirschhorn JN. Genomewide association studies--illuminating biologic pathways. N Engl J Med 2009; 360:1699.
  16. Doll R. Mortality from lung cancer in asbestos workers. Br J Ind Med 1955; 12:81.
  17. Reich DE, Lander ES. On the allelic spectrum of human disease. Trends Genet 2001; 17:502.
  18. Cargill M, Daley GQ. Mining for SNPs: putting the common variants--common disease hypothesis to the test. Pharmacogenomics 2000; 1:27.
  19. Buniello A, MacArthur JAL, Cerezo M, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 2019; 47:D1005.
  20. Dickson SP, Wang K, Krantz I, et al. Rare variants create synthetic genome-wide associations. PLoS Biol 2010; 8:e1000294.
  21. Carmichael CM, McGue M. A cross-sectional examination of height, weight, and body mass index in adult twins. J Gerontol A Biol Sci Med Sci 1995; 50:B237.
  22. Silventoinen K, Sammalisto S, Perola M, et al. Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res 2003; 6:399.
  23. Weedon MN, Lango H, Lindgren CM, et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 2008; 40:575.
  24. Meigs JB, Shrader P, Sullivan LM, et al. Genotype score in addition to common risk factors for prediction of type 2 diabetes. N Engl J Med 2008; 359:2208.
  25. Paynter NP, Chasman DI, Buring JE, et al. Cardiovascular disease risk prediction with and without knowledge of genetic variation at chromosome 9p21.3. Ann Intern Med 2009; 150:65.
  26. Kraft P, Hunter DJ. Genetic risk prediction--are we there yet? N Engl J Med 2009; 360:1701.
  27. Forno E, Wang T, Qi C, et al. DNA methylation in nasal epithelium, atopy, and atopic asthma in children: a genome-wide study. Lancet Respir Med 2019; 7:336.
  28. Bell GI, Polonsky KS. Diabetes mellitus and genetically programmed defects in beta-cell function. Nature 2001; 414:788.
  29. Strittmatter WJ, Saunders AM, Schmechel D, et al. Apolipoprotein E: high-avidity binding to beta-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proc Natl Acad Sci U S A 1993; 90:1977.
  30. International HapMap Consortium. A haplotype map of the human genome. Nature 2005; 437:1299.
  31. Wang DG, Fan JB, Siao CJ, et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 1998; 280:1077.
  32. Welter D, MacArthur J, Morales J, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 2014; 42:D1001.
  33. Kathiresan S, Melander O, Anevski D, et al. Polymorphisms associated with cholesterol and risk of cardiovascular events. N Engl J Med 2008; 358:1240.
  34. Duerr RH, Taylor KD, Brant SR, et al. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 2006; 314:1461.
  35. Gudmundsson J, Sulem P, Steinthorsdottir V, et al. Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet 2007; 39:977.
  36. Yasuda K, Miyake K, Horikawa Y, et al. Variants in KCNQ1 are associated with susceptibility to type 2 diabetes mellitus. Nat Genet 2008; 40:1092.
  37. Tennessen JA, Bigham AW, O'Connor TD, et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 2012; 337:64.
  38. Panoutsopoulou K, Tachmazidou I, Zeggini E. In search of low-frequency and rare variants affecting complex traits. Hum Mol Genet 2013; 22:R16.
Topic 14600 Version 23.0
