TOPICAL REVIEW
1 Department of Cardiovascular Medicine, Department of Physiology, Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, USA
2 Department of Psychiatry, School of Medicine, University of California San Diego, San Diego, CA, USA
Abstract
(Received 10 July 2003; accepted after revision 16 October 2003; first published online 24 October 2003)
Corresponding author U. Broeckel: Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA. Email: broeckel{at}mcw.edu
Recombination mapping: basic principles
The biological phenomenon at the foundation of this gene discovery strategy is recombination of homologous chromosomes during meiosis. This is why the approach is called meiotic mapping, recombination mapping or, less formally but more frequently, linkage mapping. During meiosis, homologous chromosomes pair up and exchange material. The probability of a recombination event occurring between two loci far apart on a chromosome is greater than that for loci close together. Hence, alleles at loci near each other are generally inherited together (i.e. they cosegregate). This makes it possible to track influential loci transmitted from generation to generation by using alleles at neighbouring loci (i.e. the neighbouring locus alleles act as surrogates for the presence of alleles at other loci). By studying the inheritance patterns of a trait or disease together with marker alleles at loci across the entire genome, one can scan the genome for loci that might influence that trait or disease in the absence of a priori information about the location of relevant genes.
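The distance dependence described above can be made concrete with a map function. The following minimal sketch uses Haldane's map function, which assumes that crossovers occur independently of one another (no interference). This is an illustrative modelling choice, not a method prescribed in this review, and the distances are arbitrary.

```python
import math


def haldane_recombination_fraction(d_morgans: float) -> float:
    """Recombination fraction for a map distance d (in Morgans) under Haldane's
    map function, r = (1 - exp(-2d)) / 2, which assumes no crossover interference."""
    if d_morgans < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


# Loci further apart recombine more often, but r never exceeds 0.5,
# the value expected for unlinked loci.
for d_cm in (1, 10, 50, 200):
    r = haldane_recombination_fraction(d_cm / 100.0)
    print(f"{d_cm:>4} cM apart: recombination fraction = {r:.3f}")
```

Nearby loci (1 cM) recombine in roughly 1% of meioses. Widely separated loci approach the 50% expected for loci on different chromosomes, which is why only close markers act as useful surrogates.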
Figure 1 provides a graphical representation of some of the concepts discussed. The alleles in the shaded areas are part of an ancestral chromosomal segment harbouring a mutation that has been transmitted to individuals in ensuing generations. Recombination events occur along the genealogical links that connect the common ancestor who introduced the mutation into the pedigree with the two nuclear families in the latest generations. As a result, the ancestral chromosomal segment (or haplotype) carrying the mutation that is shared by all individuals who received the mutation is reduced in size (i.e. the reduced shaded region). Thus, the alleles in the boxed areas cosegregate with the mutation within each of the two families in the latest generation, but only the alleles within the shaded areas cosegregate with the mutation across the families. This distinction is important for putting recombination-based mapping strategies into specific contexts.
The technical and mathematical details of recombination mapping are beyond the scope of this review. These include non-parametric versus parametric methods, variance components models, etc. We encourage the interested reader to consult the books by Ott (1999) and/or Rao & Province (1999) for details.

However, there is a broad distinction between two kinds of methods. Those that exploit within-family associations (as discussed above) are generally associated with standard linkage mapping studies. Those that exploit across-family associations are generally referred to as linkage disequilibrium mapping studies. The distinction has practical consequences. The chromosomal segments shared by members of a family who have inherited a mutation are larger than those shared by members of different families, because more meiotic events separate individuals in different families. Thus, one may need a denser set of markers to assess across-family associations than to assess within-family associations. Across-family analysis must identify the small shared chromosomal segments and the alleles at loci that neighbour the site of a mutation influencing a trait or disease. These concepts are taken up again later in the discussion of the motivation for the haplotype map initiative and in the section on study populations.
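As a rough, idealized illustration of why across-family associations call for denser markers, the following sketch estimates the expected length of the ancestral segment around a mutation shared by two individuals separated by m meioses. It assumes Haldane crossovers (on average one per Morgan per meiosis), an infinitely long chromosome and no interference. Under these assumptions, the distance from the mutation to the nearest crossover on each side is exponentially distributed with rate m. The expected shared length is therefore 2/m Morgans (200/m cM). The function names and meiosis counts are hypothetical.

```python
import random


def expected_shared_segment_cm(meioses: int) -> float:
    """Analytic mean length (cM) of the segment shared around a common mutation,
    2/m Morgans, assuming Haldane crossovers on an infinitely long chromosome."""
    return 200.0 / meioses


def simulated_shared_segment_cm(meioses: int, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo check: crossovers from m independent meioses form a Poisson
    process of rate m per Morgan, so the distance to the first crossover on
    each side of the mutation is exponential with rate m."""
    rng = random.Random(seed)
    total_morgans = 0.0
    for _ in range(trials):
        left = rng.expovariate(meioses)   # Morgans to the nearest crossover on the left
        right = rng.expovariate(meioses)  # Morgans to the nearest crossover on the right
        total_morgans += left + right
    return 100.0 * total_morgans / trials


for m in (4, 10, 40):  # e.g. first cousins versus increasingly distant relatives
    print(f"{m:>2} meioses: expected {expected_shared_segment_cm(m):5.1f} cM, "
          f"simulated {simulated_shared_segment_cm(m):5.1f} cM")
```

With 4 meioses (as between first cousins), the expected segment is about 50 cM. With 40 meioses it shrinks to about 5 cM, so markers must be spaced far more densely to detect it.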
Heritability: genes at work
Before embarking on a mapping study to identify individual genes that might influence a trait or disease (i.e. a phenotype), it is important to assess whether the trait has an obvious genetic basis. Although this might sound like a trivial exercise, critically evaluating a phenotype is an important first step in estimating the likelihood of success of a genetic mapping project. A number of methods have been proposed to obtain information about the overall genetic contribution to a disease or phenotype.

For example, estimating the concordance of a phenotype between twin pairs can provide evidence for its genetic basis. Monozygotic twins share all their genes, whereas dizygotic twins share on average half of their genes. Comparing concordance between the two groups therefore allows the overall contributions of genetic and shared environmental factors to be estimated and distinguished (MacGregor et al. 2000). Twin studies have been used for many years. They have demonstrated, for example, that genetic factors contribute significantly to cardiovascular and related diseases such as hypertension, diabetes and coronary artery disease (Marenberg et al. 1994; McCaffery et al. 1999; Hyttinen et al. 2003).

Familial clustering of a phenotype, or the resemblance of relatives of any degree (e.g. siblings, cousins, etc.) with regard to a phenotype, can provide further evidence that genetic factors contribute to it. Depending on the phenotype in question, this type of analysis has to be interpreted cautiously. Familial resemblance can be caused by shared familial environmental factors rather than by inherited genetic factors (Guo, 1998).
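For a quantitative trait, the logic of the twin comparison can be sketched with Falconer's classical formulas. These assume additive genetic effects, equally similar shared environments for MZ and DZ pairs, and random mating. The function and the correlations below are hypothetical illustrations. Formal analyses usually fit maximum-likelihood variance-component models instead, and binary traits such as disease status are usually analysed on an underlying liability scale.

```python
from __future__ import annotations


def falconer_ace(r_mz: float, r_dz: float) -> dict[str, float]:
    """Classical twin-design variance decomposition (Falconer's formulas).

    r_mz, r_dz: phenotypic correlations within monozygotic and dizygotic pairs.
    Returns proportions of variance due to additive genes (A), shared
    environment (C) and unique environment plus measurement error (E).
    """
    if not (-1.0 <= r_mz <= 1.0 and -1.0 <= r_dz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    a2 = 2.0 * (r_mz - r_dz)  # additive model: r_MZ = A + C and r_DZ = A/2 + C
    c2 = 2.0 * r_dz - r_mz    # hence C = 2 r_DZ - r_MZ
    e2 = 1.0 - r_mz           # what makes even MZ twins differ
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical correlations for a quantitative trait such as blood pressure.
print(falconer_ace(r_mz=0.60, r_dz=0.35))
```

With these invented correlations, the sketch attributes 50% of the variance to additive genetic effects, 10% to shared environment and 40% to unique environment.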
Overtly Mendelian versus polygenic (or complex) phenotypes
As can be expected, while there has been great success in identifying genes for simple Mendelian phenotypes using recombination mapping strategies, success in identifying genes contributing to complex phenotypes has not been as forthcoming. The identification of the genes contributing to complex phenotypes is made especially difficult by the fact that the contribution of any one gene or genetic variant to the phenotype might be obscured or confounded by the others. Thus, any one genetic variant might not show strict coinheritance with the phenotype in a family or pedigree, since different family members can express the disease for different reasons.
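The effect of such heterogeneity on a classical linkage statistic can be illustrated with a two-point LOD score. The sketch assumes phase-known, fully informative meioses and complete penetrance. An affected relative who does not carry the cosegregating marker allele (for example, a phenocopy or a carrier of a different susceptibility variant) is therefore scored as a recombinant. The counts are hypothetical.

```python
from __future__ import annotations

import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses:
    log10 of L(theta) / L(0.5)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + nonrecombinants
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            - n * math.log10(0.5))


def max_lod(recombinants: int, nonrecombinants: int) -> tuple[float, float]:
    """Maximum-likelihood theta (capped at 0.5) and the LOD score at that theta."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("need at least one scored meiosis")
    theta_hat = min(recombinants / n, 0.5)
    if theta_hat == 0.0:
        # No recombinants: the LOD approaches n * log10(2) as theta -> 0.
        return 0.0, n * math.log10(2.0)
    return theta_hat, lod_score(recombinants, nonrecombinants, theta_hat)


# Ten scored meioses: all concordant versus two affected relatives who carry
# the "wrong" marker allele, which this simple model counts as recombinants.
for r in (0, 2):
    theta, lod = max_lod(r, 10 - r)
    print(f"{r} apparent recombinants: theta_hat = {theta:.2f}, max LOD = {lod:.2f}")
```

With ten concordant meioses, the maximum LOD is about 3.0, the conventional threshold for declaring linkage. Two discordant relatives reduce it to about 0.84.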
Phenotypic definition
One way to facilitate the identification of the genetic determinants of a phenotype is to refine the definition of the phenotype. Misdiagnoses, or ignoring important disease sequelae and aetiological factors, will simply corrupt a gene mapping effort. This is broadly accepted and is often used as an excuse for the failure of gene mapping studies (see, e.g. Risch & Botstein, 1996).

Individuals with the phenotype are typically selected on the basis of previously established criteria. For example, studies aiming to identify genes for hypertension often classify individuals as affected using a blood pressure criterion of >140/90 mmHg. This definition is based on epidemiological studies showing that individuals with blood pressure above this cut-off have an increased risk of cardiovascular disease (Kannel, 2000). However, such diagnostic criteria have changed considerably over time as new clinical and pathophysiological insights into the nature of hypertension have been obtained. In fact, it is now known that there are many subtypes of hypertension, each with its own genetic determinants (Lifton, 1993).
One option for dealing with problems of phenotypic definition is to measure phenotypes that are assumed to be functionally related to, or to underlie, a broader phenotype (such as a disease). Thus, intermediate phenotypes or endophenotypes are widely measured, such as catecholamine levels in hypertension or cholesterol fractions in atherosclerosis. They may have stronger genetic determinants than the more remote phenotype to which they relate.
It should also be noted that in many cases the definition of a disease dichotomizes what is actually a quantitative or continuous physiological measurement (e.g. hypertension status dichotomizes blood pressure level). This may be useful for clinical purposes. In genetic studies, however, it may discard important variation and hence weaken the underlying genetic signals (Korczak & Goldstein, 1997). A gene might affect not just one phenotype but an entire network of correlated biological systems, so the joint analysis of multiple phenotypes can also increase the power to detect genes (Stoll et al. 2001).

Finally, every measurement and definition of a disease carries an associated measurement error or diagnostic inaccuracy. The size of this effect is difficult to estimate precisely, but measurement error clearly reduces the power of a mapping experiment, as does misclassification. The importance of a clear definition of the phenotype should not be underestimated, as it critically influences the power of any genetic study.
The case for studying isolated populations
To conduct a mapping study, one needs either families or individuals whose genealogical links are known or can be inferred. In this light, the selection of the study population, or of the set of individuals to contrast, is as important as defining the phenotype. For complex trait gene mapping, it has been argued that relatively young, isolated populations have advantages, for intuitive reasons:

- First, there is likely to be greater environmental homogeneity among the people in those populations (e.g. rural China, Iceland, etc.). This minimizes the number of environmental factors that influence a phenotype and would otherwise dampen relevant genetic effects.
- Second, there is likely to be greater genetic homogeneity. Such populations may harbour fewer genetic variants influencing the trait in question (i.e. they have a more restricted gene pool).
- Third, younger populations are likely to have fewer meiotic events separating the individuals in them. Individuals with the same phenotype share chromosomal segments because those segments harbour variants that influence the trait, and in younger populations these shared segments are likely to be larger.

Other factors, such as population bottlenecks or the rate of population growth, also shape the extent of linkage disequilibrium. Linkage disequilibrium is the non-random association of alleles at neighbouring loci, which leads to their coinheritance. Nevertheless, there is support for the notion that it is stronger in these populations (Jorde et al. 2001).
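The linkage disequilibrium argument can be sketched numerically. The code below computes the standard coefficient D and r² for two biallelic loci. It also applies the textbook expectation that D decays by a factor of (1 - θ) per generation under random mating in a large population, where θ is the recombination fraction between the loci. This deliberately ignores the bottlenecks, drift and population growth mentioned above, and all frequencies and generation counts are hypothetical.

```python
from __future__ import annotations


def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return D and r^2 for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2;
    p_a, p_b: frequencies of A and B (each strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b  # deviation from random association of alleles
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r2


def decay_of_d(d0: float, theta: float, generations: int) -> float:
    """Under random mating in a large population, D shrinks by (1 - theta) per generation."""
    return d0 * (1.0 - theta) ** generations


d0, r2 = linkage_disequilibrium(p_ab=0.40, p_a=0.50, p_b=0.50)
print(f"founder D = {d0:.3f}, r^2 = {r2:.3f}")
for t in (20, 200):  # a young isolate versus an old, large population
    print(f"after {t:>3} generations at theta = 0.01: D = {decay_of_d(d0, 0.01, t):.3f}")
```

Under these assumptions, loci 1 cM apart retain most of their founder association after 20 generations. After 200 generations, only a small fraction remains.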
In some instances, the family structure or genealogical relationships of these populations are also available, making genetic studies that much easier (see, e.g. the example of the Hutterites; Newman et al. 2001). There are also obvious limitations to the use of such populations. For example, it might not be possible to study many diseases in them because of the diseases' low frequencies and the limited number of individuals in those populations. In addition, a gene localized or identified in one population might not play a role in other populations. Overall, isolated populations represent exceptional situations, and in most cases the study population is drawn from non-isolated populations.
Genetic maps and the haplotype map initiative
To facilitate gene mapping efforts based on recombination, genetic researchers are constantly developing resources and assays for identifying polymorphic markers and using them to assess cosegregation. Knowledge of the frequency of recombination between loci across the genome is crucial in mapping efforts. It has led to the development of genetic maps, which chart recombination frequencies, as a complement to physical maps, which simply count the number of bases or nucleotides separating loci (Lander & Weinberg, 2000). Virtually all of the material relevant to genetic and physical maps gathered by geneticists has been placed in the public domain. This represents an almost unprecedented collaborative attempt to facilitate gene discovery (see, e.g. the LocusLink website, dbSNP, the SNP Consortium website, etc.). Of course, the release of draft sequences of the entire human genome has greatly enhanced the ability to identify polymorphic sites and assess recombination (Lander et al. 2001; Venter et al. 2001).
One particularly interesting initiative aimed at facilitating gene discovery via recombination mapping is the Human Haplotype Map Initiative (Couzin, 2002). As noted above, examining evidence for shared chromosomal segments or haplotypes across individuals in different families is complicated. Without knowing how many meiotic events separate the relevant individuals, it is difficult to predict how large the common segments might be (i.e. how much recombination may have reduced the size of an ancestral chromosomal segment harbouring a trait-influencing mutation). The HapMap Initiative, as it is known, is a large-scale effort to determine empirically the sizes and block-like structure of common, apparently conserved chromosomal segments across any set of (arbitrarily related) individuals.

With such knowledge of block sizes, one could assess evidence for across-family associations between DNA markers and, e.g., a disease. This could potentially be done without collecting family material, since information about the genealogical links between individuals would be reflected in the conserved haplotypes. Ideally, then, one could walk through the entire genome and simply tally how often, e.g., diseased and non-diseased individuals carry a certain chromosomal segment or haplotype. Haplotypes that show pronounced differences in frequency between diseased and non-diseased individuals are likely to harbour the relevant disease-influencing mutations. Although a great oversimplification, this description captures the main elements of linkage disequilibrium mapping and the motivation for the HapMap Initiative.
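The tallying step described above can be illustrated with a basic allelic case-control test. The sketch counts chromosomes carrying a given haplotype in diseased and non-diseased individuals. It then computes a Pearson chi-square statistic, an odds ratio and a p-value. It assumes phased haplotypes, chromosomes that can be treated as independent (approximately true under Hardy-Weinberg equilibrium) and no population stratification. The counts are invented, and a genome-wide scan would also require correction for multiple testing.

```python
from __future__ import annotations

import math


def haplotype_association(case_with: int, case_total: int,
                          control_with: int, control_total: int) -> tuple[float, float, float]:
    """2x2 test of whether a haplotype is more frequent among case chromosomes.

    Counts are chromosomes (two per person). Returns the Pearson chi-square
    statistic (1 df, no continuity correction), the odds ratio and the p-value.
    """
    a, b = case_with, case_total - case_with
    c, d = control_with, control_total - control_with
    if min(a, b, c, d) == 0:
        raise ValueError("every cell must be non-zero for this simple sketch")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, odds_ratio, p_value


# Hypothetical tallies: the haplotype is seen on 120 of 400 case chromosomes
# and on 80 of 400 control chromosomes.
chi2, odds_ratio, p = haplotype_association(120, 400, 80, 400)
print(f"chi2 = {chi2:.2f}, OR = {odds_ratio:.2f}, p = {p:.4f}")
```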
Modern sequence and functional analysis
Ultimately, once a marker or set of markers has been identified with alleles that appear to cosegregate with a phenotype, the actual DNA sequence variation in the relevant region of the genome must be assessed. In some cases, a single variant in a particular gene might causally influence a phenotype. In other instances, a number of different variants in the same gene or its regulatory elements might do so (Loots et al. 2000; Rioux et al. 2001). It is therefore imperative to survey all polymorphic sites in a genomic region and assess their putative functionality. Sequencing remains expensive, and very high-throughput sequencing faces technological constraints, so this task can be laborious.
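Once sequences from a candidate region are available, a first bookkeeping step is to list the variable positions. The sketch below assumes the sequences have already been aligned to equal length and simply tallies the bases at each position. It is not a substitute for dedicated alignment and variant-calling software, and the example haplotypes are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def polymorphic_sites(aligned: list[str]) -> list[tuple[int, dict[str, int], float]]:
    """Scan equal-length, pre-aligned sequences for variable positions.

    Returns (0-based position, base counts, 1 - major allele frequency);
    the last value equals the minor allele frequency at biallelic sites.
    Gaps and ambiguous bases (anything other than A, C, G, T) are ignored.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned if seq[pos] in "ACGT")
        if len(counts) > 1:
            total = sum(counts.values())
            non_major = 1.0 - max(counts.values()) / total
            sites.append((pos, dict(counts), non_major))
    return sites


# Four hypothetical haplotypes from the same short region.
haplotypes = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
for pos, counts, freq in polymorphic_sites(haplotypes):
    print(pos, counts, round(freq, 2))
```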
Ultimately, proof that a particular genetic variant affects a phenotype requires analysis of that variant in ways that go well beyond mapping. For example, in vitro assays may be appropriate for assessing the effect of polymorphisms in regulatory regions. Cell culture-based transfection studies allow indirect measurement of transcriptional activation (Trinklein et al. 2003). In addition, animal models (e.g. knocking out or inserting a gene) can be used to assess the phenotypic effect of certain variants.
Conclusions
Ultimately, knowledge and understanding of how genes and genetic variation influence complex phenotypes will change not only the way we diagnose and treat disease. It will also give us an unprecedented view of inheritance and the history of the human species.
References
Collins A & Morton NE (1998). Mapping a disease locus by allelic association. Proc Natl Acad Sci U S A 95, 1741–1745.
Collins FS, Green ED, Guttmacher AE & Guyer MS (2003). A vision for the future of genomics research. Nature 422, 835–847.
Couzin J (2002). Human genome. HapMap launched with pledges of $100 million. Science 298, 941–942.
Guo SW (1998). Inflation of sibling recurrence-risk ratio, due to ascertainment bias and/or overreporting. Am J Hum Genet 63, 252–258.
Horikawa Y, Oda N, Cox NJ, Li X, Orho-Melander M, Hara M, Hinokio Y, Lindner TH, Mashima H, Schwarz PE et al. (2000). Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nat Genet 26, 163–175.
Hugot JP, Chamaillard M, Zouali H et al. (2001). Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 411, 599–603.
Hyttinen V, Kaprio J, Kinnunen L, Koskenvuo M & Tuomilehto J (2003). Genetic liability of type 1 diabetes and the onset age among 22,650 young Finnish twin pairs: a nationwide follow-up study. Diabetes 52, 1052–1055.
Jorde LB, Watkins WS & Bamshad MJ (2001). Population genomics: a bridge from evolutionary history to genetic medicine. Hum Mol Genet 10, 2199–2207.
Kannel WB (2000). Elevated systolic blood pressure as a cardiovascular risk factor. Am J Cardiol 85, 251–255.
Korczak JF & Goldstein AM (1997). Sib-pair linkage analyses of nuclear family data: quantitative versus dichotomous disease classification. Genet Epidemiol 14, 827–832.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Lander ES & Schork NJ (1994). Genetic dissection of complex traits. Science 265, 2037–2048.
Lander ES & Weinberg RA (2000). Genomics: journey to the center of biology. Science 287, 1777–1782.
Lifton RP (1993). Genetic factors in hypertension. Curr Opin Nephrol Hypertens 2, 258–264.
Lifton RP & Jeunemaitre X (1993). Finding genes that cause human hypertension. J Hypertens 11, 231–236.
Loots GG, Locksley RM, Blankespoor CM et al. (2000). Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288, 136–140.
MacGregor AJ, Snieder H, Schork NJ & Spector TD (2000). Twins. Novel uses to study complex traits and genetic diseases. Trends Genet 16, 131–134.
Marenberg ME, Risch N, Berkman LF, Floderus B & de Faire U (1994). Genetic susceptibility to death from coronary heart disease in a study of twins. N Engl J Med 330, 1041–1046.
McCaffery JM, Pogue-Geile MF, Debski TT & Manuck SB (1999). Genetic and environmental causes of covariation among blood pressure, body mass and serum lipids during young adulthood: a twin study. J Hypertens 17, 1677–1685.
Newman DL, Abney M, McPeek MS, Ober C & Cox NJ (2001). The importance of genealogy in determining genetic associations with complex traits. Am J Hum Genet 69, 1146–1148.
Ogura Y, Bonen DK, Inohara N et al. (2001). A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature 411, 603–606.
Ott J (1999). Analysis of Human Genetic Linkage. Johns Hopkins University Press, Baltimore.
Rao D & Province MA (1999). Genetic Dissection of Complex Traits. Advances in Genetics. Academic Press.
Rioux JD, Daly MJ, Silverberg MS, Lindblad K, Steinhart H, Cohen Z, Delmonte T, Kocher K, Miller K, Guschwan S et al. (2001). Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat Genet 29, 223–228.
Risch N & Botstein D (1996). A manic depressive history. Nat Genet 12, 351–353.
Schellenberg GD, D'Souza I & Poorkaj P (2000). The genetics of Alzheimer's disease. Curr Psychiatry Rep 2, 158–164.
Stoll M, Cowley AW Jr, Tonellato PJ, Greene AS, Kaldunski ML, Roman RJ, Dumas P, Schork NJ, Wang Z & Jacob HJ (2001). A genomic-systems biology map for cardiovascular function. Science 294, 1723–1726.
Trinklein ND, Aldred SJ, Saldanha AJ & Myers RM (2003). Identification and functional analysis of human transcriptional promoters. Genome Res 13, 308–312.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA et al. (2001). The sequence of the human genome. Science 291, 1304–1351.