What is subclade testing?
Subclade testing can provide increased resolution of your placement on the Y-chromosome phylogenetic tree. Before your subclade can be determined, you must first know what haplogroup you belong to. Haplogroups are defined by unique mutation events such as single nucleotide polymorphisms, or SNPs. These SNPs mark the branch of a haplogroup, and indicate that all descendants of that haplogroup at one time shared a common ancestor. The Y-DNA SNP mutation has been passed from father to son over thousands of years. Over time, additional SNPs may occur within a haplogroup, leading to a new lineage. These new lineages are considered subclades of the haplogroup. Each time a new mutation occurs, there is a new branch in the haplogroup, and therefore a new subclade. By testing for the presence of SNPs that have been identified as indicative of a subclade within a haplogroup, you can determine which specific subclade you belong to within your previously determined haplogroup.
There seems to be little consensus regarding the origin of Haplogroup G. According to one theory, the defining SNP for Haplogroup G arose approximately 30,000 years before present (BP), likely along the eastern edge of the Middle East and perhaps as far east as the Himalayan foothills in Pakistan or India (Spencer-Wells et al 2001). An alternative theory (Cinnioglu et al 2004) holds that Haplogroup G arose only 14,300 years BP in the Middle East, while Semino et al. (2000) estimate that the haplogroup arose 17,000 years ago.
Estimation of the time of origin of a unique mutation event, or time to most recent common ancestor (tMRCA), is based on the number of Y-STR (short tandem repeat) markers that separate men with different haplogroups and the estimated mutation rate (on average, how many years pass between mutation events). However, these estimates are not always precise and may have large margins of error. To help support or disprove a proposed time of origin for a haplogroup, ancient skeletal remains can be tested for the presence of SNPs that define particular haplogroups. If SNPs indicative of a specific haplogroup are found in the skeleton, that haplogroup is at least as old as the ancient bones. As data accumulate, the estimated time of origin for Haplogroup G will no doubt become more accurate.
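To make the idea of a tMRCA estimate concrete, here is a minimal sketch in Python. It is not the method used by any of the studies cited here. It assumes a very simple model in which every Y-STR marker mutates at the same rate and each observed mismatch between two men counts as one mutation. The function name, the mutation rate of 0.002 per marker per generation, and the 25-year generation length are all illustrative assumptions.

```python
def estimate_tmrca_years(mismatches, markers_compared,
                         mutation_rate=0.002, years_per_generation=25):
    """Rough tMRCA estimate from Y-STR mismatches between two men.

    Assumptions (illustrative only, not taken from the cited studies):
    - every marker mutates at the same rate per generation
    - each mismatch represents a single mutation event
    """
    if mismatches < 0 or markers_compared <= 0 or mutation_rate <= 0:
        raise ValueError("inputs must be non-negative counts and positive rates")
    # Mutations accumulate along BOTH lineages since the common ancestor,
    # so the expected number of mismatches is 2 * rate * markers * generations.
    generations = mismatches / (2 * mutation_rate * markers_compared)
    return generations * years_per_generation


# Hypothetical example: 3 mismatches across 25 markers gives roughly 750 years.
print(round(estimate_tmrca_years(3, 25)))
```

Changing the assumed mutation rate or generation length shifts the result proportionally, which is one reason published estimates carry such wide margins of error.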
Regardless of the exact origin time of Haplogroup G, scientists have been able to reconstruct some of its history (Figure 1). Approximately 60,000 years ago, a lineage derived from “Y-chromosomal Adam”, carrying SNP mutation M168, started to disperse out of Africa, following the migration of the animal herds it depended on for survival. Sometime during this migration (approximately 45,000 years ago), another unique mutation event occurred, marked as SNP M89. On average, 90-95% of non-Africans have this mutation. This lineage continued its migration into the Middle East. The M201 mutation (and ten other associated SNP markers), and therefore Haplogroup G, arose from this M89 lineage (Figure 2). Around the end of the last ice age (10,000 to 15,000 BP), Haplogroup G ancestors lived within the Fertile Crescent and began to grow their own food. This was a major event in human history and marked the beginning of the Neolithic Revolution, when people stopped being nomadic hunters and instead became farmers. As you can imagine, this period saw a rapid alteration in the social structure of human societies. Having a constant supply of food allowed populations to grow larger, and eventually led to the dispersion of large groups of migrants out of the Fertile Crescent, along the Mediterranean, through Turkey and the Balkans, into the Caucasus Mountains. Even if the SNP mutations that define Haplogroup G arose several thousand years before this era, genealogical studies have implicated Haplogroup G as an important genetic marker of the Neolithic age and the expansion of agriculture from the Middle East to Europe over 9,000 years ago (King and Underhill 2002).
Figure 1. The origin of Haplogroup G (defined by M201, plus ten other markers). Most studies agree that Haplogroup G likely arose within the Indus Valley of the Middle East and is an important genetic marker of the spread of agriculture during the Neolithic era.
Figure 2. The phylogenetic tree of the 20 known Y-DNA haplogroups. Haplogroup G is circled in blue to indicate its relative position within the tree.
Geographical Distribution of Y-DNA Haplogroup G
The distribution of Haplogroup G (Figure 3 and Table 1), similar to Haplogroups J and E, tends to decrease in frequency from the Near East to Europe, with higher frequencies detected along the Mediterranean coast relative to central Europe (Semino et al 2000). Haplogroup G tends to have a relatively low frequency in most populations, but is widely distributed throughout western European countries and at lower rates across Asia and northern Africa. It is most frequent in Georgia, Armenia, Azerbaijan, and south Russia, an area collectively referred to as Caucasia, or the Caucasus. Ancestors with the G haplogroup settled in an area southeast of the Caucasus Mountains while other populations became established throughout Europe after the glaciers receded. Eventually the men carrying the SNPs for Haplogroup G migrated into other areas, including eastward into the Indian subcontinent and westward into Europe. This may have occurred by invasion, capture as slaves, or other movements associated with the spread of agriculture.
There is actually relatively little genetic variation among the entire human population despite its widespread distribution and large size. The variation that is present is often distributed according to geography. One of the major factors causing this pattern is the presence of geographical barriers such as mountain ranges that may have prevented or limited migration between different groups of people. Since the correlation between geography and genetic variation is so prevalent, it is interesting to discover examples where genetic variation may instead be explained by social structure, such as language, ethnicity or race. Haplogroup G provides some possible examples of this phenomenon, not only with the haplogroup as a whole but particularly with its subclades. For example, in Lebanon the variation among subpopulations of Christians, Muslims and Druze is greater than the variation among geographically separated subpopulations. Of the 6.5% of men classified as Haplogroup G, most were Christians or Muslims, with only a very low proportion detected within the Druze (Zalloua et al 2008). This example shows how Y-DNA variation in Lebanon is more strongly structured by religion than by geography. Most studies, though, tend to find that genetic diversity is more strongly structured by geography (Nasidze et al. 2004, 2008, for example).
Haplogroup G in the Caucasus
The Caucasus region includes Georgia, Armenia, Azerbaijan, and southwest Russia, and shows the highest prevalence of Haplogroup G. Due to its geographical location, the Caucasus region has served as a corridor of human migration between continents. The geography of the Caucasus also includes many natural barriers, and therefore the region has a history of both isolation and gene flow, leading to a varied mixture of religions and languages. During the migration of human populations through the Caucasus, the Caucasus Mountains influenced the direction of migration and forced most travelers into the easily accessible lowlands. However, some populations did settle in the highlands, and due to their social customs they have remained quite distinct from lowland populations. The highland men remained on the land of their forefathers and brought their wives to the area to live with them. A study of five isolated populations in Daghestan (two lowland and three highland; Marchani et al 2008) indicated that the highland populations do indeed have lower Y-chromosomal diversity due to their social practices. Haplogroup comparisons also indicated a Near Eastern origin of the highland populations, since these three populations had a high incidence of Haplogroup F (including Haplogroup G), which is also found at a high rate in Near Eastern populations. The lowland populations presented evidence of influence from Near Eastern and Central Asian populations.
Haplogroup G in the Middle East
The G haplogroup has been useful in providing insight into population history in Iran. Within this country there is an interesting mix of populations. In particular, there are two groups that are geographical neighbors but speak languages belonging to two different linguistic families: the Semitic-speaking Iranian Arabs and the Indo-European-speaking Bakhtiari. Interestingly, one study (Nasidze et al 2008b) found that Haplogroup G was one of the four most common haplogroups in Bakhtiari men and was found in 15% of the men tested, whereas among the Iranian Arabs, Haplogroup G was detected in 6% of the men. However, further analysis of Y-STR haplotypes of men belonging to Haplogroups G and F, in combination with data from mitochondrial DNA, indicated that language did not act as a barrier to gene flow between the two groups. Patterns of genetic variation in this study were better correlated with geography than with language.
Haplogroup G in Europe
Haplogroup G, and in particular subclade G2a, is thought to be a good genetic marker for reconstructing the spread of agriculture into Europe during the Neolithic period (Behar et al 2004). In Croatia, the frequency of G is quite low with the exception of one island distant from the mainland that represented the southernmost location in the study (Baraç et al 2003). This observation led the authors to suggest that the Neolithic spread may have occurred by sea instead of by land. Further studies utilizing enhanced resolution of Haplogroup G and its subclades will help provide more details about the rapid expansion and alteration of the human population during the advance of farming.
Figure 3. Geographical distribution of Haplogroup G. The black area of the pie charts indicates the detected frequency of Haplogroup G in that geographical area. This haplogroup has also been detected in 2.6% of North American Caucasians and 6.1% of Cubans, although these data are not indicated on the map. Table 1 lists the exact frequencies and corresponding region, country, and/or population.
Table 1. Worldwide frequencies of Haplogroup G from the literature.
The subclades of Y-DNA Haplogroup G
Current data suggest that there are at least 19 distinct lineages, or subclades, within Haplogroup G. These subclades are smaller lineages derived from two Haplogroup G subclades: G1 and G2. There is also a subclade called G*, but this is often referred to as a paragroup with the expectation that there are likely branches within this subclade that have yet to be determined. G1 is further divided into three subclades: G1*, G1a, and G1b. G2 is definitely more complex and can first be separated into four lineages: G2*, G2a, G2b, and G2c. There have been many distinct subclades detected within G2a, and they can be visualized in more detail in the “Phylogenetic Tree” section of the results.
It is important to point out that the names and relative positions of the subclades within Haplogroup G have recently been altered (in 2007/2008). The SNPs remain definitive of the same branches, and it is the branch itself that is renamed or moved. There are always new SNPs being discovered, and as these SNPs are discovered they help to increase resolution of the Y-chromosome SNP tree. Below is a list of the former subclade designations with their new names and defining SNP mutations (Table 2). This list will help immensely when referring to previous genealogical studies. Since many of the SNPs discussed here are so new, little information is known about many of the subclades. However, some information about the subclades has been summarized in Table 3, and will be further updated as new data arrive.
Further insight from unique haplotypes
Additional insight about population history can often be gained by looking for unique Y-STR haplotypes that may be indicative of specific subclades. For example, Cinnioglu et al. (2004) found that phylogeographic patterns in Haplogroup G could only be detected after Y-DNA STR haplotype analysis. Haplogroup G has repeat values on several Y-STR markers that are quite distinct from other haplogroups. These include DYS425, DYS452, DYS446, and DYF399S1 (Goff and Athey 2006). DYS425=14 was found to be strongly correlated with Haplogroup G, with approximately 88% of men in this haplogroup having 14 repeats at this marker, whereas this result was very rare in other haplogroups. However, one individual from subclade G2 was tested and found to be DYS425=12. This man was from a tribal area of India, and it is likely that his lineage separated from the lineage leading to the European G2. The number of repeats for DYS452 tends to be smaller in subclade G2 (ranging from 25 to 28 repeats) relative to other haplogroups (28 to 33 repeats). Again, this observation seems to be unique to G2: four individuals from G had 31 repeats, suggesting that a deletion event occurred in an individual within G2, or else was present in the founder of G2. For DYS446, the allele frequency distribution is quite different depending on the value of another marker, DYS388. This indicates the presence of subclades within G2, one with a modal value of 12 at DYS388 and another with a value of 13. Overall, the number of repeats for DYS446 tends to be greater in G2, but there is a lot of overlap with other haplogroups. The shortest allele of the marker DYF399S1 has a small number of whole repeats within Haplogroup G and a unique fractional repeat value only detected in this haplogroup.
One interesting subclade, G2c, so far seems to be restricted to the Ashkenazi Jewish population (Behar et al 2004). In fact, the distribution of G2c throughout Eastern Europe tends to correspond well with the migration and settlement of the Ashkenazi Jewish population during the 16th and 17th centuries. This subclade seems to have the following modal haplotype: DYS019=15, DYS388=12, DYS389i=14, DYS389ii=18, DYS391=10, DYS392=11, DYS393=13, DYS426=11, DYS439=15 (Behar et al 2004). In this study, 14 of 34 men had the modal haplotype with the other results only a single mutation away. In addition, it seems that this subclade also has a null value for DYS425.
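The comparison against a modal haplotype described above can be sketched in a few lines of Python. The marker values are the G2c modal haplotype quoted from Behar et al (2004); the function name and the simple counting rule (sum of absolute repeat differences, with every marker treated independently) are assumptions made for illustration and are not necessarily how the study measured distance.

```python
# G2c modal haplotype as quoted in the text (Behar et al 2004).
G2C_MODAL = {
    "DYS019": 15, "DYS388": 12, "DYS389i": 14, "DYS389ii": 18,
    "DYS391": 10, "DYS392": 11, "DYS393": 13, "DYS426": 11, "DYS439": 15,
}


def step_distance(haplotype, modal=G2C_MODAL):
    """Sum of absolute repeat-count differences from the modal haplotype.

    Illustrative counting rule only: each marker is treated independently
    and each one-repeat difference counts as one mutational step.
    """
    missing = [marker for marker in modal if marker not in haplotype]
    if missing:
        raise KeyError(f"haplotype is missing markers: {missing}")
    return sum(abs(haplotype[marker] - modal[marker]) for marker in modal)


# Hypothetical tester who differs from the modal value only at DYS391.
tester = dict(G2C_MODAL, DYS391=11)
print(step_distance(tester))  # 1, i.e. a single step from the modal haplotype
```

A distance of 0 corresponds to an exact match with the modal haplotype, and a distance of 1 corresponds to a result "a single mutation away" in the sense used above.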
Table 2. There have been recent name changes to a number of subclades within Haplogroup G. The previous subclade names with the corresponding new names and defining SNP mutations are provided below.
Table 3. Summary of information currently known about the subclades of Haplogroup G. Information will be updated as it becomes available.
How the Subclades of Y-DNA Haplogroup G are determined
1. Obtain a Y-DNA haplogroup prediction based on the results from a Y-DNA STR test.
2. Confirm your haplogroup with a Y-DNA Haplogroup Backbone SNP test. You should be positive for M201, the SNP that is used to confirm Haplogroup G in the Y-DNA Haplogroup Backbone SNP Test panel.
3. Once your haplogroup has been confirmed as G, you can then obtain the Y-DNA Haplogroup G Subclade Test. The table below (Table 4) provides a list of the 11 SNP markers used in this panel, including the location of the SNP, the specific mutation, and the subclade that is defined by each SNP.
4. Identify the location of your SNPs on the phylogenetic tree to determine your subclade. Refer to Figure 4a and 4b for a step-by-step guide to help you locate your subclade.
Table 4. List of the SNP markers used in the Y-DNA Haplogroup G Subclade Test Kit.
Figure 4a. Once you have the results of your SNP test, you can follow this step-by-step flow chart to determine your subclade. To begin, refer to the decision indicated with the red circle. Do you have SNP mutation M201? This should be a “yes”, otherwise you are not part of Haplogroup G. Next, determine if you have SNP M285. If you do, you are part of Subclade G1. If you also have SNP mutation P20, you are part of Subclade G1a. If you do not have P20, check to see if you have P76. A positive result for P76 indicates that you are part of G1b, while a lack of P76 indicates that you are part of the G1* lineage. Now, let’s take a step back to M285. If you lack this mutation, you are either part of Subclade G2, or G*. A positive result for P287 indicates that you are part of G2. Please refer to the next figure for further resolution of this subclade. A negative result for P287 places you into G*.
Figure 4b. Subclade G2 contains many deeper lineages. Again, start at the decision indicated with the red circle. Do you have SNP mutation P15? First, let’s follow the path if you are positive for P15. This mutation places you within Subclade G2a. If you are also positive for M286, you fall within Subclade G2a2. If you do not have SNP mutation M286, check if you have P16. If you do not, you are part of the G2a* lineage. If you have P16, you are part of G2a1. Presence of SNP mutation P18 places you into G2a1a while lack of this mutation means you are part of the G2a1* lineage. Now, let’s return to the SNP P15. If you lack this mutation, you will fall into G2b, G2c or G2*. A positive result for M287 places you within the G2b lineage, whereas presence of M377 indicates you are part of the G2c lineage. A lack of both of these mutations indicates your classification into G2*.
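The decisions described for Figures 4a and 4b can also be written out as a short Python sketch. It follows the order of the checks given in the two captions and uses only the 11 SNPs named there. The function name and the choice to pass in the set of SNPs for which you tested positive are illustrative assumptions, and the sketch reflects the tree as described here (current as of 2008), not any later revision.

```python
def classify_g_subclade(positive_snps):
    """Return the Haplogroup G subclade implied by a set of positive SNPs.

    Follows the flow charts described for Figures 4a and 4b.
    Returns None when M201 is absent (not part of Haplogroup G).
    """
    snps = set(positive_snps)

    # Figure 4a: confirm Haplogroup G, then separate G1, G2 and G*.
    if "M201" not in snps:
        return None
    if "M285" in snps:          # Subclade G1
        if "P20" in snps:
            return "G1a"
        if "P76" in snps:
            return "G1b"
        return "G1*"
    if "P287" not in snps:      # neither G1 nor G2
        return "G*"

    # Figure 4b: resolve the lineages within G2.
    if "P15" in snps:           # Subclade G2a
        if "M286" in snps:
            return "G2a2"
        if "P16" in snps:       # Subclade G2a1
            return "G2a1a" if "P18" in snps else "G2a1*"
        return "G2a*"
    if "M287" in snps:
        return "G2b"
    if "M377" in snps:
        return "G2c"
    return "G2*"


# Hypothetical result: positive for M201, P287, P15, P16 and P18.
print(classify_g_subclade({"M201", "P287", "P15", "P16", "P18"}))  # G2a1a
```

For example, a man who is positive for M201 and P287 but negative for P15, M287 and M377 would be placed in G2*.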
Geographical Distribution of the Subclades of Y-DNA Haplogroup G
Most of the genealogical analyses to date are based on Haplogroup G only, or on the major subclades such as G2-P15, and therefore information on rare or newly defined subclades is somewhat limited. However, some recent studies have included a more detailed analysis of the subclades, particularly by testing for P15, the SNP that defines subclade G2a. The relative proportion of the different subclades is illustrated in Figure 5, and Table 5 lists the subclade frequencies and geographical location of populations that have been tested so far.
Figure 5. Relative frequency distribution of the subclades of Haplogroup G. The pie charts indicate the relative contribution of the different subclades in geographical areas where Haplogroup G has been detected. The G2c subclade (indicated in orange) is prevalent only in the Ashkenazi Jewish population, and G1 seems to be most predominant in Iran and the United Arab Emirates. G2, and in particular G2a, is clearly the most predominant subclade throughout Europe.
Table 5. Frequency distribution of the subclades of Haplogroup G detected worldwide.
Phylogenetic Tree for the Subclades of Y-DNA Haplogroup G
The phylogenetic tree of the subclades of Y-DNA Haplogroup G is illustrated below (Figure 6). It is current as of August 2008, but this haplogroup seems to be in a state of rapid change due to discovery of new SNPs and rearrangement of phylogenetic relationships, so it is likely that this tree will be altered in the future.
Figure 6. The phylogenetic tree of the subclades of Haplogroup G, current as of August 2008. As the figure legend indicates, the markers in white are those included in the Y-DNA Haplogroup G Subclade Test Panel, whereas the markers in red have not yet been published, and are therefore not included in the panel.
Resources
Alonso et al. (2005) The place of the Basques in the European Y-chromosome diversity landscape. European Journal of Human Genetics 13:1293-1302.
Bamshad et al. (2001) Genetic evidence on the origins of Indian caste populations. Genome Research 11:994-1004.
Baraç et al. (2003) Y-chromosomal heritage of Croatian population and its island isolates. European Journal of Human Genetics 11:535-542.
Behar et al. (2004) Contrasting patterns of Y chromosome variation in Ashkenazi Jewish and host non-Jewish European populations. Human Genetics 114:354-365.
Cadenas et al. (2008) Y-chromosome diversity characterizes the Gulf of Oman. European Journal of Human Genetics 16:374-386.
Capelli et al. (2007) Y-chromosome genetic variation in the Italian peninsula is clinal and supports an admixture model for the Mesolithic-Neolithic encounter. Molecular Phylogenetics and Evolution 44:228-239.
Cinnioglu et al. (2004) Excavating Y-chromosome haplotype in Anatolia. Human Genetics 114:127-148.
Cordaux et al. (2004) Independent origins of Indian caste and tribal paternal lineages. Current Biology 14:231-235.
Cruciani et al. (2002) A back migration from Asia to sub-Saharan Africa is supported by high-resolution analysis of human Y-chromosome haplotypes. American Journal of Human Genetics 70:1197-1214.
Csányi et al. (2008) Y-chromosome analysis of ancient Hungarian and two modern Hungarian-speaking populations from the Carpathian Basin. Annals of Human Genetics 72:519-534.
Di Giacomo et al. (2003) Clinal patterns of human Y chromosomal diversity in continental Italy and Greece are dominated by drift and founder effects. Molecular Phylogenetics and Evolution 28:387-395.
Fechner et al. (2008) Boundaries and clines in the West Eurasian Y-chromosome landscape: Insights from the European part of Russia. American Journal of Physical Anthropology 137:41-47.
Firasat et al. (2007) Y-chromosomal evidence for a limited Greek contribution to the Pathan population of Pakistan. European Journal of Human Genetics 15:121-126.
Goff and Athey (2006) Diagnostic Y-STR markers in Haplogroup G. Journal of Genetic Genealogy 2:12-17.
Hammer et al. (2000) Jewish and Middle Eastern non-Jewish populations share a common pool of Y-chromosome biallelic haplotypes. Proceedings of the National Academy of Sciences 97:6769-6774.
Karafet et al. (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Research DOI: 10.1101/gr.7172008
Karlsson et al. (2006) Y-chromosome diversity in Sweden – A long-time perspective. European Journal of Human Genetics 14:963-970.
King et al. (2008) Differential Y-chromosome Anatolian influences on the Greek and Cretan Neolithic. Annals of Human Genetics 72:205-214.
King and Underhill (2002) Congruent distributions of Neolithic painted pottery and ceramic figurines with Y-chromosome lineages. Antiquity 76:707-714.
Luca et al. (2007) Y-chromosomal variation in the Czech Republic. American Journal of Physical Anthropology 132:132-139.
Luis et al. (2004) The Levant versus the Horn of Africa: Evidence for bidirectional corridors of human migration. The American Journal of Human Genetics 74:532-544.
Marchani et al. (2008) Culture creates genetic structure in the Caucasus: Autosomal, mitochondrial, and Y-chromosomal variation in Daghestan. BMC Genetics 9:47
Marjanovic et al. (2005) The peopling of modern Bosnia-Herzegovina: Y-chromosome haplogroups in the three main ethnic groups. Annals of Human Genetics 69:1-7.
Mendizabal et al. (2008) Genetic origin, admixture, and asymmetry in maternal and paternal human lineages in Cuba. BMC Evolutionary Biology 8:213
Mohyuddin et al. (2006) Detection of novel Y SNPs provides further insights into Y chromosomal variation in Pakistan. Journal of Human Genetics 51:375-378.
Nasidze et al. (2004) Mitochondrial DNA and Y-chromosome variation in the Caucasus. Annals of Human Genetics 68:205-221.
Nasidze et al. (2005) MtDNA and Y-chromosome variation in Kurdish groups. Annals of Human Genetics 69:401-412.
Nasidze et al. (2008a) mtDNA and Y-chromosome variation in the Talysh of Iran and Azerbaijan. American Journal of Physical Anthropology 000:000-000.
Nasidze et al. (2008b) Close genetic relationship between Semitic-speaking and Indo-European-speaking groups in Iran. Annals of Human Genetics 72:241-252.
Nasidze and Stoneking (2001) Mitochondrial DNA variation and language replacements in the Caucasus. Proceedings of Biological Science 268:1197-1206.
Nebel et al. (2001) The Y-chromosome pool of Jews as part of the genetic landscape of the Middle East. The American Journal of Human Genetics 69:1095-1112.
Qamar et al. (2002) Y-chromosomal DNA variation in Pakistan. American Journal of Human Genetics 70:1107-1124.
Ramana et al. (2001) Y-chromosome SNP haplotypes suggest evidence of gene flow among caste, tribe, and the migrant Siddi populations of Andhra Pradesh, South India. European Journal of Human Genetics 9:695-700.
Regueiro et al. (2006) Iran: Tricontinental nexus for Y-chromosome driven migration. Human Heredity 61:132-143.
Sahoo et al. (2006) A prehistory of Indian Y chromosomes: Evaluating demic diffusion scenarios. Proceedings of the National Academy of Sciences 103:843-848.
Semino et al. (2000) The genetic legacy of Paleolithic Homo sapiens in extant Europeans: a Y-chromosome perspective. Science 290:1155-1159.
Sengupta et al. (2006) Polarity and temporality of high-resolution Y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of central Asian pastoralists. American Journal of Human Genetics 78:202-221.
Shen et al. (2004) Reconstruction of patrilineages and matrilineages of Samaritans and other Israel populations from Y-chromosome and mitochondrial DNA sequence variation. Human Mutation 24:248-260.
Shlush et al. (2008) The Druze: A population genetic refugium of the Near East. PLOS
Spencer-Wells et al. (2001) The Eurasian Heartland: a continental perspective on Y-chromosome diversity. Proceedings of the National Academy of Sciences 98:10244-10249.
Underhill et al. (2001) The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Annals of Human Genetics 65:43-62.
Vallone et al. (2004) Y-SNP typing of U.S. African American and Caucasian samples using allele-specific hybridization and primer extension. Journal of Forensic Sciences 49:723-732.
Weale et al. (2001) Armenian Y-chromosomal haplotypes reveal strong regional structure within a single ethno-national group. Human Genetics 109:659-674.
Zalloua et al. (2008) Y-chromosomal diversity in Lebanon is structured by recent historical events. American Journal of Human Genetics 82:873-882.