
Wednesday, June 28, 2017

Iron Age nomads vs Bronze Age herders: Sarmatians and Yamnaya in qpGraph

If we are to take these qpGraph models fairly literally, and I don't see why not, since they're very tight fits overall, then the early Sarmatians from what is now Pokrovka, Russia, derived almost 80% of their ancestry from Yamnaya or a very closely related group, while the rest of their ancestry came from a source that was a ~50/50 mixture between Han-like East Asians and a population closely related to Neolithic and Chalcolithic farmers from what is now Iran.
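To put rough numbers on that split, here's a quick back-of-the-envelope sketch in Python. It uses the rounded proportions quoted above (80% and 50/50), not the exact edge weights from the graph, so treat the output as approximate. The variable names are just mine for illustration.

```python
# Back-of-the-envelope ancestry split for the early Sarmatians, using the
# rounded proportions quoted in the text (not the exact qpGraph edge weights).
yamnaya_related = 0.80              # "almost 80%" in the text, rounded here
other_source = 1.0 - yamnaya_related

# The remaining source is described as a ~50/50 mix of two streams.
han_like_share = 0.5
iran_farmer_share = 0.5

ancestry = {
    "Yamnaya-related": yamnaya_related,
    "Han-like East Asian": other_source * han_like_share,
    "Iran Neolithic/Chalcolithic-related": other_source * iran_farmer_share,
}

for label, fraction in ancestry.items():
    print(f"{label}: {fraction:.0%}")
```

In other words, under these rounded figures the Sarmatians come out at roughly 10% Han-like and roughly 10% Iran Neolithic/Chalcolithic-related, on top of the Yamnaya-related majority.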

This topology also tests for the same Iran Neolithic/Chalcolithic-related input in Yamnaya, and I think it's very important to note that the relevant admixture edges (D7 to D9) are 0%, which suggests that Yamnaya did not harbor this type of ancestry. I didn't bother testing for East Asian-related admixture in Yamnaya in the same way, because it never shows such signals in other analyses.

The clearly more complex ancestry of the Sarmatians is probably best explained by the fact that they belonged to a true nomadic warrior culture, and indeed one that managed to spread its influence across vast stretches of Eurasia. So these two Sarmatian individuals, both from Unterlander et al. 2017, may have had recent ancestors from as far afield as Central Asia and Siberia. On the other hand, Yamnaya was a semi-nomadic pastoralist population, and although also highly mobile and prone to long-distance expansions, probably not as mobile as the Sarmatians.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Monday, June 26, 2017

Matters of geography

The steppe north of the Black Sea in Ukraine has basically always been considered part of Europe, and just over 100 years ago some guy with a map decided that the steppe between the eastern coast of the Black Sea in Russia and the Ural River in western Kazakhstan should also be Europe.

So nowadays, right or wrong, it's generally accepted that the entire steppe region west of the Ural River, known as the Pontic-Caspian steppe, is in Eastern Europe. Here's a map courtesy of Wikipedia showing how the official boundary between Eastern Europe and Asia has shifted since the 18th century.

But this decision wasn't entirely arbitrary, because the current boundary between Eastern Europe and Asia by and large follows several major geographic barriers, including the Caucasus Mountains, the Caspian Sea and the Ural Mountains. It'd be hard to argue that these barriers haven't had a profound impact across the ages on the character of Europe and its people, and this has probably been known for well over a couple hundred years.

For instance, if we're to trust the most common interpretations of the works of ancient geographers like Hecataeus and Herodotus, then their worlds in some important ways resembled the typical Principal Component Analysis (PCA) of West Eurasian genetic variation. And it seems that they had a pretty good idea where both the strong continental boundaries and fuzzy areas were located.

Below, on the geographic map inspired by Herodotus, Europa or Europe is delineated from much of Asia by the Black Sea, the Caucasus Mountains and the Caspian Sea, while on the genetic map, most European and Asian populations form two, more or less parallel, clusters fairly cleanly separated by empty space (this was first noted in Lazaridis et al. 2013). Indeed, this empty space is the work of the Black Sea, the Caucasus Mountains and the Caspian Sea acting as rather effective barriers to gene flow between Eastern Europe and Asia (see Yunusbayev et al. 2012).

However, on the genetic map, the Iranic Scythians of the Asian steppes straddle my somewhat arbitrary red line separating Europa and Asia, and this is echoed on the Herodotus map by Iranic and related peoples like the Massagetae and Issedones, who inhabit the seemingly undefined part of the world between Europa and Asia east of the Caspian Sea (Mare Caspium).

Nothing really groundbreaking, but pretty cool stuff.

On a related note, I've seen the term "mainland Europe" used recently in at least one of the big ancient DNA papers to describe the part of Europe west of the Pontic-Caspian steppe. It seems that the authors wanted to underline the fairly stark genetic difference that existed between most of Europe and the steppe just prior to the expansion of Yamnaya and related steppe herder groups that initiated the formation of the present-day European gene pool.

I can see why they did this, but to my mind they got things backwards. That's because the term mainland implies the opposite of island and/or peninsula, and of course the part of Europe west of the Pontic-Caspian steppe is a relatively narrow strip of land surrounded by water, so it's a peninsula. Let's visualize these two models on a map of Europe courtesy of Wikipedia:

I understand that my model might result in heart palpitations for some readers, especially those from Western Europe, who generally see their part of Europe as core Europe, but I feel that it makes good sense from a purely geographic POV.

Monday, June 19, 2017

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

It's now more than obvious that South Asia experienced an almighty pulse of admixture from an Early Bronze Age (EBA) population originally from somewhere on the Pontic-Caspian steppe in Eastern Europe. This is fairly easy to demonstrate thanks to ancient DNA from Europe and West Asia. One way of doing it is with the qpGraph algorithm.

Moreover, the widespread presence of Y-chromosome haplogroup R1a in South Asia is, at least in large part, linked to this event, because:

- Mesolithic Eastern European foragers belonging to basal clades of R1a do not show any South Asian or even Near Eastern ancestry, so it's likely that R1a is native to Eastern Europe and surrounds

- If R1a is native to Eastern Europe then it can't also be native to South Asia, which is not only thousands of miles away, but also ecologically a different world

- The most common R1a subclades in the world today, R1a-M417 and one of its main daughter branches R1a-Z93, appear in Late Neolithic and Bronze Age European pastoralist groups (Corded Ware, Srubnaya and closely related peoples) that harbor high levels of Eastern European forager ancestry and no signs of South Asian admixture

- Practically 100% of the R1a in South Asia today belongs to the R1a-Z93 subclade, which, based on full Y-chromosome sequencing data, looks like it began expanding rapidly only during the EBA, eventually making its way to South Asia, and this is in line with the available ancient DNA evidence

- In South Asia, R1a and ancient steppe admixture peak in groups that speak Indo-European, including Indo-Aryan, languages, suggesting that both are genetic signals of the Indo-European expansions into the Indian subcontinent

So we're now at a stage where anyone with at least moderate thinking capacity, whose mind isn't poisoned by extreme bias, has to agree that there was a rather large movement of people from the Eurasian steppes into South Asia during the Bronze Age. No ifs or buts.

Ancient DNA from South Asia is on the way. It might throw up a few surprises and force a new model of how the Indo-Europeans and R1a got to South Asia, but it won't turn things upside down. In other words, don't expect the Out-of-India or "indigenous Aryans" theory to suddenly come into the picture as a viable alternative to the Aryan Invasion Theory (AIT), occasionally presented as the more politically correct Aryan Migration Theory (AMT).

Many Indians still don't get this, or rather they refuse to get it, which is very frustrating, especially if you're a regular in the comments section here. But admittedly it can also be very entertaining.

Last week The Hindu published an interesting piece on the latest developments in South Asian population genetics that were making the AIT, or at least AMT, look like a sure thing:

How genetics is settling the Aryan migration debate

Soon after came this peculiarly titled retort in the Swarajya online magazine, in which unfortunately it's impossible to find a single coherent argument:

Genetics Might Be Settling The Aryan Migration Debate, But Not How Left-Liberals Believe

Generally hilarious stuff, except the parts where the author abuses blogger Razib Khan for moving with the latest genetic data and arguing in favor of the Aryan expansion into India (see here and here).

So what are we to expect when the first big paper with ancient DNA from South Asia comes out, probably in the next few months? For starters, accusations of racism and maybe even hate speech against anyone who claims that the results support the AIT or AMT, or anything even close. And lots of shouting and carrying on. But also a lot more comic relief.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, June 16, 2017

Cypriot Y-chromosomes (Heraclides et al. 2017)

Over at PLoS ONE at this link. Note the fairly high levels of Y-haplogroups R1a and/or R1b in many of the Greek and Turkish populations in the figure below. Much of this might be of fairly recent European (mostly Slavic) and Central Asian (Turkic nomad and Ottoman) provenance, but I'd say some of it has to date back to the Bronze Age, and potentially to the expansions of the Proto-Anatolians, Proto-Armenians and Proto-Greeks into the Balkans and Anatolia from the Pontic-Caspian steppe. Emphasis is mine:

Abstract: Genetics can provide invaluable information on the ancestry of the current inhabitants of Cyprus. A Y-chromosome analysis was performed to (i) determine paternal ancestry among the Greek Cypriot (GCy) community in the context of the Central and Eastern Mediterranean and the Near East; and (ii) identify genetic similarities and differences between Greek Cypriots (GCy) and Turkish Cypriots (TCy). Our haplotype-based analysis has revealed that GCy and TCy patrilineages derive primarily from a single gene pool and show very close genetic affinity (low genetic differentiation) to Calabrian Italian and Lebanese patrilineages. In terms of more recent (past millennium) ancestry, as indicated by Y-haplotype sharing, GCy and TCy share much more haplotypes between them than with any surrounding population (7–8% of total haplotypes shared), while TCy also share around 3% of haplotypes with mainland Turks, and to a lesser extent with North Africans. In terms of Y-haplogroup frequencies, again GCy and TCy show very similar distributions, with the predominant haplogroups in both being J2a-M410, E-M78, and G2-P287. Overall, GCy also have a similar Y-haplogroup distribution to non-Turkic Anatolian and Southwest Caucasian populations, as well as Cretan Greeks. TCy show a slight shift towards Turkish populations, due to the presence of Eastern Eurasian (some of which of possible Ottoman origin) Y-haplogroups. Overall, the Y-chromosome analysis performed, using both Y-STR haplotype and binary Y-haplogroup data puts Cypriot in the middle of a genetic continuum stretching from the Levant to Southeast Europe and reveals that despite some differences in haplotype sharing and haplogroup structure, Greek Cypriots and Turkish Cypriots share primarily a common pre-Ottoman paternal ancestry.


Y-haplogroup frequencies within GCy and TCy can be found in S6 Table. Y-haplogroup frequencies of Cypriots, Greeks, and Turks, as well as other surrounding populations can be found in Fig 1 (as well as S7 Table). GCy and TCy showed very similar frequencies for the major Y-haplogroups, differentiating both from Greek and Turkish sub-populations (Fig 3). The most frequent major Y-haplogroup subclade in both GCy and TCy was J2a-M410 (23.8% and 20.3% among GCy and TCy, respectively), followed by E-M78 (12.8% Vs 13.9%) and G2-P287 (12.5% Vs13.7%). R1b-M343 was found in higher frequency among GCy (11.9%) than TCy (6.8%), while the same applies for E-M123 (13.1% Vs 6.3%). Finally, haplogroup, although in much lower frequencies than the aforementioned haplogroups, haplogroup I2 was somewhat higher among TCy (6.8%), than among GCy (2.3%), while haplogroup J2b was higher among GCy (5.8%) than TCy (1.8%). Other, less common haplogroups (i.e. I1, R1a, L, and T) showed similar frequencies (in the range of 1–5%) between GCy and TCy.

One additional difference between GCy and TCy was the presence of moderate numbers of East Eurasian (primarily Central Asian) Y-haplogroups and small numbers of North African Y-haplogroups among TCy but not among GCy. The frequency of East Eurasian haplogroups among TCy was C-M130 (0.5%), H-L901 (0.3%), N-M231 (2.4%), O-M175 (0.8%) and Q-M242 (1.3%), reaching a total of 5.6%, but only totalling 0.6% among GCy. North African haplogroups (E-M81, E-V38) were only found among TCy (2.1%) (S6 and S7 Figs).

A major feature differentiating Cypriots from Greeks, is the much lower frequency of haplogroups I (2.9% GCy, 7.3% TCy, ~10–21% mainland Greeks) and R1a (2.9% GCy, 3.2% TCy, ~10–22% mainland Greeks) among the former. All differences in haplogroup frequencies between populations were statistically significant (Fisher’s Exact test, p<0.001).


In terms of Y-haplogroup distribution, Cypriots (GCy and TCy) show substantial differences from Greeks, characterized by much lower frequency of haplogroups I2, R1a, and R1b in the former. These haplogroup differences indicate differential migrations into Cyprus and mainland Greece, at different points in history and prehistory. I2 is considered the major haplogroup among Mesolithic European Hunter-Gatherers[60], who apparently were either absent from Cyprus or were totally diluted (nearly extinguished) by subsequent migrations. Although the exact origins and migratory patterns of R1a and R1b are still under rigorous investigation, it seems that they are linked to Bronze Age migrations from the Western Eurasian Steppe and Eastern Europe into Southern (including Greece) and Western Europe[61]. Apparently, such migrations (especially as regards R1a) into Cyprus were limited.

Additionally, the Greek population has received considerable migrations during the Byzantine era and the Middle Ages from other Balkanic populations, such as Slavs[62,63], Aromanians (Vlachs)[64], and Albanians (Arvanites)[65,66]. The former, is very likely to have increased R1a frequencies among Greeks. In fact, Fig 3 (also S7 Table) indicate that R1a increases gradually with increasing latitude in Greece. There is no historical evidence for such migrations into Cyprus during the same period.

Heraclides A, Bashiardes E, Fernández-Domínguez E, Bertoncini S, Chimonas M, Christofi V, et al. (2017) Y-chromosomal analysis of Greek Cypriots reveals a primarily common pre-Ottoman paternal ancestry with Turkish Cypriots. PLoS ONE 12(6): e0179474.

Tuesday, June 13, 2017

qpGraph models for the Kalash & Yamnaya

I'm pretty happy with this effort, but it's a very complex topology with a lot of admixture edges. Moreover, its highest Z score of nearly 3 suggests that it can be improved (Z >3 would mean a failed model). Indeed, I'd say that the Basal Eurasian admixture coefficients are a little too high, and perhaps Steppe_EBA is a few per cent more West Asian/Caucasian than it should be. More details about all of the graphs in this post are available here.
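For anyone unfamiliar with how the pass/fail call works, here's a minimal sketch in Python. It assumes that each Z score is simply the gap between an observed f-statistic and the value predicted by the graph, divided by the standard error, and that the model is judged by its worst such score. The statistics and numbers are entirely made up for illustration, and the class and function names are hypothetical; none of this is qpGraph's actual code or output.

```python
# Hypothetical fit check: every number below is invented for illustration.
from dataclasses import dataclass

Z_FAIL = 3.0  # |Z| above this is treated as a failed model


@dataclass
class FStat:
    label: str       # which f-statistic this is
    observed: float  # value estimated from the data
    fitted: float    # value predicted by the graph
    std_err: float   # standard error of the observed value

    @property
    def z(self) -> float:
        # Assumed convention: (observed - fitted) / standard error
        return (self.observed - self.fitted) / self.std_err


def worst_z(stats):
    """Return the statistic with the largest absolute Z score."""
    return max(stats, key=lambda s: abs(s.z))


def model_passes(stats, threshold=Z_FAIL):
    """A model passes if even its worst statistic stays within the threshold."""
    return abs(worst_z(stats).z) <= threshold


example_stats = [
    FStat("f4(A, B; C, D)", 0.0120, 0.0105, 0.0010),    # Z is about 1.5
    FStat("f4(A, C; B, D)", -0.0040, -0.0011, 0.0010),  # Z is about -2.9
    FStat("f4(A, D; B, C)", 0.0031, 0.0030, 0.0008),    # Z is close to 0
]

worst = worst_z(example_stats)
print(f"worst |Z| = {abs(worst.z):.2f} for {worst.label}")
print("model passes" if model_passes(example_stats) else "model fails")
```

With a worst score just under 3, as in this toy example, the model scrapes through, but that one poorly fitting statistic is a good hint about where the topology might need work.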

Obviously, the labels for the inferred ancestral populations, like North Caucasian, are speculative. In hindsight, it may have been better to use something like single letter labels.

But now that I have a fairly robust topology, I can try and ask some questions. For instance, is the inferred Caspian pop a better source of West Asian ancestry in Yamnaya than the so-called North Caucasian one? The answer is probably no.

My main graph is also a decent statistical fit for at least a number of Indian groups, such as one of the Gujarati subpopulations labeled GujaratiD in the Human Origins dataset. But it fails marginally for Pathans, so it's not a robust solution for all of South Asia. Incredibly, using Andronovo instead of Yamnaya in the Pathan model makes it work. Tajiks can also be modeled in this way using Andronovo. I say incredibly, because Pathans and Tajiks are obviously Iranic speakers, and their Iranic ancestors in all likelihood arrived in South Asia from the Eurasian steppe much later than the Indo-Aryan ancestors of the Kalash and most Indians.

So what we might be seeing here is substructure within the steppe-related admixture amongst South Asians, with Indo-Aryan speakers apparently showing Yamnaya-related (Catacomb?) ancestry, and Iranic speakers, as well as possibly groups with significant Iranic ancestry, showing a preference for later Andronovo-related ancestry. I need to have a closer look at this. But it won't happen overnight; my brain is fried as it is after this effort, and I need to get some fresh air.

Update 14/06/2017: I've now had the chance to test many more Indo-Aryan and Iranic groups with my model. Most of these groups show a slight, non-significant, preference for Yamnaya_Samara as the steppe reference population. However, those that show a slight, and again non-significant, preference for Andronovo are usually Iranic, such as the Balochi in the graphs below. I'm not claiming that this proves anything, but I do think that it hints at something, and I'll try testing a few different hypotheses in the near future with qpGraph.

See also...

qpGraph open thread

Thursday, June 8, 2017

qpGraph open thread

I managed to put together a simple qpGraph model for the Kalash using present-day populations. It's largely based on the model for the Paniya by Nakatsuka et al. (see Supplementary Figure 5 here). The graph and pops files for my model can be downloaded here and here, respectively. I'm now working on a more complex model for the Kalash that includes ancient genomes from Eastern Europe and West Asia.

I'm willing to take a few requests for qpGraph models in the comments below. Please note, however, that these requests will have to be accompanied by graph and pops files, and the graph files must be correctly set out; if they don't work, then they don't work, and you won't get your graph. On the other hand, you only need to supply pops files with the correct populations and I'll do the rest.

See also...

qpGraph models for the Kalash & Yamnaya

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Wednesday, June 7, 2017

The pigtailed figures

Reconstructed Proto-Indo-European (PIE) vocabulary suggests that the speakers of PIE, who probably lived on the Pontic-Caspian steppe during the Eneolithic, were familiar with wool. Interestingly, ancient DNA suggests that Near Eastern-related ancestry first appeared on the Pontic-Caspian steppe during the Eneolithic, because Neolithic samples from the Pontic steppe in what is now Ukraine lack this type of admixture. Perhaps it first arrived there with women from south of the Caucasus who knew how to spin wool? Below are a couple of interesting quotes from Becker et al. 2016. Emphasis is mine:

For ancient Mesopotamia McCorriston has proposed a fundamental shift from linen-based to woollen textile production. [4] Drawing on evidence from cuneiform texts as well as faunal and botanical remains, she suggests that it was in the 3rd or perhaps late 4th millennium BCE that wool became the fibre of choice for everyday use. Recent archaeological and archaeozoological research, however, suggests a considerably earlier date, before the advent of writing. Written sources from the mid- to late 3rd millennium BCE demonstrate that sheep and goats were maintained in herds of some dozens to a few hundred and herded in large flocks up to several thousand animals. In fact, cuneiform records provide ample evidence for the usage of wool in textile manufacture, whereas linen appears only rarely. The growth of a large-scale woollen textile industry rested on women as the main source of labour.


During the Late Uruk and Jemdat Nasr periods in Mesopotamia, scenes appear on cylinder seals that have been interpreted as showing textile production carried out by so-called pigtailed figures. [93] A specific raw material cannot be deduced from these depictions, but the substantial number of scenes indicates a significant concern with cloth manufacture.

Becker et al., The Textile Revolution. Research into the Origin and Spread of Wool Production between the Near East and Central Europe, eTopoi, Special Volume (6) 2016, (ISSN 2192-2608)

See also...

A plausible model for the formation of the Yamnaya genotype

A homeland, but not the homeland

Monday, June 5, 2017

Ancient human genomes from Southern Africa (Schlebusch et al. 2017 preprint)

Over at bioRxiv at this LINK. Emphasis is mine:

Abstract: Southern Africa is consistently placed as one of the potential regions for the evolution of Homo sapiens. To examine the region's human prehistory prior to the arrival of migrants from East and West Africa or Eurasia in the last 1,700 years, we generated and analyzed genome sequence data from seven ancient individuals from KwaZulu-Natal, South Africa. Three Stone Age hunter-gatherers date to ~2,000 years ago, and we show that they were related to current-day southern San groups such as the Karretjie People. Four Iron Age farmers (300-500 years old) have genetic signatures similar to present day Bantu-speakers. The genome sequence (13x coverage) of a juvenile boy from Ballito Bay, who lived ~2,000 years ago, demonstrates that southern African Stone Age hunter-gatherers were not impacted by recent admixture; however, we estimate that all modern-day Khoekhoe and San groups have been influenced by 9-22% genetic admixture from East African/Eurasian pastoralist groups arriving >1,000 years ago, including the Ju|'hoansi San, previously thought to have very low levels of admixture. Using traditional and new approaches, we estimate the population divergence time between the Ballito Bay boy and other groups to beyond 260,000 years ago. These estimates dramatically increases the deepest divergence amongst modern humans, coincide with the onset of the Middle Stone Age in sub-Saharan Africa, and coincide with anatomical developments of archaic humans into modern humans as represented in the local fossil record. Cumulatively, cross-disciplinary records increasingly point to southern Africa as a potential (not necessarily exclusive) 'hot spot' for the evolution of our species.

Schlebusch et al., Ancient genomes from southern Africa pushes modern human divergence beyond 260,000 years ago, bioRxiv, Posted June 5, 2017, doi:

Friday, June 2, 2017

The healthy Kurgan pastoralist

Just in at bioRxiv, a new preprint on the genomic health of ancient hominins, at this LINK. Obviously, if it's true that the Yamnaya and other closely related Kurgan culture pastoralists of the ancient Eurasian steppe had unusually healthy genomes, then it becomes easier to understand why they made such a massive impact on the ancestry of present-day Europeans and Central and South Asians, because populations that enjoy good health are likely to grow faster than those that don't. From the preprint, emphasis is mine:

Abstract: The genomes of ancient humans, Neandertals, and Denisovans contain many alleles that influence disease risks. Using genotypes at 3180 disease-associated loci, we estimated the disease burden of 147 ancient genomes. After correcting for missing data, genetic risk scores were generated for nine disease categories and the set of all combined diseases. These genetic risk scores were used to examine the effects of different types of subsistence, geography, and sample age on the number of risk alleles in each ancient genome. On a broad scale, hereditary disease risks are similar for ancient hominins and modern-day humans, and the GRS percentiles of ancient individuals span the full range of what is observed in present day individuals. In addition, there is evidence that ancient pastoralists may have had healthier genomes than hunter-gatherers and agriculturalists. We also observed a temporal trend whereby genomes from the recent past are more likely to be healthier than genomes from the deep past. This calls into question the idea that modern lifestyles have caused genetic load to increase over time. Focusing on individual genomes, we find that the overall genomic health of the Altai Neandertal is worse than 97% of present day humans and that Otzi the Tyrolean Iceman had a genetic predisposition to gastrointestinal and cardiovascular diseases. As demonstrated by this work, ancient genomes afford us new opportunities to diagnose past human health, which has previously been limited by the quality and completeness of remains.


Both the allergy/autoimmune and gastrointestinal/liver disease categories (which share many of the same disease-associated loci) show significantly lower genetic risk in pastoralists than agriculturalists and hunter gatherers. Pastoralists also have significantly reduced risk for cancer compared to agriculturalists. Agriculturalists have a higher genetic risk for dental/periodontal diseases than hunter-gatherers and pastoralists. In general, pastoralists possess extremely healthy genomes, especially for cancers and immune-related, periodontal, and gastrointestinal diseases.


It is unclear why pastoralists would have the lowest risk in these specific disease categories. We caution that this pattern may be the result of technical issues, as pastoralists have the smallest sample size (only 19 individuals) and geographic range (between 40-90°E longitude and 45-55°N latitude, Figure 1B). Because populations that have different subsistence types also differ in other ways, the lower GRS of pastoral populations may be due to other factors, including demographic history.

Ali J. Berens, Taylor L. Cooper, Joseph Lachance, The Genomic Health Of Ancient Hominins, bioRxiv, Posted June 2, 2017, doi:
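For readers wondering what a genetic risk score boils down to, here's a minimal sketch in Python. It assumes the simplest possible scheme: an unweighted fraction of risk alleles among the loci that were actually genotyped, which is one crude way of allowing for missing data. That's only my guess at the general idea, not the preprint's actual method, and the locus names, genotypes and reference scores are all made up.

```python
def risk_score(genotypes):
    """Fraction of risk alleles among called genotypes.

    genotypes maps locus -> number of risk alleles (0, 1 or 2),
    or None when the locus is missing in the ancient genome.
    """
    called = [g for g in genotypes.values() if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each called locus contributes two alleles to the denominator.
    return sum(called) / (2 * len(called))


def percentile(score, reference_scores):
    """Percent of reference individuals with a strictly lower score."""
    lower = sum(1 for r in reference_scores if r < score)
    return 100.0 * lower / len(reference_scores)


# Hypothetical ancient genome: one locus is missing and is simply skipped.
ancient = {"locus1": 2, "locus2": None, "locus3": 1, "locus4": 0}

# Hypothetical scores for a handful of present-day individuals.
reference = [0.25, 0.40, 0.50, 0.55, 0.75]

score = risk_score(ancient)  # 3 risk alleles over 6 called alleles
print(score, percentile(score, reference))
```

A statement like "worse than 97% of present day humans" is then just a percentile of this kind, computed against a much larger present-day reference panel.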

Wednesday, May 31, 2017

A homeland, but not the homeland

It seems increasingly likely that ancient DNA has identified a massive expansion, or a series of expansions, from Mesopotamia and/or surrounds in basically all directions dating to the Chalcolithic (ChL) and Bronze Age (BA). This phenomenon is mainly characterized by the simultaneous spread of:

- Iran_ChL-related genome-wide ancestry

- Y-haplogroup J

- South Caspian-specific mitochondrial haplogroups such as R2 and U7

At least two of these characteristics are shared by five groups that have appeared in the Near Eastern and African ancient DNA record as probable post-Neolithic newcomers, at least in part, at their respective sampling sites:

- Anatolia_BA, Western Turkey, 2836-1800 calBCE (Lazaridis et al. 2017)

- Egyptian mummies, Middle Egypt, 776-2 calBCE (Schuenemann et al. 2017)

- Iran_ChL, Western Iran, 4839-3796 calBCE (Lazaridis et al. 2016)

- Levant_BA, Northwestern Jordan, 2489-1966 calBCE (Lazaridis et al. 2016)

- Sidon_BA, Southern Lebanon, 1750-1600 BCE (Haber et al. 2017)

I'm confident that many more such groups will soon be added to the ancient DNA record, probably including Levant_ChL from the upcoming Harney et al. 2017 (a teaser of the paper can be seen here). Below, a map of Mesopotamia courtesy of Wikipedia.

It's an interesting and important question who these likely Mesopotamian migrants and their descendants were in terms of linguistic affinities. It seems that they left a massive genetic imprint on the Near East and much of North Africa, and perhaps also Central Asia and Southeastern Europe, so they probably also left some sort of linguistic legacy.

Obviously, it's highly improbable that most of them were Indo-European speakers. So if most of them weren't Indo-Europeans, then the phenomenon I'm describing here can't be related to the Proto-Indo-European (PIE) expansion. Forget the idea of a West Asian linguistic hot spot spewing out different, distantly related language families, including Indo-European, via the migrations of closely related Iran_ChL-like populations over a span of a few thousand years; it's plain stupid.

So who were they?

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...