Thursday, September 24, 2015

Support for linguistic macrofamilies from weighted sequence alignment

Open access at PNAS:

Abstract: Computational phylogenetics is in the process of revolutionizing historical linguistics. Recent applications have shed new light on controversial issues, such as the location and time depth of language families and the dynamics of their spread. So far, these approaches have been limited to single-language families because they rely on a large body of expert cognacy judgments or grammatical classifications, which is currently unavailable for most language families. The present study pursues a different approach. Starting from raw phonetic transcription of core vocabulary items from very diverse languages, it applies weighted string alignment to track both phonetic and lexical change. Applied to a collection of ∼1,000 Eurasian languages and dialects, this method, combined with phylogenetic inference, leads to a classification in excellent agreement with established findings of historical linguistics. Furthermore, it provides strong statistical support for several putative macrofamilies contested in current historical linguistics. In particular, there is a solid signal for the Nostratic/Eurasiatic macrofamily.

Gerhard Jäger, Support for linguistic macrofamilies from weighted sequence alignment, PNAS, Published online before print September 24, 2015, doi: 10.1073/pnas.1500331112
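
For readers unfamiliar with the term, here is a minimal sketch of what "weighted string alignment" means in general. It is a plain Needleman-Wunsch global alignment in which sound pairs get graded scores instead of a simple match/mismatch. The function names, the gap penalty, and the toy weights are all assumptions made for illustration; they are not taken from the paper, whose actual weights and pipeline are described there.

```python
def weighted_alignment_score(a, b, sub_score, gap_penalty=-1.0):
    """Global (Needleman-Wunsch) alignment score of two sound strings.

    sub_score(x, y) returns the weight for aligning symbol x with symbol y.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap_penalty
    for j in range(1, cols):
        score[0][j] = j * gap_penalty
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub_score(a[i - 1], b[j - 1]),  # align two symbols
                score[i - 1][j] + gap_penalty,                        # gap in b
                score[i][j - 1] + gap_penalty,                        # gap in a
            )
    return score[-1][-1]


# Hypothetical weights: pairs of similar sounds score higher than unrelated ones.
SIMILAR = {frozenset("pb"), frozenset("td"), frozenset("kg"),
           frozenset("ie"), frozenset("ou")}


def toy_sub_score(x, y):
    if x == y:
        return 1.0   # identical sounds
    if frozenset((x, y)) in SIMILAR:
        return 0.5   # assumed "close" sounds
    return -1.0      # unrelated sounds


print(weighted_alignment_score("hin", "han", toy_sub_score))
print(weighted_alignment_score("tap", "dab", toy_sub_score))
```

With these invented weights, "tap" and "dab" score higher than "hin" and "han", because t/d and p/b count as near matches while i/a does not. A high score for a word pair says nothing by itself about whether the resemblance is inherited, borrowed, or accidental.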


Nirjhar007 said...

Thank you for posting the research.

Nirjhar007 said...

Again, no surprises: they have neglected the stark Near Eastern-IE linguistic relations.

andrew said...

Not feeling the love. There are too many subtle issues of methodology that could confound the analysis.

Kurti said...

The analysis isn't professional. Obviously the languages which, quite frankly, should have been influenced very recently by Indo-European languages (Altaic/Turkic and Uralic) come first. In this kind of analysis you need a methodology where you compare the roots of words that you know are not loans.

Nirjhar007 said...

A friend points out to me that the IE-Kamchatkan connection is made with Celtic non-IE words.
Moreover, on Chukotko-Kamchatkan + Indo-European: Even proponents of Eurasiatic do not consider Chukotko-Kamchatkan as Indo-European’s closest relative. So, from the point of view of Eurasiatic/Nostraticist scholarship, the status of this clade is doubtful.
They are very uncertain on this point, but it's strange that they simply use modern words without minding etymology!
Kurti, Sumerian shows a good amount of IE influence, and of course Hurrian and the Caucasian groups also have a case.
IMO PIE developed south of the Caspian/N Iran, as you know, and it is not surprising to observe such relations; maybe that can also be connected with the origin of the Teal component around that same area...

Ebizur said...

Kurti wrote,

"Obviously the languages which quite frankly should have been influenced very recently by Indo European languages (Altaic/Turic and Uralic) come first."

I understand your concerns about the validity of this analysis, but how should Koryak-Chukchi (Chukotkan), Itelmen (Kamchatkan), Nivkh, or Yukaghir languages have been influenced very recently by Indo-European languages? Imperial Russian influence? Such recent influence would surely be easily excluded in any analysis of this sort. Even historical influence of the Chinese language on the Japanese language, with most of the actual influence dating back to the latter half of the first millennium CE, is obvious to anyone with even a basic understanding of the language.

Karl_K said...

"A friend points to me that the IE-Kamchatkan connection is done with Celtic non-IE words."

You should read the paper yourself. The author discusses this. Even leaving out those Celtic languages made little difference to the results.

"excluding them does not change the topology of the tree and only mildly affects confidence values. The confidence value rises from 0.967 to 0.981 for IndoEuropean, and it falls from 0.969 to 0.964 for the Indo-European/ Chukotko-Kamchatkan clade."

Nirjhar007 said...

Nonsense, it's just hilarious: an ''Indo-European/Chukotko-Kamchatkan clade''. Just throw it in the trash...

Karl_K said...

That is your opinion. I think it makes perfect sense.

Nirjhar007 said...

Any person with some knowledge of linguistics can tell you that it's completely wacko; at best some distant relationship may be established, yes.

tew said...

Looking at the author's curriculum, it is clear he had never worked with historical linguistics before this paper; in addition to that, the methodology in general is rejected by the majority of historical linguists, who are (should be?) the real arbiters in this sort of thing.

So we have here, once more, a non-specialist publishing in a non-specialist medium a paper purporting to "revolutionize" historical linguistics using methods disavowed by most experts in the field.

Seriously, do you need any more red flags?

I know that in many places historical linguistics has been largely defunded and the number of experts has dwindled, but that doesn't justify that everybody and their uncle decide to publish and pontificate on the subject, let alone that this exercise should be taken seriously. When will people from outside the area realize the obvious?

Simon_W said...

But the majority opinion of linguists is that there is no certain, established genetic relationship between IE and any other language family. What similarities there are may be explained as areal effects and old loans. So maybe older, higher order relationships simply can no longer be established with the accepted means of historical linguistics. In that sense new approaches might detect something real? Though if the data they use is garbage, the output will be garbage too...

Simon_W said...

At least the proposed IE-Chukotko-Kamchatkan clade reminded me of this MDLP map:

Kristiina said...

That phylogenetic tree also correlates with shared drift with Ma1, i.e. ANE ancestry (below in descending order excluding Amerinds):
West Greenland, East Greenland
Western Finns
Tundra Nenets

Figure SI 21:

tew said...

Sure, even an uninformed hunch (and this is ofc more than that) could "detect something real" that might be confirmed later. The problem is, how do you demonstrate that to the satisfaction of experts?

The traditional methodology has its flaws and limitations, but it is clear that there is nothing better available - yet. At its worst, megalocomparison in all its incarnations has been shown to generate various sorts of absurd results and spurious genetic relations (because only certain kinds of linguistic structures are possible and thus over millennia totally random similarities even of the eeriest kind can and will appear); at best, it merely reproduces what is already known about areal influence.

Now, under the current methods, if deeper relations are to be found, first we would have to agree on the basic outline of a reconstruction of PIE and Pre-PIE. Close, but not there yet. Then we would need to do the same for the other language families being compared. The problem is most of those are understudied from a philological point of view, with a shortage of experts on their history, and there is a lot of uncertainty about reconstruction. So, yes, the data available, while not garbage, is mostly poor.

My point is: yes, historical linguistics is an area that generates a lot of curiosity from the public and people from other areas like genetics, but one that nowadays has relatively few people on the ground (or on the desk) doing specialist work (and a disproportionate amount of those who do, dedicate themselves to IE and not so much to other languages). So, in order to satisfy the existing demand, "impatient" outsiders try to fill in the gaps using incomplete data and unconventional methods, and fail. Unfortunately, they are also taken way too seriously. It may be a PR problem after all.

a said...

Quick rhetorical question, how much teal did R1a & R1b H.G. have?
just where do the two of these samples cluster in the grand scheme of things?

German Dziebel said...

Another brilliant but hopeless study! I can already see it circling around the drain next to Gray and Atkinson, Greenberg and Ruhlen and other valiant attempts to make long-range linguistics look as scientific as evolutionary biology. Nichols's Linguistic Diversity in Space and Time remains unsurpassed precisely because it's not trying to argue for macrophyla. I did like the Indo-European-Chukotko-Kamchatkan connection (ANE, ANE, ANE...) but then the author bailed on it himself. Kortlandt's Core Eurasiatic also looks intriguing (ANE, ANE, ANE...) but then Ket is rich in ANE yet sits on the opposite side of the tree from it.

Kristiina said...

German, but don't we say that the exception proves the rule?

It was recently claimed that Yeniseian languages came from Beringia (America)

Maybe the oldest ANE languages are in America and the language families such as Uralic and Indo-European only recently spread on top of the old paleo-Siberian layer with recent innovations from the south.

Coldmountains said...

This study is not really convincing and I personally doubt these clades are correct. It is obvious that Indo-European is a North Eurasian language, and if it is related to any other language families, it can only be Uralic, Yeniseian and other North Eurasian languages, but this is impossible to prove or disprove. It is also possible that Indo-European is the descendant of an ancient North Eurasian language having no direct genetic connection to any North Eurasian language which is still spoken today. The "EHG/ANE" language of the ancestors of Indo-Europeans probably had many distant cousins in Siberia, but most of the ANE folks there got replaced or absorbed by populations from the southwest which were predominantly of ENA origin. The similarities between Uralic languages and Indo-European languages are quite fascinating and some kind of Sprachbund between Proto-Uralics and Proto-IEs is quite plausible.

andrew said...

One important weakness of the paper is the narrow amount of data relied on for each language. Just 40 words from a Swadesh list each.

Since all sorts of methods can make the easy connections (languages with clear relations in conventional language families), it would be better to pick one better set of data for each language family that is most basal, and then to compare those smaller sets of languages with better quality of data each.

Kurti said...


Obviously they were. It doesn't always need a direct connection, but the Yukaghir, just as an example, live side by side with Turkic speakers. There are also traces of Scythian settlements as far as East Siberia. Also, as you pointed out yourself, there is the fact alone that Russian is the official (administrative) language of this country. Every non Indo European language in Russia should and will show a decent amount of Indo European loans.

As other members pointed out, going just by the etymology of pure words in a language, Uralic and Kartvelian should come before Turkic/Altaic, from what I have seen and heard from various linguistic sources. Therefore I doubt that the methodology was very professional.

However, I am not doubting a close relationship between Indo-European and Altaic at all. I just think that, let's call it "Proto Caucasic", and Uralic should come first, and then Dravidian/Altaic.

Kurti said...

" Every non Indo European language in Russia should and will show a decent amount of Indo European loans"

Not only there, but also in almost all of Central and much of Northeast Asia.

Kurti said...

Tew said

" (because only certain kinds linguistic structures are possible and thus over millennia totally random similarities even of the eeriest kind can and will appear); at best, it merely reproduces what is already known about areal influence."

^This, this is why you need to compare the roots of words, because modern words can often have been borrowed or even simply influenced by other languages. For example, just because the number 7 starts with H and ends with K in language A and does the same in language B doesn't mean they have to be from the same root.

For example, just because German "Bruder" for "brother" sounds and looks more similar to Persian "Berader" than to Kurdish "Bra/t" doesn't mean this Persian word is etymologically closer to German than the Kurdish one is. It's simple coincidence in sound shifts.

postneo said...

Where's Semitic? Bantu?

capra internetensis said...


They only used Eurasian languages (time and computer power being finite).

Simon_W said...

Having read the paper now, I have to point out that the author himself admits that his method cannot distinguish between genetic relationships and abundant borrowing with an otherwise isolated family. Thus, he admits that the detected relationship between Sino-Tibetan and Hmong-Mien may be due to extensive borrowing from Sino-Tibetan into Hmong-Mien at an early stage. Or the relationship between Ainu and Japanese. He admits that more sophisticated computational models are needed which also automatically detect cognates in order to detect true genetic families.

However, this paper is still interesting, as it groups Indo-European with Uralic-Yukaghir and two Paleo-Siberian isolates in a clade with 99.9% confidence. Whereas for instance Nakh-Daghestanian is an utter outgroup. Kartvelian and Abkhaz-Adyghe weren't considered because they showed too many inconsistent, conflicting signals. Sumerian wasn't considered because only languages spoken in recent times were considered. And Afro-Asiatic wasn't considered because it's predominantly African. So, unfortunately a whole lot of languages that are often referred to as showing relationships with PIE were not considered. But, if Pre-Proto-IE was a language of North Eurasian hunter-gatherers and PIE expanded from somewhere close to the Caucasus and the ancient Near East, we may even expect clearer, younger signals of contact with languages of the Caucasus and the ancient Near East, although these would be more localized and less comprehensive than the more diffuse signals of the PPIE past.

German Dziebel said...


"Maybe the oldest ANE languages are in America.."

I have no doubts about it. But it looks like ANE's linguistic descendants span the whole spectrum of Eurasian languages from Indo-European to Yeniseian (if we look at the phylogeny in the current paper). So it's not a recent event. As Raghavan et al. (2013) wrote, if there was a migration out of America carrying MA-1 related lineages, it must have happened prior to 24,000 YBP.

Karl_K said...


So it looks like all ANE languages came out of the Americas before 24,000 years ago?

So would you say that the exclusively East Asian related part of Native American genetics is a later introduction into the Americas, or that there was unequal distribution of genetics when Americans first arrived in Eurasia?

Why didn't the rest of the Out Of America migration keep the ANE languages? Were there multiple American types that later merged within the Americas into one?

German Dziebel said...


I don't think the "east Asian" part of Native American genetics is literally east Asian by origin. I think it's "northern North American" that migrated to East Asia as a separate wave from ANE. (It probably went down the coast because MA-1 in the Siberian inland has no affinity to East Asians, while Tianyuan in southern China shares more genetic material with Amerindians followed by East Asians and is pretty far from West Eurasians). Amerindians are a highly structured and diverse population (world-highest Fst, world-highest levels of linguistic diversity), hence we find in the New World alleles that are shared with such diverse Old World continental groups as West Eurasians, East Asians and Papuans, but that are not shared among those Old World continental groups themselves. Amerindians tend to be closer to Neandertals and Denisovans than modern East Asians and modern West Eurasians are, suggesting, again, that Amerindians are "older" than East Asians and West Eurasians.

Yes, it is possible that there was a mixture of northern and southern Amerindians within the Americas (hence, Fst is higher in the New World than among, say, Neandertals and Denisovans who were even less admixed) and it's this process that's being misinterpreted as the mixture of East Asians and West Eurasians thought to have taken place in Siberia prior to the peopling of the Americas.

Ryan said...

David posted this a while ago, but it's worth posting again:

That's the spread of microblade technology from around Lake Baikal.

I wouldn't take the link to Chukotko-Kamchatkan as a link to Amerind groups broadly, as the Eskimo-Aleut language family seems to have expanded relatively recently, and given the high levels of East Asian ancestry, I wouldn't be surprised if the language has deeper origins in East Asia even if the people are more genetically mixed. The Inuit are not very representative of indigenous people in the Americas as a whole.

Yeniseian is a complete outgroup too...

terryt said...

In spite of doubts expressed by some I find the East Asian set fits what I have long accepted to be the case. The earliest Y-DNA O in SE Asia was O2a, carrying the Austro-Asiatic language family. Interestingly the diagram shows a relationship between Austroasiatic and both Japonic and Ainu. Perhaps these last two groups represent O2b-connected language families. It seems fairly obvious that the earlier Austroasiatic language family has been overlaid in SE Asia by a later Austronesian and Tai-Kadai expansion. These two are shown as related language groups, supporting my position in a discussion elsewhere on the subject. My guess is that Y-DNA O1 led the early expansion of both these latter language families, although on the mainland older Y-DNA O2 populations adopted the Tai-Kadai branch. And the Y-DNA outpaced the language.

I have also long held the view that Hmong-Mien and Sino-Tibetan are related, as shown in the diagram. The two language families do appear to have related Y-DNA haplotypes: Hmong-Mien being O3a2b-M7 and Sino-Tibetan being O3a2c-P164.

Ebizur said...


Although this study's methodology does readily recover traditionally supported language families, it seems that any clade in its trees that is not supported by traditional historical linguistics should be ignored unless the bootstrap value is 100%. (For example, a Korean-Burushaski clade was produced in S1 with 99.5% bootstrap support, but I have never encountered even a preliminary proposal of a relationship between Korean and Burushaski. There does seem to exist some small degree of genetic relationship between those two ethnic groups, but it is so minor that I doubt it should have resulted in anything more than some limited lexical borrowing.)

Jäger 2015, Supp. 1: full dataset, full tree
Austroasiatic (with 100% bootstrap support, and the Pakanic languages of eastern Yunnan and western Guangxi being the most divergent members) is positioned as a sister taxon of Hmong-Mien (93.3% bootstrap support) within a {(Sinitic + TB) + (HM + Austroasiatic)} clade (98.3% bootstrap support).

Sinitic consists of {Northern_Tujia + (Bai + Chinese)}, with the (Bai + Chinese) clade having 100% bootstrap support, and the {Tujia + (Bai + Chinese)} clade having only 67.3% bootstrap support. The (Sinitic + TB) clade, i.e. traditional Sino-Tibetan, is supported by a 99.5% bootstrap value.

In this tree, Austroasiatic is far removed from Japanese-Ainu, with this latter clade having 97.0% bootstrap support, and being positioned (with 92.3% bootstrap support) within a {(Japonic + Ainu) + (Shompen-Kusunda + North Caucasic)} clade.

Jäger 2015, Supp. 2: reduced dataset, full tree
Austroasiatic (with 100% bootstrap support, and the Mundaic languages of India being most divergent) is positioned as a sister taxon (96.8% bootstrap support) of Japanese-Ainu (93.1% bootstrap support). This {Austroasiatic + (Japonic + Ainu)} clade then groups (with 93.4% bootstrap support) with Dravidian and then (with 52.7% bootstrap support) with Caucasic.

Jäger 2015, Supp. 3: reduced dataset, reduced tree
This is the version of the tree that is depicted in the image in this blog entry.

An Austroasiatic clade is supported with a 100% bootstrap value, but it divides directly into four daughter taxa: Khasi-Mangic, Mundaic, BitKhang-Khmu-PalaungWaic, and Nicobar-Asli-MonKhmeric.
With 96.8% bootstrap support, this Austroasiatic clade is positioned alongside Ainu and Japonic in a tripartite (Austroasiatic + Ainu + Japonic) clade.
This Austroasiatic-Ainu-Japonic clade is coordinate with Yeniseian, Dravidian, Northeast Caucasian (Nakh-Daghestanian), Sino-Hmongic (i.e. {(Karen + Sino-Bai + TB) + (Hmong + Mien)}), Austronesian-Daic, and a new sort of macro-Altaic, with this last clade consisting of Tunguso-Mongolic, Turkic, and Sibero-Uralic (i.e. {Yukaghir + Nivkh + Uralic + (Chukotko-Kamchatkan + Indo-European)}). A Celtic clade is basal (96.7% bootstrap support) to the rest of Indo-European, which divides (98.6% bootstrap support) into Albanian, Romance, {Germanic + (Baltic + Slavic)}, and {Armenian + (Indic + Iranic)}.

terryt said...

Thanks for that extremely interesting analysis, Ebizur.

"I have never encountered even a preliminary proposal of a relationship between Korean and Burushaski".

That part of the tree certainly surprised me, and I find it difficult, but not impossible, to accept. Burushaski has always been regarded as a language isolate as far as I know.

"There does seem to exist some small degree of genetic relationship between those two ethnic groups, but it is so minor that I doubt it should have resulted in anything more than some limited lexical borrowing.)"

But we know that language and genetics are not all that closely related. Gene expansions are presumably usually associated with language expansions, but languages are probably replaced much more easily than genes are. Which is the other way round to the situation here.

"Austroasiatic (with 100% bootstrap support, and the Pakanic languages of eastern Yunnan and western Guangxi being the most divergent members) is positioned as a sister taxon of Hmong-Mien (93.3% bootstrap support) within a {(Sinitic + TB) + (HM + Austroasiatic)} clade (98.3% bootstrap support)".

That could perhaps suggest a very ancient common origin, or at least relationship, for all those groups. That would not be surprising. The remainder of your post also suggests much the same thing with some ancient separation followed by a variable amount of subsequent contact between the various language groups.

"This Austroasiatic-Ainu-Japonic clade is coordinate with Yeniseian, Dravidian, Northeast Caucasian (Nakh-Daghestanian), Sino-Hmongic (i.e. {(Karen + Sino-Bai + TB) + (Hmong + Mien)}), Austronesian-Daic, and a new sort of macro-Altaic, with this last clade consisting of Tunguso-Mongolic, Turkic, and Sibero-Uralic".

Now that is extremely interesting.

Ebizur said...

I looked over the Burushaski Swadesh list at, but I could find very little similarity with Korean. The finding of a close and exclusive relationship between these two languages is either a spurious artefact of this study's methods, or else the sound changes involved are not intuitive.

With my previous comment, I intended to intimate that this study's finding of relationships above the level of traditionally established language families should not be taken too seriously.

In the tree in S1 (the full tree derived from the full dataset), Austroasiatic is a sister taxon of Hmong-Mien within a [(Sinitic + TB) + (Austroasiatic + Hmong-Mien)] clade, while Japonic is a sister taxon of Ainu within a {[Japonic + Ainu] + [(Kusunda + Shompen) + (NW Caucasian + NE Caucasian)]} clade. In this tree, Dravidian is more closely related to Nihali, Basque, and "Nostratic" (Uralic, Nivkh, Yukaghir, Kartvelian, Altaic, and Chukotko-Kamchatkan+Indo-European) than it is related to the North Caucasus + Japan group or the Austroasiatic-Hmong + Sino-Tibetan group.

MULAO_KADAI, NA_KHE_GELAO, NAGA_MAO, NAGA_POCHURI, NAGA_SUMI, NAHALI, NENETS, NIHALI, NORTHERN_TUJIA, NUMAO_BUNU, PA-HNG, PALIU, PHUNOI, PUTIAN_CHINESE, QIANG_LONGXI, SELKUP, SHERDUKPEN, SHIMENKAN_HMONG, SHOM_PENG, SULUNG, SVAN, TARAON, UBYKH, UDI, YERONG, YUKAGHIR_TUNDRA, ZHABA), Austroasiatic has moved away from Hmong-Mien and Sino-Tibetan to become a sister taxon of (Japonic + Ainu) within a {Dravidian + [Austroasiatic + (Japonic + Ainu)]} clade, and this latter clade is a sister taxon to Caucasian (with only NE Caucasian being represented in this tree since all NW Caucasian and Kartvelian languages have been excluded as "rogue taxa"). The phylogenetic positions of Dravidian and Austroasiatic (not so much Japonic-Ainu) have been significantly affected by the exclusion of those so-called "rogue taxa" that are alleged to be internally inconsistent (because of borrowing from or random similarities with a taxon that belongs to another clade).

FrankN said...

@ebzur: You can find and access the data used in the study at
Here is what I could identify as possible Burushaski-Korean cognates:
#: Engl. - Burush. - Korean
11: one - hin - han
18: person - hir - saram
25: leaf - tap - iph~
28: skin - gap - gopjil
44: tongue - juNus - hy~o
47: knee - nuNus - murup [Bur. '-Nus' seems to be a kind of suffix]
53: liver - kEn - kan
61: die - jur - cuk
75: water - cel - mul
82: fire - ph~u - ful
86: mountain - doN - san
100: name - ik - irum

Hmm - 3 clear correspondences (one, skin, liver), plus a couple of "maybe"s. I won't exclude that the algorithm has been lured by general similarities in word construction and sound inventory into grouping both together. OTOH, a quick Google search seems to indicate that so far nobody has ever looked systematically into a possible relation of both languages. However, some shared grammatical traits (e.g. tense marking and use) have been occasionally noted.
Let me put it like this: If Jaeger's study contributes to inspiring people to look for possible linguistic relations outside the "usual suspects", it has been productive.
Among the "wild" links showing up in the unfiltered Annex (S1), I especially found interesting and worth further investigation:
a. The Basque-Nihali link (Nihali is assumed to carry some 30% pre-IE, pre-Drav. or pre-Munda substrate)
b. Shom Peng (Nicobars) and Kusunda (Nepal) linked to North Caucasian and Ainu-Japanese - relicts of a paleolithic Persian Gulf refuge?
c. The Albanian-Goidelic branch as earliest split from IE - reminds me of the "Illyrian layer" that Krahe identified in his Old European Hydronymy, and a few open questions relating to pre-Celtic settlement of Ireland (including P. Schrijver's hypothesis that Ireland only became celticised in the 4th century AD from Wales).

FrankN said...

P.S: Here is a far older (2009) version of the tree, with less sophisticated algorithms, but comprising all recorded languages, not just the Eurasian ones
If you scroll down around 3/4 of the document, you will find Burushaski already clustering with Korean there. Korean is joined by Bunak, a Trans-New Guinean language.

Ebizur said...


Thank you for the links.

"#: Engl. - Burush. - Korean
11: one - hin - han
18: person - hir - saram
25: leaf - tap - iph~
28: skin - gap - gopjil
44: tongue - juNus - hy~o
47: knee - nuNus - murup [Bur. '-Nus' seems to be a kind of suffix]
53: liver - kEn - kan
61: die - jur - cuk
75: water - cel - mul
82: fire - ph~u - ful
86: mountain - doN - san
100: name - ik - irum"

Korean 한 (han) "one" descends from Middle Korean ᄒᆞᆫ (hɔn) and is used before nouns as an adjective/modifier. The substantive form is 하나 (hana) < Middle Korean ᄒᆞ낳 hɔnah. A comparison with Burushaski hin appears plausible, but also cf. Proto-Indo-European *Hoy-n "one" and Ainu sine "one" (? < *hi-ne "one +"). Note that Ainu si- (? < *hi-) is also a reflexive prefix (on verbs) or a beautifying/honorific prefix (on nouns).

Korean 사람 saram "person, human being" descends from Middle Korean 사ᄅᆞᆷ sarɔm and appears superficially like an old compound of Middle Korean sar- "to live, to be alive" + Middle Korean nɔm "another person, other people." It probably originally contrasted with Korean 주검 jugeom "dead body, corpse": i.e. sarɔm "living person" vs. jugem "dead person."

Both Burushaski tap and Korean ip (< Middle Korean nip) are vaguely similar to other East Asian words for "leaf": Middle Chinese 葉 *yep (> Mandarin yè), Manchu abdaha "leaf" ~ afaha "sheet (of paper)" (-ha is originally a collective suffix), Mongolian navch "leaf," Turkish yaprak "leaf" (-(g)ak is originally a suffix), etc. Of course, the English word "leaf" is also vaguely similar.

Words like kap are common in many languages of Asia for "skin, rind, peel, bark, leather": Ainu kap, Japanese kawa (< *kapa), etc. Korean kkeopjil means specifically "bark, rind" rather than "skin," which is rather sal (< sɔlh) or salkkach.

Korean mureup "knee" is unanalyzable within Korean, but note Korean mud- "to cover" and deop- "to cover."

Korean gan "liver" is a loanword from Chinese 肝 gān "liver."

A connection between Burushaski cel and Korean 물 mul "water" (< Middle Korean 믈 meul) seems implausible.

The Korean word for "fire" is 불 bul < Middle Korean 블 beul.

Korean san "mountain" is a loanword from Chinese 山 shān "mountain." A native Korean word for "mountain" is 메 me in Modern Korean (< Middle Korean 묗 moyh), but this is rarely used.

Modern Korean 이름 ireum "name" descends from Middle Korean 일훔 ilhum. A connection with Burushaski ik is not impossible if Burushaski ik descends from an earlier *ilk ~ *irk.

Looking through the Burushaski data in that database, I see a few items that may plausibly be compared with Turkic, Korean, or Tungusic (e.g. kuroN "bone" vs. Manchu giraŋi "bone," nuNus ~ dumus "knee" vs. Turkish diz "knee," riN ~ ren "hand" vs. Korean son "hand," dan "stone" vs. Turkish taş "stone" vs. Korean dol (< dolh) "stone," gan "path" vs. Turkish yol "path, road, way" vs. Korean gil (< gilh) "path, road, way"). However, there are very few obvious similarities.

Ebizur said...

As for the World Language Tree ver. 3, it has Ainu in a group with several Native American languages (most closely with Chimariko of California and Kutenai of the British Columbia/Montana/Idaho border region, followed by Northern Iroquoian i.e. Mohawk, etc.).

"Altaic" Tungusic and Mongolic are grouped together, but related to Andamanese rather than to Turkic.

As you have mentioned, it has Korean grouped with Bunak (a non-Austronesian language of central Timor), and these two then with Burushaski. The entire [Burushaski + (Bunak + Korean)] set is then grouped with a bunch of Native American languages, [Athabaskan + (Aleut + Arawan)].

Among the Northeast Asian languages, Japonic appears in the most basal position in the tree, grouped first with Rikbaktsa (a language spoken by a nation dwelling in part of the Amazon rainforest in Mato Grosso, Brazil) and then with a bunch of Adamawa (specifically Mbum–Day) languages of Africa (Tupuri, Mundang, Mbum, etc.). The [Mbum + (Rikbaktsa + Japonic)] cluster is then connected with a cluster that contains several South American isolates (Aikana and Kwaza of Rondonia, Brazil plus Cofan of the Ecuador-Colombia border region), Kwalean languages (Kwale + Humene) of the "Bird's Tail" of New Guinea, and Kadu languages of the Nuba Mountains of Sudan.

I suspect that the empirical grounding of all these groupings is at least as weak as that of (Burushaski + Korean).

FrankN said...

Hi, Ebizur,

first, thanks for your extensive and informative comment. I had been wondering whether there is a point in commenting on an already older post - and obviously there is!

As to Burushaski: Your comments seem to suggest that, if there is a possibility to link it to other languages, one should rather look to the (North-)East than to the West, as has been the main focus in the past. That message alone is already an interesting and worthwhile output.
Of course, there are not many obvious similarities to be expected, otherwise linguists would long ago have been able to assign Burushaski to a larger language family. We are dealing with a linguistic, apparently also geographic, outlier here. Having said that, I think the possibility of an ancient Altaic linkage (could some of the Korean loans from Chinese that you mentioned be traced back into Altaic as well?) deserves a closer look, whatever the final result will be. Otherwise, note that Jaeger's study has thrown out Burushaski as a "rogue taxon", which usually means some kind of 'creole' structure that sends conflicting signals when the language in question is placed in the phylogenetic network.

Btw, one of the possible parallels you haven't yet commented on is Burushaski "ju[Nus]" - Korean "hy~o" (tongue).

"A native Korean word for "mountain" is 메 me in Modern Korean (< Middle Korean 묗 moyh), but this is rarely used." That's interesting! Speculatively, one could assume some kind of ancient relation to Georgian "mta", which obviously isn't that remote from the English (Romance) "mountain".

FrankN said...

"As for the World Language Tree ver. 3,..":
I think some background information is in order here. As you may have noticed when looking at the ASJP database, the approach has some history, details of which you can find here:

Originally, the research was centered in the Leipzig Max Planck Institute for Evolutionary Anthropology - these are the guys who a.o. deciphered the Neandertal genome. Recently, there has been a reorganisation, and reshuffling of funding. Essentially, Leipzig will concentrate on palaeogenetics with German federal funding, while the linguistics were transferred to the University of Tübingen under EU grant funding, for which a new project, EVOLAEMP, was created in 2013.

EVOLAEMP initially concentrated on improving their methodology, including algorithms for automatic cognate recognition. Jaeger's study is the first thematic output, but surely not the last. I assume we will get Africa, the Americas and Melanesia/Oceania next, before they start looking in more detail into some of the potential macrofamilies, plus trans-oceanic connections.
EVOLAEMP is linked to other EU-funded projects such as LANGELIN (Language and Gene Lineages, University of York), and QuantHistLing (Quantitative Historical Linguistics, Marburg University, focus on native South American languages), plus a project on "The Linguistic Past of Mesoamerica and the Andes", which, according to my impression, mainly serves to uphold some of the infrastructure at the Leipzig Max Planck Institute for Evolutionary Anthropology, including the ASJP database. This means we can expect quite a number of long-range linguistic studies over the years to come.
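
To give a rough idea of what "automatic cognate recognition" by weighted string alignment involves, here is a minimal Python sketch. It is not the scoring scheme used by Jaeger or the ASJP team: the sound classes, the substitution scores and the gap penalty are all made-up assumptions, and the function names are hypothetical. It only shows the general principle that swapping similar sounds (d ~ t) is penalized less than swapping dissimilar ones (d ~ k).

```python
# Hypothetical sound classes (an assumption for illustration only):
# segments in the same class count as "similar" and are cheap to substitute.
SOUND_CLASSES = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "dental", "d": "dental", "n": "dental", "s": "dental",
    "k": "velar", "g": "velar", "h": "velar",
    "l": "liquid", "r": "liquid",
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
}


def substitution_score(x, y):
    """Score for aligning segment x with segment y (assumed toy weights)."""
    if x == y:
        return 2.0  # identical segments
    cx, cy = SOUND_CLASSES.get(x), SOUND_CLASSES.get(y)
    if cx is not None and cx == cy:
        return 1.0  # different segments from the same sound class
    return -1.0     # unrelated segments


def alignment_score(a, b, gap=-1.0):
    """Global (Needleman-Wunsch) alignment score of two segment strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against nothing
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against nothing
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    # Forms quoted in this thread: Turkic dil ~ til "tongue", Ainu kem "to lick".
    print(alignment_score("dil", "til"))
    print(alignment_score("dil", "kem"))
```

With these toy weights the dil/til pair scores higher than dil/kem. A real system would estimate the weights from data and would still need a significance test before calling anything a cognate.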

FrankN said...

"I suspect that the empirical grounding of all these groupings is at least as weak as that of (Burushaski + Korean)." Yes and no. Technically, it is much weaker, as Jaeger's team appears to have substantially improved the approach to automatic cognate recognition. The ASJP team quickly recognized that their approach produced a couple of "wild" linguistic links, which is probably why they refrained from further updates of the "World Language Tree" and replaced it by "Continental Trees" (such as now presented by Jaeger) instead.

OTOH, when it comes to transoceanic linkages:
(1) Skoglund/Reich picked up several signals in their recent study on the settlement of the Americas. The "Melanesian Signal", mirrored by the "Onge Signal" picked up by Raghavan e.a. (2015), has been widely discussed. Additionally, buried deep in their Supplementary Materials (Tab. S5.1), Skoglund/Reich also report a Tshwa (KhoiSan) signal with Aymara and Pima, a Kinh (Vietnamese) signal with Bolivians, a Ukrainian signal with Kaqchikel, and an Egyptian signal with Quechua.

(2) Recent research has demonstrated an African origin of American bottle gourds (from 10,000 BP). The purported interpretation of the bottle gourd having reached the Americas from Kenya by transoceanic drift, in five haplotypes separated by up to 80,000 years of differentiation within Africa, is hardly convincing. Human transfer, i.e. a transatlantic crossing from Southern Africa, would be in line with the genetic signals picked up by Skoglund/Reich 2015.

(3) Evidence points to dogs having entered America via the Atlantic, but also to intensive trans-Pacific genetic exchange over the last 3-4,000 years:
(a) An early domesticated American dog (8,500 BP) contains mtDNA closely related to European wolves:
(b) "Traditional" Amerindian dogs, i.e. Chihuahua, Xoloitzcuintli and the Peruvian Hairless Dog, contain exclusively European dog yDNA (Hg1).
(c) No Siberian dog mtDNA has been found in non-Arctic American dogs. The same applies to mtDNA A29, prevailing in East Asia and Oceania. OTOH, mtDNA A161, now only present in the Korean Jindo Dog, was recovered from 8 dog burials in Mexico, Bolivia and Peru (6th-12th ct. AD). mtDNA A185, only occurring in the Chihuahua, and evidenced from a 1,300-year-old Mexican dog burial, forms the missing genetic link to two rare hgs, A64 and A65. A64 has been occasionally found in Japanese and Korean dogs; A65 characterizes a.o. the Chinese Shih Tzu. The Shih Tzu, as well as several other Tibetan and Chinese varieties, contains North American wolf yDNA (Hg 6, see link above).
(d) The Xoloitzcuintli, the Peruvian Hairless Dog, the Chinese Crested Dog and the African Hairless Dog share a rare gene mutation believed to have arisen some 4,000 years ago in Mexico.

(4) Parallels between Jomon ceramics and the first American ceramic culture in Southern Ecuador have long been an issue; there are no American traditions on which the emergence of ceramics could build.

(5) There is a corresponding presence of yDNA C-M217 (C3*), otherwise frequent with Koryaks and Ainu, among Ecuadorian Kichwa and Waorani, with an MRCA of close to 6,000 years.

- to be ctd -

FrankN said...

- Contd from prev. comment -

(6) Coconut DNA proves contact between the Southern Philippines and the American Pacific coast some 2,300 years ago.

(7) The banana, first domesticated in Melanesia, reached Africa quite early. For East Africa (AAA) linguistics support an arrival prior to the Bantu expansion. A second, independent transfer brought AAB varieties to West Africa, most likely from the Philippines or Sulawesi directly by boat to the Gulf of Guinea.
The presence of banana in West Africa is archeologically secured from Cameroon at ca. 500 BC. Blench speculates on the transfer of a complete "Tropical Neolithic Package", comprising also yam and taro, from South East Asia to West Africa, as the major enabling factor for the Bantu expansion into the rainforest zone (Congo Basin) that commenced about 3,000 years ago.
Whether the banana (Musa spp.) travelled further across the Atlantic already during pre-Columbian times isn't completely settled yet, either. There are early Spanish reports of plants "that the Egyptians call Musa" being cultivated in Mexico. How Spanish could borrow "platanos" (engl. "plantain") from a Carib language when the plant wasn't cultivated there prior to Columbus' arrival is mysterious anyway.

(8) Finally, Mota aDNA revealed additional "neolithic" genetic input into East African populations (and beyond?) some 4,000 years ago. This input can be identified as yDNA J1(xJ1e) and T, which in turn points to a source close to Kurdistan/Mesopotamia. Since Arabia and Sudan are dominated by J1e, a maritime incursion from the Persian Gulf along the Arabian peninsula looks likely. Timewise, this seems to correspond with the introduction of sorghum and the domesticated donkey, both most likely originating in/near Ethiopia, into South Asia, while rice cultivation in West Africa started not too long afterwards.

For the above reasons, trans-continental and trans-oceanic linguistic connections could be much less "wild" than they appear to be at first sight. Often we are dealing with time horizons of below 6,000 years that should still be traceable, though of course no longer "obvious". However, the languages in question may have been partly replaced and only conserved as substrate, as e.g. in the case of Jomon-Ecuador connections. Thus, automated algorithms may get erratic and, depending on the specific sound congruence patterns that have been established statistically, may pair some seemingly arbitrary languages.
Take the Japonic-Mbum cluster as an illustration: Mbum languages, spoken in geographic proximity to the above-mentioned archeological evidence for West African banana cultivation, are quite a good candidate for eventual linguistic traces of contact with South East Asia. Clearly, Japan isn't South East Asia, but Japonic may have preserved some Jomon linguistic substrate, especially of Southern Jomon from Okinawa etc., that partly corresponds to whatever language was once spoken in the Northern Philippines, from where the banana connection most likely originated.
Also, if there has been Jomon contact with Southern Ecuador, one would expect to find traces of it preserved somewhere. Cofan doesn't look too bad geographically. The 2008 version of the World Language Tree (with a different mechanism for detecting sound congruence) linked Japonic to Jivaroan languages instead.

To sum it up: The earlier World Language Trees seem to have been able to pick up some signals of intercontinental language contact, without, however, precisely aligning the languages in question. They are clearly not the last word on the matter, but may nevertheless hold some hints on long-range connections that may deserve further linguistic, archeological and genetic research.

Ebizur said...


Thank you for your reply.

"As to Burushaki: Your comments seem to suggest that, if there is a possibility to link it to other languages, one should rather look to the (North-)East than to the West, as has been the main focus in the past."

I think that there exists a likelihood of at least some loanwords from an East Asian language or languages being present in Burushaski because the modern Burushos have been demonstrated to contain ancestry related to East Asian populations, including Y-DNA belonging to haplogroups C2-M217 and O3a2c1-M134. Overall, however, their genetic affinities link them mostly with Western Eurasians, so it is sensible to predict that much of their linguistic inheritance is connected with languages of other primarily Western Eurasian populations.

"Having said that - I think, the possibility of an ancient Altaic linkage (could some of the Korean loans from Chinese that you mentionned traced back into Altaic as well?) deserves a closer look, whatever will be the final result."

I do not accept the Altaic hypothesis. My working hypothesis is that Mongolic and Tungusic are fundamentally ("genetically") related within a potentially recoverable time frame, and that some branch(es) of Turkic has interacted closely with pre-proto-Mongolic during roughly the first millennium CE. Besides its interactions with Mongolic, Turkic also has clear links with Uralic and Indo-Iranic languages, and, after all these linkages have been accounted for, I am not sure how much indigenous residue would be left, nor where it would point in regard to the ultimate geographical origin of the pre-proto-Turkic-speaking ethnos or their biological affinities.

Regardless of any potential deep shared roots (which might be shared with e.g. Indo-European languages as well), it is clear to me that the majority of similarities among the so-called Altaic languages (Turkic, Mongolic, and Tungusic) are the result of a "Sprachbund," and this Sprachbund has involved at least Indo-Iranic and Uralic (and probably also Korean) languages in addition to the so-called Altaic languages. Even the connection between the Mongolic and Tungusic languages that I think may be "genetic" is certainly very ancient, on the same level as the relationship between e.g. Spanish and Russian.

I do not see any reason to suspect that Chinese 肝 "liver" or 山 "mountain" has been borrowed from any so-called Altaic language. Turkic has *daɣ > daa, too, taw, tay, etc., Mongolic has *aɣula > *aul > uul, Manchu has alin, Nanai has xurä(än) ~ furän, Evenki has urə for "mountain."

"Otherwise, note that Jaeger's study has thrown out Burushaki as a "rogue taxa", which usually means some kind of 'creole' structure that sends conflicting signals when the language in question is placed in the phylogenetic network."

I think that Burushaski is probably a mostly isolate language with some influence from various Indo-Iranic, Turkic, Bodic (Tibetan), etc. languages. The foreign loanwords might have caused it to be labeled as a "rogue taxon."

Ebizur said...

"Btw, one of the possible parallels you haven't yet commented on is Burushaki "ju[Nus]" - Korean "hy~o" (tongue)."

Korean 혀 hyeo is not particularly similar to any word for "tongue" in any other language with which I am familiar. Turkic has dil ~ til, Mongolic has kele > xel ~ xeli, Manchu has /ileŋu/ > /ileŋə/, Japanese has shita or (vulgar) bero, Ryukyuan has siba ~ suba (polite; used to refer to the tongue of a human being) or shicha ~ hicha, Chinese has 舌 (Mandarin shé, Sino-Korean seol, Sino-Japanese zetsu < Middle Chinese *(d)ʑiæt ~ *(d)ʑiɛt). The Chinese word and the standard Japanese word are quite similar to each other. Ryukyuan has a cognate (shicha ~ hicha) that as an independent noun is normally used to refer to the tongues of non-human animals (though it may also refer to the tongue of a human or to the faculty of speech in general in compounds).

Korean generally has lost *r and *n before *i or *y and has undergone syncope of vowels in some contexts (unstressed?) to produce consonant clusters, most of which subsequently have been simplified. Therefore, Modern Korean hyeo theoretically could descend from an Early Middle Korean *hVrye ~ *hVlye ~ *hVnye.

As for Yasin -yúŋus ~ Hunza -úmus (an inalienable noun in both dialects of Burushaski), I see no reason to parse it into two morphemes, let alone compare it with Korean hyeo.

Kristiina said...

This post may remain unnoticed, but I will add a few possible cognates of the Korean "hyeo". In Nuosu (Yi) "tongue" is ha³³nɛ³³. In Yukaghir "tongue" is onor/vanar. The Ket word for "tongue" is ēj, eˑy, ɛːyǝ. In Cantonese yúhyìhn seems to mean "language", but I am not able to analyze this compound.

In Hmong "to lick" is yai13.

If the Korean word is connected with the above words, Nuosu and Yukaghir and Cantonese (?) point to the presence of an 'N' sound and to the construction *hVnye.

Ebizur said...

Thank you for your input, Kristiina.

Cantonese 語言 (yu5 yin4) does indeed mean "language," but this is a transparent bimorphemic compound derived from 語 and 言, both of which have similar semantic content (either as a noun meaning "language, words, speech, saying, expression" or as a verb meaning "to speak, to say"). In Standard Mandarin Chinese as used in the PRC at present, its cognate is written 语言 and read as yǔyán. The same two morphemes are used with their order reversed in Japanese 言語 gengo "language" and Korean 언어 eoneo "language." Most such (originally pedantic) neologisms created from Chinese root morphemes have been coined in Japan for the express purpose of translating Western concepts and then "calqued" or copied into Korean according to the standard Korean readings of the Chinese characters. An older way of referring to "language" in Japanese would be 言葉 kotoba; in Korean, 말 mal refers to "speech, words, language" in general, whereas 사투리 saturi refers specifically to a local dialect.

(Long essay about Cantonese and Tungusic words for "lick" and "tongue" that followed here has been devoured by the Great Black Hole of Cyberspace.)

Ebizur said...

Omitting all the detailed explanation, here are words for "tongue" and "to lick" in various languages of East Asia:

Cantonese
舌 sit6 (or sit3) "tongue; tongue-shaped object, such as the clapper of a bell; words, speech" (cognate with Mandarin 舌 shé)
嗒 daap1 (or occasionally dep1) "to try and assess a taste, to lick" (cf. 啖 daam6 / Mandarin dàn "to eat, to feed; to chew, to bite; to entice")
𦧺 laai2 "to lick/lap with tongue"
舐 lem2 (colloquial) / saai2 (literary) "to lick with the tongue, to lap"
脷 lei6 "tongue" (e.g. 舐舐脷 lem2 lem2 lei6 "to lick; to lick one's lips in anticipation." An ad hoc explanation of the difference between the colloquial Cantonese and the Mandarin/literary Chinese words for "tongue" relates this word to 利 lei6 "profit, gain, advantage, benefit, merit; sharp; to benefit, to serve" as a euphemism for 舌 sit6 "tongue" because of the latter word's phonemic identity with 蝕 sit6 "to suffer a loss, to wear out.")

Mandarin
舌 shé "tongue" (normally used as 舌头 shétou "tongue" < 舌 shé "tongue" + 头 tóu "head" in colloquial conversation)
舔 tiǎn "to lick with tongue, to lap, to taste" (colloquial)
舐 shì "to lick with the tongue, to lap" (literary)

There seems to be one group of Chinese words (舐 literary Mandarin shì/literary Cantonese saai2 "lick," 舌 Mandarin shé/literary Cantonese sit6 or sit3 "tongue") that are essentially Northern and begin with /ś/ (retroflex s, transcribed as "sh" in Mandarin pinyin) or /s/, and another group of colloquial Cantonese words (𦧺 laai2 "lick," 舐 lem2 "lick," 脷 lei6 "tongue") that are essentially Southern (or specifically Cantonese) and begin with /l/.

Manchu
ile- "to lick"
ilenggu "tongue" (Pronounced as [ileŋə] by at least some of the few remaining speakers of Manchu. Probably pronounced *[ileŋu] or *[ileŋɣu] in earlier times. cf. Evenki inŋi "tongue.")

I guess that the Tungusic words might descend from an earlier */hilə/, related to a common Mongolic root for "tongue; language; to say." The -nggu suffix (or -ŋi in Evenki) is one of several suffixes of unclear semantic content (perhaps originally markers of number, gender/noun class, deverbal noun derivation, etc.) that occur frequently in Tungusic, some of which have parallels in Mongolic.

Mongolic
kele(n) "tongue; language" (also xeli < *kele-yi ~ *kele-gi in Daur; cf. the aforementioned issue of suffixation in Tungusic)
kele- "to say"
doluɣa- ~ doliya- (> Khalkh doloo-) "to lick"

Turkish
yala- "to lick"
dil "tongue" (Apparently, the cognate of this word in Chuvash is phonetically quite divergent; I have seen it transcribed as čəlxe, čǝlɣe, or čǝlǝx.)

Ebizur said...

Nivkh
yelel- ~ helel- "to lick"
hilx "tongue"

Chukotkan (Chukchi-Koryak)
yilə ~ yil ~ yilyil "tongue"

Korean
핥- halt- "to lick"
혀 hyeo "tongue"

Japanese
舐め- name- "to lick"
舌 shita "tongue"
べろ bero "tongue" (vulgar)

Kunigami Ryukyuan
namirun ~ nambin "to lick"
subaa ~ sibaa "tongue"
sichaa ~ hichaa "tongue" (vulgar)

Ainu
kem "to lick"
parunpe "tongue" (< Ainu par ~ char "mouth" + Ainu un "to be in" + Ainu pe "thing, one who ~s")

There seems to be some vague resemblance among the words for "tongue" and "to lick" in many Siberian, Mongolic, and Tungusic languages. The Turkic word for "to lick" is also similar, but the Turkic word for "tongue" stands out for having a dental obstruent (/d/ ~ /t/) in initial position in most Turkic languages (though this appears to have become an affricate in e.g. Chuvash). The Korean words seem like they might (or might not) ultimately be related to this former group, but they are very divergent in any case.

Kristiina said...

Apart from Cantonese “sit6” and Japanese “shita”, a similar root without ‘t’, as in Chinese, exists in other Sino-Tibetan languages, e.g. in Burmese “ʃà” and Bai “ce42”, tongue.

The Erzya word for tongue is “kel”, which is close to the proto-Uralic construction *kele. The other proto-Uralic construction is *ńälmä. The Ugric forms belong to this root: Mansi ɲiʎǝ̆m, Khanty ɲäːɬǝ̆m, and the Mari word is “jəlme”. This root bears a vague resemblance to the Japanese and Ryukyuan forms, but it may be pure coincidence. By contrast, the Mongolic forms “kele(n)” tongue; language (also xeli < *kele-yi ~ *kele-gi in Daur), kele- "to say" are so close to the Uralic construction that I would say that they are related, even more so now that we know that there is N1c in some Mongolic and Tungusic tribes.

The Koryak and Chukchi words for tongue are quite close to Turkish “yala-”, to lick, as well as to the Nivkh roots “yelel-” ~ “helel-”, to lick, and hilx "tongue", and they could ultimately be related to the Uralic root *kele and possibly also to the Indo-European root *ghel-, to call, cry. If the Korean word goes back to the form *hVlye, it could be part of this development.

However, if the Korean word goes back to form ‘*hVnye’, it is not impossible that Nuosu (Yi) word “ha³³nɛ³³” and Yukaghir word onor/vanar and Ket word ēj, eˑy, ɛːyǝ and Chinese “yǔ” and ”yán” are related. In any case, I checked that Na-Dené languages did not have a similar root as the Ket root, so the Ket root looks like having a (more recent) Siberian origin. On the basis of the recent paper “Ancient DNA evidence reveals that the Y chromosome haplogroup Q1a1 admixed into the Han Chinese 3,000 years ago”, it is not at all impossible that at least Ket root and Chinese roots are related.

I would guess that the “sibilant + vowel” root belongs to Sino-Tibetan groups and the ‘l’ + vowel root belongs to Southern groups, while Ainu has its own completely different roots.

Ebizur said...


The loss of */-t/ in Mandarin 舌 shé "tongue" is an entirely regular part of the diachronic change from Late Middle Chinese to Mandarin. The final /-t/ is retained in many modern Chinese dialects/languages (e.g. the previously mentioned Cantonese literary reading of 舌 sit6 ~ sit3) as well as in loanwords to non-Chinese languages (e.g. Sino-Japanese 舌 zetsu (~ zechi), Sino-Korean 설 seol -- Japanese final /-tsu/ ~ /-chi/ and Korean final /-l/ are regular reflexes of Middle Chinese final /-t/). This sound change has occurred within the last millennium, so for the Burmese or Bai word to be cognate, it must have undergone a similar loss of the final consonant, or else the final */-t/ of Middle Chinese must have developed from what was previously a suffix. (Frankly, I do not know precisely how the “ce42” notation for the Bai word is supposed to be read; I presume at least part of the "-42" is intended to transcribe tone.)

Also, I should have mentioned that the Korean word for "to lick" has appeared as 핧 halh- in some texts from the fifteenth or sixteenth century (e.g. Wolinseokbo (1459) 혀로太子ㅅ두누늘할하 hyeoro taejas du nuneul halha "licking the two eyes of the crown prince with (one's) tongue") instead of 핥 halt- as it appears in standard Modern Korean. Some modern dialects also have variant forms, such as that of Jeju Island, 할르 halleu- ~ 할라 halla-. It is hard to determine the source of the /t/ that appears at the end of the root in Modern Korean. Since Korean /t/ usually seems to descend from an earlier */dVh/ or */hVd/, and /l/ (realized as [l] ~ [ɾ]) is phonotactically close to /d/, one might suppose that the Korean verb for "to lick" descends from a reduplicated form of a more primal root */hal/ (i.e. */hal-hal/ > */halhl-/ > /halh-/ ~ /halt-/ ~ /halleu-/).

As for Ainu, the only word attested for "tongue" is unfortunately a transparent compound. However, words for "mouth," "tongue," and "lip" seem to be often confounded in the Japanese Archipelago (e.g. Kyushu suba "lip" vs. Okinawa subaa ~ sibaa "tongue"), so it is not certain that Ainu par ~ char "mouth" did not previously denote "lip" or "tongue" even without the -un-pe suffixation. (The modern Standard Japanese word for "lip," kuchibiru, consists of kuchi "mouth" compounded with some element of rather obscure origin.) Also, Ainu does not phonemically distinguish voiced and voiceless oral obstruents, so parunpe "tongue" might as well be transcribed as barunbe, and kem "to lick" might as well be transcribed as gem.

Ebizur said...

Speaking of Uralic, the distribution of the two different roots for "tongue" is difficult to interpret.

Nenets has нямю nyamyu "tongue," which seems to belong to the same cognate set as Hungarian nyelv /ˈɲɛlv/, Northern Sami nyalbmi, etc. However, Hungarian also has nyal /ˈɲɒl/ "to lick" and nyál /ˈɲaːl/ "saliva." This group of words for "tongue" vaguely resembles Japanese name- "to lick," but the */-mä/ element must be a suffix in order for this group of Uralic words for "tongue" to be related to e.g. the Hungarian word for "lick" or "saliva." This would basically require Japonic to be a (para-)Uralic language, which I do not think is tenable, so the resemblance is probably illusory. Perhaps a more fruitful line to pursue might be a connection between the Uralic root for "lick" and Turkish yala- "lick" (or even Mongolian (Khalkh) doloo- "lick"; fortition of glides to plosives is a characteristic of the easternmost Uralic languages, i.e. Samoyedic languages, and I suppose a similar relationship might hold between Turkish yala- and Mongolian doloo-); Turkish dil "tongue" and Chuvash čəlxe ~ čǝlɣe ~ čǝlǝx might also be considered in this connection. Note that the comparison Hungarian nyal /ˈɲɒl/ "lick" vs. Turkish yala- "lick" seems nearly analogous to Hungarian nyár /ˈɲaːr/ "summer" vs. Turkish yaz (in some Turkic languages, there is also a variant yay, and the meaning is sometimes broad, meaning "spring or summer") though the vowels in the Hungarian comparands do differ from each other.

On the other hand, Nganasan has сиәде siədye "tongue." Samoyedic languages have undergone some interesting palatalization and fortition processes, so it would not surprise me if the aforementioned Nganasan word were cognate with Finnish kieli "tongue."

Kristiina said...

Ebizur, I fully agree with you. The above-mentioned Ugric-Nenets form may have a different West Siberian origin compared to *kele, but it is also possible that it is derived from *kele, because I agree that some Siberian sound shifts are very peculiar, like the development from *kele to siədye. I have noticed that sound shifts in the Ugric languages Khanty and Mansi are also peculiar. The variation between 'd' and 'l' is also found in Ugric languages.

As for the Korean language, are there many words that in your opinion have a Turkic origin? Of course, I know that they exist, but some Altaic constructions in the Tower of Babel between Turkic/Mongolic and Korean are a bit weird. Have you noticed correspondences between Korean and Yeniseian languages, or is this root unique?

I agree that it is not clear if Chinese form and Tibeto-Burman forms can be related because of that 't' sound. To my understanding, Tibeto-Burman languages are not so well studied.

Kristiina said...

In Nganasan, Nenets and Enets, PS (Proto-Samoyedic) vowel-initial words gain an initial /ŋ/ (which may be subsequently palatalized to /nʲ/). This is seen in the word 'to live':
Nenets jileś, Khanty jel-, to live
Nganasan ńile-, to live
Uralic *elä, to live

The situation with the word 'tongue' is not the same:
Mari “jəlme” tongue, Chukotkan (Chukchi-Koryak) yilə ~ yil ~ yilyil "tongue", Turkic *dɨl, tongue
Mansi ɲiʎǝ̆m, Khanty ɲäːɬǝ̆m, Nenets nyamyu
Nganasan siədye
Uralic *kele

However, we know that there is a verb 'swallow' which is *ńele, in all Uralic languages and it could very easily be related to the word 'tongue', *ńälmä ('mä' could be a suffix for making a noun). Because of this, I would say that the word *ńälmä is probably of West Siberian origin and unrelated to *kele. Maybe the Turkic form *dɨl could be derived from the West Siberian root which also gave rise to the form *ńel.

By comparison, a change from dj to nj is attested in Mator and Kamassian, as in these languages /b/, /dʲ/ are furthermore nasalized to /m/, /nʲ/ preceding a word-internal nasal. This has been an areal change, shared also with Siberian Turkic languages such as Khakas.

FrankN said...

*tV-/dV- for the semantic cluster "tooth/ tongue/ taste/ eat/ meat" (swallow, lick) is very frequent among world languages, to the extent it may actually constitute paleolithic substrate. Aside from your examples (assuming a possible t>s(h) sound change in Chinese/Japanese languages), and IE, it is e.g. found in

- Proto-AfrAs *ṭaʕam "taste, eat",
- Niger-Congo: Ciluba (DR Congo) -dya "eat", Tswana diyo "food", Tsonga dya "eat", Kirundi in-dya "food", Zulu -dla "eat, feed on, drink, bite", Douala da "eat", Ewe ɖu "eat", Proto-Mande *da "mouth", etc.
- Nilo-Sahar: Maa a-daá "to eat, feed", ɛn-dáà "food", Kanuri dâ "meat, flesh"
[Note here also *di/*tu in the sense of "tongue~language,speech" found in various ethnonyms such as Bantu (pl. Muntu), Ma'adi, Mandi, Dinka, Dima, Tamil(?); c.f. Lendu ma "I/we", ke "person"]

- Jakalteko (Mayan) ti "Mouth",
- Highland-Tequistlatek de "eat", du "fish"
- Muinane (Colombia) du "eat",
- Tacana dia "eat"
- Ojibwe doon "mouth", denanw "tongue", dakwam "bite"
- Comanche (Uto-Aztecan) tupe "mouth", tuku "meat", tuka "eat"
- Zuni ito "eat", awati "mouth".
- Lakota tȟaló "meat", yútA, wótA, (wa)tȟébyA "eat", (wa)yaȟtákA "bite" ["wa-" = 1st person sg.]
- Navajo azéé´"mouth", atsiiʼ "meat"

In relation to Hungarian nyal /ˈɲɒl/ "lick" vs. Turkish yala- "lick", Zulu nyala "lick" is interesting. The chain is possibly via Malay menjilat "lick", though the further trail through (or around?) East Asia is a bit obscure.

On "lick" itself I a.o. noted Arab. لعق laeaq, Georg. ლოკვა lok'va, Yoruba la, Thai เลีย Leīy, Burmese lha kyaya - also quite a number of parallels..

The d>l lambdacism is quite common. Pokorny reports Old Latin dingua shifting into Classical Latin lingua, which allows him to unite the Germano-Celtic forms on *t- with other IE forms (Balt., Arm., Class. Lat.) on *l- into the common root *dn̥ĝhu̯ā "tongue". Dwyer (1989) identified this lambdacism in his attempt to reconstruct proto-Mande. As such, some of the "lick" terms above may also reflect the common *tV-/dV- root for tongue/taste etc.

Returning to the starting point, Ebizur: In the SuppMat to the recent CHG study, Tab. 9, the best 2-way admix (lowest f3 stat.) for Burusho is given as Satsurblia-Korean. Seems Jaeger hasn't been that wrong here. His study qualifies Burusho as a "rogue taxon", meaning it includes two statistically distinguishable phonetic subclusters - typically indicating some kind of mixed language. Whether one of them was really Korean-like remains to be seen. Alternatively, I could imagine a kind of "stranded Mongols" scenario, similar to what we find genetically (though not linguistically) with the Hazara (they, btw., are shown as a Korean-Spain_EN mix in that table).

@Kristiina: I had asked you a few questions in the "Mixed marriages" post, a.o. related to Uralic *elä, which you may have overlooked (you need to scroll up some 15 posts from the end to find them). If you find the time to answer them, just leave a short note here (I get a message on comments here, but not in that other thread).

Kristiina said...

Frank, I have now answered your questions!

Ebizur said...


"However, we know that there is a verb 'swallow' which is *ńele, in all Uralic languages and it could very easily be related to the word 'tongue', *ńälmä ('mä' could be a suffix for making a noun). Because of this, I would say that the word *ńälmä is probably of West Siberian origin and unrelated to *kele. Maybe the Turkic form *dɨl could be derived from the West Siberian root which also gave rise to the form *ńel."

I have considered the Hungarian verb for "to lick" or noun "saliva" to be more likely relatives of the Hungarian word for "tongue," but I suppose a relation to the verb for "to swallow" might also be plausible. In that connection, cf. the document at, in which it is stated that the previously mentioned Northern Sami cognate, njalbmi (i.e. nyalbmi), means "mouth" rather than "tongue." "Mouth" or "throat" seem like body parts that should be more likely to be related to a verb for "to swallow" than "tongue" (this latter body part, in my opinion, is more likely to be related to verbs for "to lick," "to taste," or "to say/speak/talk").

As for Turkish dil "tongue," it may be pertinent that proto-Turkic appears to have not distinguished nasal and oral resonance in word-initial position (except in the case of the interrogative ne "what") nor even voicing (voicing is distinguished in word-initial position in modern Turkic languages, but this seems to have developed separately in each branch of Turkic). In other words, there is no way to know whether the proto-Turkic word for "tongue" was *dVl, *tVl, or *nVl. However, the glide *y was distinguished from the dental stop in proto-Turkic even in word-initial position, so dil "tongue" and yala- "to lick" must descend from etyma that had differentiated before the proto-Turkic stage.

Ebizur said...

"As for Korean language, are there many words that in your opinion have a Turkic origin? Of course, I know that they exist, but Some Altaic constructions in the Tower of Babel between Turkic/Mongolic and Korean are a bit weird. Have you noticed correspondences between Korean and Yeniseian languages, or is this root unique?"

The Korean and Turkish languages do have some words that happen to resemble each other in form and meaning (e.g. Korean 모든 modeun vs. Turkish bütün "all," Korean 껍질 kkeopjil /k͈ʌ̹p̚t͡ɕ͈iɭ/ vs. Turkish kabuk "bark (of a tree)," Korean 꼬리 kkori vs. Turkish kuyruk "tail," Korean 목 mog "neck, throat" vs. Turkish boğaz "throat"), but they are insignificant. One may also find as many false friends by comparing Korean with any other language, such as English: Korean 우리 uri "we / us / our" vs. English our, Korean 뎌 dyeo > 저 jeo "that" (distal deictic pronoun) vs. English the ~ that, Korean 안 an "not" vs. English un- "negative prefix," Korean 많이 manhi "many, much, a lot" vs. English many, Korean 두 du "two" vs. English two, etc. What makes a comparison of two languages significant or meaningful is systematic, regular correspondences, and such a system of correspondences may be expounded neither for a relationship between Korean and Turkish nor for a relationship between Korean and English.

The Yeniseian languages are nearly extinct (with only Ket surviving, and even that just barely) and poorly attested (which should be at least partly attributable to the early date of extinction of many of these languages -- Pumpokol seems to have gone extinct more than 250 years ago, i.e. before the United States of America gained their independence from the Kingdom of Great Britain!). The Northern Yeniseian (Ket, Yugh) and Southern Yeniseian (Kott, Assan, Arin, Pumpokol) groups are well distinguished, and there are some notable differences even within each of those groups. The poor attestation of Southern Yeniseian hampers any attempt to reconstruct a Yeniseian proto-language.

In any case, I do not notice any remarkable similarities between attested Yeniseian words and Korean words. Yeniseian words for "sleigh, sled" vaguely resemble the first syllable of the Korean word for the same (썰매 sseolmae), but that word is generally considered to be a corruption of a Chinese-based Korean word 雪馬 seolma "snow-horse."

On the other hand, I do recognize some similarity between certain Yeniseian and Turkic etyma (e.g. "stone," "winter," "person"). Furthermore, there are some more recent (and, thus, obvious) Turkic loanwords in some descendant branches of Yeniseian. However, the Yeniseian languages are overall very peculiar, and must have developed in (relative) isolation for a long time.

I also do not think Yeniseian words for "tongue" (Ket ēy, Yugh ey, Pumpokol ay) are particularly similar to Korean 혀 hyeo. There is no trace of any velar or guttural sound in the Yeniseian words.

FrankN said...

J. Klaproth's "Asia Polyglotta" (1823, freely available as a Google Book) notes two possible Korean-Yeniseian isoglosses (p. 335f):
- Kor. Pai, Ket bhus "belly, gut" [Bauch]
- Kor. Jip, Kott. hobis, slaw. guba "mouth"
I am not fully convinced here.

However, his comparison of Ainu with Yeniseian (and Samoyedic, and Caucasian) seems interesting (I noted, among others, Aino/Sakhalin: ai "tongue", p. 302 f).

Klaproth's collection of Yeniseian words (p. 171ff) seems to be the most recent and comprehensive one as concerns South Yeniseian (it is still cited in the Tower of Babel DB), so you might want to take a look.

Kristiina, I have seen your reply in the other entry, and posted an answer there.

Kristiina said...

Ebizur, I agree with you that in order for the Yeniseian root to be connected with the Korean root, there should be a loss of the guttural sound, and if a root without a guttural sound exists in Aino/Sakhalin, the connection is even less probable. However, it would be great if someone could decipher the Xiongnu texts, as their language could belong to the Yeniseian family. That could give a new swing to many Yeniseian constructions. However, I do not know how many Xiongnu texts exist.

Frank, thank you for that link! I have collected the Yeniseian words that I have in my Excel table from the Tower of Babel, from the Yeniseian Swadesh list on the Internet, and from the book Proto-Yeniseian Reconstructions with Extra-Yeniseian Comparisons by Sergei A. Starostin and Merritt Ruhlen. My German is quite bad, but I saved the Google version of Klaproth's book in my favorites.

As for ‘tongue’, I can’t see anything relevant in the Caucasus: Georgian has ‘ena’, Proto-Nakh has *moṭṭ, Proto-Dagestanian points to the reconstruction *mĕlc̣_ĭ, and West Caucasian has ‘*bǝźA’.

IMO, we should look at the big picture, i.e. it is not enough to compare two or three languages in a small area. In order to see the deep origin of a root we should compare all languages in a big area, which is of course a massive task.

Ebizur said...

"- Kor. Pai, Ket bhus 'belly, gut' [Bauch]
- Kor. Jip, Kott. hobis, slaw. guba 'mouth'"

I think the cited Slavic word covers the semantic range of "mouth" and "lips."

Korean "Pai" is a reference to Modern Korean 배 bae /pɛ̝/ "belly" < Late Middle Korean ᄇᆡ bŏy */pɔj/ ~ */pʌj/ "belly."

Korean "Jip" is a reference to Modern Korean 입 ib /ip̚/ "mouth" < Late Middle Korean 입 ib */ip̚/ "mouth; entrance, exit, door."

Thank you for reminding me about the Ainu dialectal variant aw "tongue." This etymon is nearly limited to dialects that have been spoken in areas now under Russian control (Maoka, Nairo, Ochiho, Raichishka, Shiraura, and Tarantomari in Sakhalin as well as in sparse records of the Ainu language as formerly spoken in the Kuril Islands); perhaps this might have contributed to its rarely being mentioned in Japanese sources regarding the Ainu language, though it has been recorded in Soya dialect (from the northernmost point of Hokkaido, just across La Pérouse Strait from the southern end of Sakhalin). Everywhere in Hokkaido south of Nayoro, it seems that only the compound parunpe has been recorded.

Please be careful not to confuse Sakhalin/Soya/Kuril Ainu aw "tongue" with common Ainu (including Sakhalin and the Kurils) ay "arrow."

Considering the limited distribution of Ainu aw "tongue," I think it might be prudent to consider a possible link with common Ainu haw "voice; (in some compounds) song; reportative evidentiality marker, marker of hearsay," Nivkh au "voice," or Japanese koe (< こゑ kowe) "voice, vocal sound (of a cat mewing, etc.); tone of voice, manner of speech, pronunciation, accent; sound (of insects chirping, a large bell tolling, the wind causing objects to rub together, etc.); words (of God), that which a deity tells a person; [obsolete] the proper(=pseudo-Chinese) pronunciation of Chinese characters, which is now called on'yomi (< Chinese 音 'sound' + Japanese yomi 'reading; counting')" before considering Yeniseian ēy ~ ey ~ ay "tongue; (in Kott) voice, sound," which is geographically very distant and phonetically not such a good match.

FrankN said...

@Ebizur: As ceramics are assumed to have spread westward from the Jomon culture through Siberia, I wouldn't in principle rule out Jomon terms occurring simultaneously in Ainu and Siberian substrate just because of geographical distance (phonetics is another issue, though!)
In this context, I found Klaproth's connection for fire (a term obviously associated with pottery, p. 303) interesting:
Ainu apeh, Malay api, New Guinea (where) eef, Breton afo (plus, unmentioned by Klaproth, of course other IE variants such as oGr. pyr).
We have the genetic trail of dogs from at least the Near East to Mongolia by around the 8th/9th millennium BC confirmed by DNA research, with a corresponding trail of dog terms on *c(h)u. Millet cultivation seems to have travelled in the opposite direction from NE China around the same time (and millet terms as well!), so why not something more?

Further above, you had been talking about "false friends" between Korean and English. Looking at the lexical closeness between Chukotko-Kamchatkan and IE (and here especially Brythonic Celtic) that Jaeger has identified, I wonder whether so many "false friends" can't actually also include some true ones.

Related to this is the question how East Asian DNA arrived in NE Europe including Finland. The common explanation is via reindeer herders. However, in that case, their herds should have mixed as well, but they didn't!
"Haplotype sharing is very limited between Russia and Fennoscandia (figure 1a), suggesting separate origins of domestic reindeer in the two regions. This implies limited exchange of animals between the reindeer herding people of Fennoscandia and the indigenous cultures in western Russia."

If East Asians didn't arrive over land, they probably came by boat in periods with warmer climate and a largely ice-free Arctic Ocean. Ayon, Chukotka, RF to Skjanes, Finnmark, NOR is a bit over 4,000 km by sea. Not around the corner, but certainly feasible for skilled coastal fishermen. And the route from Norway to Scotland has never been a major problem for sailors...
This isn't meant to imply that Chukotka-Kamchatkans ever spoke a kind of IE - they probably didn't. But Indoeuropeanisation of Scotland and Ireland may also be a rather recent phenomenon, and shared Arctic substrate is a possibility to be considered.

FrankN said...

I have taken Burush. phu (dial. pu), Ainu ape(h), Korean bul "fire" as an occasion to look a bit deeper into the root. Quite surprising results! [Sources: ASJP DB, ²=Tower of Babel DB. Meaning is always "fire", unless indicated otherwise]

1. Circumpacific root *(a)pi:
Aside from Ainu ape(h), we find
- PJap. *pi "sun,day"²
- PLai *api, PKadai, PKamTai *pui, PThai *pai², PKadai *w3n "sun"
- PAN *api, *adaw "sun"
- similar forms in various Austro-Melanesian languages, e.g. Demta (Sentani) pa5, Dumpu (TNG) pe, Warungu/ Wulguri (PNyung) buri, Angaatha (TNG) ip3 "sun", Iwam (Sepik) pay, pi "sun", Kasua (Opawi) opo "sun"

So far, so unsurprising. But we also have, across the Pacific:
- Penuti *pʔi²
- Chimariko apu, PHokan *ipI², PYuman *par "sun"
- Kalapuya pyan7 "sun"
- Wintun *pho "fire", *phuk "ash"
- Tanoan *pha², Kiowa phia, pae "sun"
- Paezan *paa-²
- Tupi-Guarani *apɨ under *pak 'burn' R 102); *pe 'sun, day'²
- Siouan *hpete, *wira "sun"
- Uto-Azt. *tapa/ *tawa "sun"
- Popotekan *wi
- Cariban *weyu "sun"
- Awa Pit pa "sun", Guambanio p3C "sun"
- Abipon pae "sun"
- Quechua *nupai "sun"
- Kurina Araua jiphu
- Kaingang pi
- Puri pote, ope "sun"
- Carapana pea, pero

For convenience's sake, though belonging to Cluster 2, I add here
- Pano *bari "sun"
- Kiche porom
- Spokane peC, p'aX "burn", piq "white"

.. tbc

FrankN said...

2. Eurasian root *pVr / *bVX:
Ainu apeh seems to indicate the presence of a glottal, which may either have been added for semantic distinction ("fire" vs. "sun" atl), or was lost somewhere during circumpacific travel. Traces of such a glottal are, aside from the Amerindian examples above, present in Rukay (dial.) apuru, plus a few other AN forms.

Eurasian forms with final glottal include
- PST *bar, *[ph]ǝw "bake, set on fire", also a number of "sun" terms
- Middle Korean pɨ́r², Turk *ört (Chuvash virt) "flame, to burn"², Mong. *(h)örde "to burn"²
- PAuA *bʔuh / *bʔoh "burn, roast"², Korwa ber "sun"
- Chukchee-Kamchatkan: *puje 'нагретый на огне' (to heat, roast, smoke?)²
- Yugh bok, bog, Ket bok, Pump. buC
- Komi Permyak bi, Ural. *porV "to burn"², *päjwä "sun, day"²
- Drav. *pu- "spark"², *por- "to fry, roast"², *pord- "sun"², Tel.nippu
- PKartv. *bir, *pur "to warm"²
- Avar baq, Beshta (Tsez.) boq, Dargwa berX~i, Lak barX~ (all "sun")
- PIE *pue-r/n-² (Hitt., Toch., Arm., Slaw., Balt., Germ., OGrk., non-Lat. Ital.) [also "bake", "burn", "boil", Germ. (ent-)fachen "to light a fire" via other paths?]
- Basque bero "hot"

3. The African trail:
Malagasy afo/afu is expected - it's an AN language after all. More surprising is:
- AA (non-Semitic) *faḥ-², Tamashek efew
- Omot: Mocha abe, Male abi (both "sun")
- Bantu *pia², c.f. Suah. piri "hot chili pepper"
- Other NC: PGbya (NC) *we, *weae "sun"; Burak (Adam.) be "sun", Wapha (Jukon.) pyu, Ahlo (Kwa) ibi, PUkaan *ewiS "sun", PTogo *pila "sun", Wolof safara
- NS (speculative): Gumuz woka, Shabo Cuwa

Another, more speculative link is presented by
- S. Andamanese bodo "sun"
- E. Oromo abida, adu "sun"
- Hausa wuta
- NS: Koma wati, Temein podN, Baka fo7du, Bagirmi pwod