Did the Aryans migrate to India?
  • Who were the ‘early Indians’? The evidence is clear, and between the Aryan Invasion and the Out-of-India theories, it overwhelmingly favours the latter.

A new book, by former business journalist Tony Joseph, claims to have full and final answers to the following questions: “Who were the Harappans?” “Did the Aryans migrate to India?” and “When did the caste system begin?”

The book, titled Early Indians: The Story of Our Ancestors and Where They Came From, is clearly yet another attempt to establish the Aryan Invasion Theory (AIT) as rock-solid, despite non-existent archaeological evidence to support the theory, this time using genetic evidence to back it.


European and American Indologists over the last two centuries have remained wedded to the AIT hypothesis, even though a rival Out-of-India Theory (OIT) of the spread of Indo-European languages and cultures has emerged as a reasonably coherent challenger.

Genetics, the new super-science, is irrelevant to this argument. Trying to prove the movement of Indo-Aryan language and culture through the mapping of haplogroups and mitochondrial DNA is seriously flawed, for the simple reason that the two can move independently of each other.

The cover of Tony Joseph’s Early Indians: The Story of Our Ancestors and Where They Came From.

While genetic studies of populations across geographies and time periods can be effective in mapping the different ancestral strands in the DNA of individuals and communities, they cannot show the origin and movements of cultural features like languages, belief-systems, etc.

For example, the English and Spanish languages spread out from Europe to the Americas: yet, can we conclusively trace this on the basis of the Indo-European haplogroups found in the DNA of exclusively English and Spanish-speaking people of Native American and African origin in the Americas?

Buddhism spread from India to the whole of northern, eastern, south-eastern and (in ancient times) even to western Asia. Can this be traced through the spread of ‘Buddhist’ haplogroups found all over Asia and particularly in the Buddhist countries? Chess (chaturang) spread out from India and became the national game of every Asian country, from Mongolia (shatara) in the north to Vietnam (chhoeu trang) in the east to Arabia (shatranj) in the west: can this be traced through the spread of ‘chess’ haplogroups?

The debate has acquired absurd proportions, reaching a stalemate with both sides sticking to their own arguments, although with the OIT side (see my books and articles) answering every AIT argument, and the AIT side declining to even try to counter the formidable OIT evidence.

Joseph tries to wade into the argument with the presumption that he has all the evidence to settle the debate in favour of AIT once and for all. But before we move to demolish his arguments, it is worth noting what is good about the book. It contains (1) an interesting account of cosmology and biology (pp 1-6); (2) an interesting account of pre-historic human anthropology and genetics (pp 13-60); and (3) an interesting account of the history of agriculture (pp 61-97).

These are good introductions to the subjects concerned. We also have an interesting and very well-written general description of the pre-Harappan and Harappan civilisations (pp 99-132). This includes interesting information about Harappan influence on the Mesopotamian civilisation as exemplified by the depiction of Indian water-buffaloes in the seals of Akkadia.

Joseph also raises a simple point about what the earliest Indian civilisation should be called. He, not unreasonably, settles for the term Harappan civilisation, naming it after the first city that was discovered by archaeologists and historians. He fairly rejects the terms Indus Valley and Indus-Saraswati Civilisation, for the geography of the early Indians went beyond these areas.

But his questionable claims begin fairly early in the book. The crux of the book is stated in the prelude, in which Joseph provides “A Short Chronology of the Modern Human in Indian Prehistory” (pp xi-xiv), and the relevant entry in this chronology is the period “2000-1000 BCE”, where “Multiple waves of Steppe pastoralist migrants from central Asia (moved) into south Asia, bringing Indo-European languages and new religious and cultural practices” (p xiv).

The entire story of “our ancestors and where they came from”, for which our Tony wants to provide answers, lies between these dates. The identity (identities) of these migrants between 2000-1000 BCE will give us final answers on the three posers the author provides on the book’s cover: who the Harappans were, whether the Aryans migrated to India, and when caste came to be established.

Joseph’s work can be examined by asking the following questions and examining related issues: one, were the Harappans Dravidian-language speakers?

Two, why is the period 2000-1000 BCE so significant and important?

Three, does the ‘genetic evidence’ tell us that Indo-European language speakers migrated into India from Central Asia?

Four, what secrets do the Old and New Rigveda reveal about the AIT/OIT debate?

Five, what is the real chronological evidence available to us, especially regarding the chronology and geography of the Rigveda?

Six, does the genetic evidence prove the AIT or negate the OIT?

Seven, when did the caste system begin?

Eight, what is the real story of the Saraswati river?

Nine, what does the story of the ‘horse’ in ancient India tell us?

We will just touch on some of these issues in this article. A more detailed and comprehensive examination of all the points made in Joseph’s book will be separately published.

The author, after pointing out that the language of the Harappans has not yet been deciphered, still chooses to assert at various points that “historians and archaeologists have so far overwhelmingly backed the idea that the language underlying the Harappan script was Proto-Dravidian”. If a script is not deciphered, to come to this kind of conclusion is preposterous.

Any honest speculation, in the absence of concrete historical evidence indicating otherwise, would assume the language to be the same as, or some earlier form of, the languages found in the area. If strange inscriptions in a new, unknown and yet-to-be-deciphered script, scientifically dated to 1000 BCE, were to be found in some part of Canada, no-one would assume the language to be English or French because we know as a matter of recorded historical fact that English and French arrived in Canada many centuries later.

But if strange inscriptions in a new, unknown and yet-to-be-deciphered script, scientifically dated 5000 BCE, were to be found in some part of Tamil Nadu, China or Saudi Arabia, then any natural assumption, in the absence of concrete historical evidence indicating otherwise, would be that the language couched in the script would be Dravidian, Sinitic or Semitic respectively.

In the case of the area of the Harappan civilisation, there is no record, in the whole of traditional memory and recorded history, of any languages other than Indo-Iranian ones ever being spoken as native languages across the entire area. Further, the Rigveda, the oldest Indo-Aryan text, composed in the same geographical area, is dated by Western academic scholarly consensus to 1200 BCE at the very latest, in a pre-Iron Age era.

So, any kind of evidence for the Harappan language not being Indo-European is non-existent, and any kind of evidence for it being Dravidian is even more non-existent (if one could say such a thing). In these circumstances, only a political-agenda driven motive could produce a conclusion that the Harappans spoke a (Proto)-Dravidian language.

Next, the date 2000-1000 BCE for the ‘Aryan migrations’ into India is the heart and crux of the whole book, and of the whole debate.

It is necessary to understand exactly why this is so. The whole question of ‘Aryans’ or Indo-Europeans and their migrations arose from the discovery that the languages of northern India and Europe, and many areas in between, are related to each other as members of one ‘language-family’. The homeland of the common ancestor of these languages is usually identified as the Steppes of southern Russia.

The linguistic evidence suggests that the 12 different, early Indo-European languages (Celtic, Italic, Germanic, Baltic, Slavic, Albanian, Greek, Anatolian, Armenian, Iranian, Tocharian and Indo-Aryan) started separating from each other from the end of the fourth or the beginning of the third millennium (ie around 3000 BCE).

The first to separate was Anatolian (long extinct), while the last five that remained in the ‘homeland’ were Albanian, Hellenic (Greek), Armenian, Iranian and Indo-Aryan, which stayed contiguous and developed new linguistic features. Therefore, if South Russia is the homeland, the Indo-Aryan migrations from there have to have started out long after 3000 BCE.

The Indo-Aryans would have moved in small groups first to central Asia, before finally moving into north-west India, to the region where the Harappan civilisation existed. But historians now mostly agree that the Aryans did not descend on India in any sudden movement right into the heart of the Harappan civilisation. Nothing cataclysmic happened in Harappan areas between 4500 BCE and 500 BCE.

On the other hand, the Rigveda shows no sign of a people with any memories of external origins, or with any knowledge of areas beyond the closest border areas of south-east Afghanistan.

It does not refer to a single name, of friend or foe, in the entire text, or a single other entity of any kind, which can be classified as linguistically Dravidian. The place-names, and even more significantly, the river-names in the entire area, as given in the Rigveda, do not contain a single case, which any linguist can even speculatively classify as Dravidian, even at a point of time between 1400 BCE and 1000 BCE, even as per Joseph’s date for the Rigveda.

This is surprising, since in most civilisations, rivers tend to retain their historic names, even if their areas have been subsequently occupied by new people. In the Vedic period, all river names are derived from Sanskrit or its daughter languages, and nothing remotely pre-Vedic. In short, the entire area seems to be purely Indo-Aryan in the Rigveda itself.

Now this state of affairs would be absolutely impossible in an area of a supposedly Dravidian (or any other non-Indo-European) occupation within a very short period of time after the first Indo-Aryans started ‘trickling in’ after 2000 BCE, without very violent and cataclysmic events taking place. But archaeologists have ruled out any such cataclysmic events.

After the subject of Indo-European language origins and spread sparked a major debate somewhere after 1990, and the weight of evidence in all these three academic disciplines (linguistic, archaeological, textual) shifted completely on the side of the Indian homeland theory and OIT, the AIT lobby in the academic world has literally run away from the debate. The debate has thus been shifted to a new and totally irrelevant field: genetics. But, as we have already noted, genetics does not do anything to prove the linguistic and cultural basis for the AIT.

This brings us to the internal evidence of the Rigveda, which provides a more plausible chronology, and which is supportive of the OIT. The Rigveda is one text with 10 books, but there are really two texts. For the purpose of this article, we will describe them as the Old Rigveda and the New Rigveda. Books 2-7 can be called the ‘family’ books, while books 1, 8, 9 and 10 are ‘non-family’.

The family books are distinguished from the non-family books in two main ways: a) Each family book generally belongs to one family (out of a total of 10 families) of composers, while the non-family books are more mixed and general; and, b) The hymns in the family books are arranged in a specific order: first according to deity (first Agni, then Indra, etc), then within each deity, according to a decreasing number of verses in the hymns (eg 13, 11, 9, 8, etc), then within the same number of verses, according to the metre. The non-family books, however, do not follow this pattern.
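The three-level arrangement described above (deity, then decreasing verse count, then metre) behaves like a sort on a compound key. The following is only a minimal sketch of that idea: the `Hymn` record, the rank tables `DEITY_RANK` and `METRE_RANK`, and the sample hymns are hypothetical illustrations and are not taken from the Rigveda or from Joseph's book.

```python
from typing import NamedTuple


class Hymn(NamedTuple):
    deity: str
    verses: int
    metre: str


# Assumption: illustrative rank tables only. The article says Agni precedes
# Indra; the metre ranking here is a placeholder, not a claim about the text.
DEITY_RANK = {"Agni": 0, "Indra": 1}
METRE_RANK = {"jagati": 0, "tristubh": 1, "gayatri": 2}


def arrangement_key(hymn):
    """Compound key: deity first, then more verses first, then metre."""
    return (DEITY_RANK[hymn.deity], -hymn.verses, METRE_RANK[hymn.metre])


def pattern_breaks(hymns):
    """Return positions of hymns that sort before their predecessor."""
    return [
        i
        for i in range(1, len(hymns))
        if arrangement_key(hymns[i]) < arrangement_key(hymns[i - 1])
    ]


# Hypothetical book: the fourth hymn is longer than the ones before it,
# so it violates the decreasing-verse-count rule.
book = [
    Hymn("Agni", 13, "tristubh"),
    Hymn("Agni", 11, "jagati"),
    Hymn("Agni", 11, "tristubh"),
    Hymn("Agni", 15, "tristubh"),
    Hymn("Indra", 9, "tristubh"),
]

print(pattern_breaks(book))
```

A hymn whose position breaks the expected order is flagged by its index, which is the kind of irregularity a reader would look for when checking whether a hymn fits the pattern of its book.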


But Book 5, though a family book, shares all its other characteristics with the later non-family books rather than with the earlier family books. So, the real old books are 2, 3, 4, 6 and 7, while Book 5 can be reckoned with the New Rigveda.

Scholars have also identified some hymns in each of the old books 2-4 and 6-7, which were redacted (modified) or interpolated later, at the time of addition of the new books 1 and 8. These may be called the Redacted Hymns. These consist mainly of old hymns, which were not included in the original books, and therefore underwent changes and modifications. They were inserted into the old books at the time of addition to the Rigveda of the new books 1 and 8. These hymns were inserted in-between the earlier hymns in violation of the pattern of arrangement in the old books.

Thus, we have practically two distinct chronological parts of the Rigveda, with a grey area between them: (1) an Old Rigveda (Books 2-4, 6-7, minus the Redacted Hymns) with 280 hymns and 2,351 verses; (2) a New Rigveda (Books 1, 5, 8-10) with 686 hymns and 7,311 verses; and (3) the Redacted Hymns (62 hymns and 890 verses), which form a late appendix to the Old Rigveda, a kind of grey area between the two (Old and New) epochs.
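As a quick consistency check on the figures above, the three groups can be tallied to see what totals they imply and what share of the text each represents. This is only a sketch: the dictionary name `PARTS` and the variable names are hypothetical, and the numbers are simply those quoted in this article.

```python
# Hymn and verse counts as quoted in the article: (hymns, verses).
PARTS = {
    "Old Rigveda": (280, 2351),
    "Redacted Hymns": (62, 890),
    "New Rigveda": (686, 7311),
}

total_hymns = sum(hymns for hymns, _ in PARTS.values())
total_verses = sum(verses for _, verses in PARTS.values())

print(f"Total: {total_hymns} hymns, {total_verses} verses")
for name, (hymns, verses) in PARTS.items():
    # Share of each part in the whole, by hymns and by verses.
    print(
        f"{name}: {hymns / total_hymns:.1%} of hymns, "
        f"{verses / total_verses:.1%} of verses"
    )
```

The three groups add up to 1,028 hymns and 10,552 verses, so the partition accounts for every hymn once and none twice.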

The Old Rigveda and the New Rigveda are very distinct from each other in their characteristics. The primary distinction for Western linguists and Indologists is linguistic: the language of the Old Rigveda is of an older era and harks back to the Proto-Indo-European stage, sharing characteristics and linguistic features (particularly the vocabulary) with the most geographically distant Indo-European (IE) branches.

The language of the New Rigveda shows innovations (especially, but not only, in vocabulary), sometimes found inserted into the Redacted Hymns, showing the transition to the language of the post-Rigvedic period and later Sanskrit. An example is the word for ‘night’. The Vedic nakt is found in all the other IE languages: the Avestan naxt, German nacht, modern Greek nukhta, Latin nocte, Old Russian noshti, Old Irish innocht, Albanian nate, Lithuanian naktis, Tocharian nakt, and Hittite nekuz.

A new word, ratri, not found in any other IE branch, appears only in the New Rigveda and in a late redacted hymn in Book 7. Later, this word, ratri, is found hundreds of times in the Atharvaveda (and scores of times even in the Yajurveda) and is the common word in all later Vedic texts, in classical Sanskrit, and in all later and modern Indo-Aryan languages and in other non-Indo-European languages, which have borrowed it from Sanskrit, while the original word nakt is almost unknown.

This division by Western academicians of the Rigveda into an old part and a new part is confirmed by every other criterion:

(1) The composers of the Old Rigveda are ancestral rishis, the composers of the New Rigveda are descendant rishis.

(2) The New Rigveda contains references to ancestral composer rishis of the Old Rigveda and kings contemporaneous to the Old Rigveda, but the same is not the case vice versa.

(3) In the Old Rigveda hymns are composed by descendant rishis in the name of their ancestor rishis, but in the New Rigveda, hymns are generally composed by rishis in their own names. There are other differences, but we shall not elaborate them here.

Now we come to the real chronological evidence. Indian historiography owes a very great debt to two entities. The first debt is owed to Emperor Ashoka, who left us the first decipherable and datable inscriptional data in Indo-Aryan languages within India, in the third century BCE: the Ashoka pillars and inscriptions. Because of this Indo-Aryan language inscriptional data, it is impossible for anyone to claim any kind of evidence to show that the Indo-Aryan languages entered India from Central Asia in 200 BCE.

A second debt is owed to the Mitanni kingdom of Syria-Iraq in West Asia, which left us the first decipherable and datable inscriptional data in Indo-Aryan languages outside India in the sixteenth and fifteenth centuries BCE.

Because of this Indo-Aryan language inscriptional data, it is impossible for anyone to claim any kind of evidence to show that Indo-Aryan languages entered India from Central Asia in 2000 BCE or indeed in any historical period before that.

Because of this Indo-Aryan language inscriptional data, the genetics-based case made out by Joseph’s “92 scientists from around the world”, which places the alleged immigration of certain people from Central Asia into South Asia during the course of “the second millennium BCE (2000 BCE to 1000 BCE)”, seems totally untenable.

What is this data? But first, who were the Mitanni? They were the ruling class or clan of the ‘Mitanni kingdom’, a kingdom which flourished in Syria-Iraq for about two centuries from around 1500 BCE. In the early twentieth century, plenty of inscriptional material on these Mitanni kings was discovered and analysed, which showed that the Mitanni kings were of Indo-Aryan stock.

Were the inhabitants of the Mitanni kingdom Indo-Aryan-speaking people? On the contrary, they spoke a non-Indo-European language, Hurrian or Hurrite, and the Indo-Aryan data comes from the names and writings of the ruling clan, who were obviously descendants of Vedic Indo-Aryan-speaking or Vedic Aryan-influenced ancestors.

The Indo-Aryan identity of the long-known Mitanni kings was discovered only in the early twentieth century after the decipherment and detailed study of documents and inscriptions in West Asia. The discovery sent the Indological scholarly world into a tizzy: just how was one to explain the presence of Vedic Indo-Aryan speakers in the heart of West Asia at the same time as they were reportedly entering India and before the Rigveda was allegedly composed, and which dates are now sought to be confirmed by genetic evidence?

The Indologists ‘solved’ this problem by declaring that the ‘early Indo-Aryan group’ whose presence in West Asia is so scientifically recorded and dated was a pre-Rigvedic group, which split from the other Indo-Aryans in Central Asia itself, and migrated westwards, so that the Vedic Indo-Aryans, who later composed the Rigveda were entering north-western India at around the same time as the pre-Mitanni Indo-Aryan group was entering West Asia.

This is the crux of the whole debate: are the common Indo-Aryan elements found in the Mitanni data and in the Rigveda ‘pre-Rigvedic’ elements? The Indologists, and the group of “92 scientists from around the world”, conveniently refused to cross-check the data and find out the facts.

However, we will check the data. The word loans in Mitanni, assuming they are part of the pre-Rigvedic era, ought to be found in the Old Rigveda. But this isn’t the case. They are found only in the New Rigveda and are completely absent in the Old Rigveda. Thus, the link of the Mitanni and the Avestan Iranians is with the New Rigveda, and not with the Old Rigveda.

What are the inexorable implications of this evidence for the dating of the Rigveda, on which hinges the entire case built by the genetic evidence?

Here it comes. The common culture shared by the Mitanni and the Rigveda (and the Avesta) is the culture of the New Rigveda. This common culture must obviously have developed in a common territory with the composers of the New Rigveda (and Avesta) before the ‘pre-Mitanni’ ancestors of the ruling clans of the Mitanni kingdom migrated to West Asia.

These pre-Mitanni ancestors of the ruling clans of the Mitanni kingdom migrated to West Asia either: a) from Central Asia (with a parallel migration by the composers of the New Rigveda from Central Asia to the Saptasindhu region of northern Pakistan, as maintained so far by the Indologists and linguists), or b) from the territory of the New Rigveda. And this territory of the New Rigveda, and indeed of the Rigveda as a whole, stretches out from Haryana and western Uttar Pradesh in the east to south-eastern Afghanistan in the west.

The first option is ruled out because the New Rigveda is a linear successor of the culture of the Old Rigveda. The common Vedic-Mitanni-Avestan culture of the New Rigveda (which continues into the other post-Rigvedic Vedic texts, the epics and the Puranas, and later Sanskrit texts) is a new culture completely missing in the Old Rigveda. And the Old Rigveda is also composed in a part of the same territory as the New Rigveda.

This means that the second option is the only possible option: the ‘pre-Mitanni’ ancestors of the ruling clans of the Mitanni kingdom migrated to West Asia from the region stretching out from Haryana and western Uttar Pradesh in the east to south-eastern Afghanistan in the west.

But where was the Old Rigveda composed? The geographical data shows the composers of the Old Rigveda residing deep inside the eastern part of the Rigvedic territory, in Haryana and western-most Uttar Pradesh, with no knowledge of areas further north and west, and only just commencing their westward expansion (which is actually recorded in precise detail in the form of detailed descriptions of westward expansions of historical kings with names) towards Afghanistan, to occupy the entire Rigvedic area only by the period of the New Rigveda.

The only western geographical names found in the Old Rigveda (and also, of course, in the New Rigveda) are the names of three rivers: the Sindhu and two of its western tributaries, Rasa and Sarayu. All of them are found in only one of the five books of the Old Rigveda (Book 4) and are completely absent in the other four (books 6, 3, 7, 2). These three western river names appear in Book 4 as the last stage in the east-to-west expansion of the Vedic people (the Bharata Puru).

The oldest Book 6 refers only to the Saraswati (which is deified in three whole hymns), and to the rivers east of it: the long bushes on the banks of the Ganga figure in a simile, showing long acquaintance and easy familiarity with the topography and flora of the Ganga area.

The next Book 3 refers to the banks of the Jahnvi (Ganga) as the “ancient homeland” of the gods.

The next oldest Book 4 (but not yet the other old Book 2, whose riverine references are restricted only to the Saraswati) for the first time refers to the Indus (Sindhu) and its western tributaries (Sarayu and Rasa) in clear continuation of the earlier westward movement.

Note that all this is in the Old Rigveda, which precedes the culture of the New Rigveda, which was carried by the Mitanni into West Asia (before they established their kingdom in Syria-Iraq around 1500 BCE). This is the step-by-step factual position regarding the chronology and geography of the Rigveda.

So, clearly, if the “92 scientists from around the world” and their genetic data prove that in 2000-1000 BCE there were “multiple waves of Steppe pastoralist migrants from central Asia into south Asia” and that they got mixed into the local populations, their DNA and haplogroups forming significant components of every single community in India, then whatever else this may allegedly prove, it proves nothing about them “bringing Indo-European languages” to India.

These languages were already present deep inside northern India as far back as 3000 BCE with antecedents going back into the more remote past. These migrants, like all other migrants of later days, simply merged into the local population all over India, adopting the languages and religions (with perhaps some minor contributions of their own).

If Joseph and the AIT proponents want to argue against the OIT, they must contest the data and evidence pertaining to the chronology and geography of the Rigveda. This single factor alone demolishes their case beyond salvation.

A much more detailed examination of Joseph’s book, and of every single aspect of the case presented by him, will be published shortly in a monograph.
