Indo-European Language Family

Indo-European is a family of languages that first spread throughout Europe and many parts of South Asia, and later to every corner of the globe as a result of colonization. The term Indo-European is essentially geographical: it refers to the family’s range from its easternmost extension in the Indian subcontinent to its westernmost reach in Europe. The family includes most of the languages of Europe, as well as many languages of Southwest, Central, and South Asia. With over 2.6 billion speakers (about 45% of the world’s population), the Indo-European language family has the largest number of speakers of all language families as well as the widest dispersion around the world.

The cradle of the Indo-Europeans may never be known, but an ongoing scholarly debate about the original homeland of Proto-Indo-European (PIE) may some day shed light on the ancestor of all Indo-European languages as well as the people who spoke it. There are two schools of thought:

  • Some scholars (e.g., Marija Gimbutas) propose that PIE originated in the steppes north of the Black and Caspian Seas (the Kurgan hypothesis). Kurgan is the Russian word, of Turkic origin, for a type of burial mound over a burial chamber. The Kurgan hypothesis combines archaeology with linguistics to trace the diffusion of kurgans from the steppes into southeastern Europe, providing support for the existence of a Kurgan culture that reflected an early presence of Indo-European people in the steppes and southeastern Europe from the 5th to the 3rd millennium BC.
  • Other scholars (e.g., Gamkrelidze and Ivanov) suggest that PIE originated around 7,000 BC in Anatolia, a stretch of land that lies between the Black and Mediterranean seas. It lies across the Aegean Sea to the east of Greece and is thus usually known by its Greek name Anatolia (Asia Minor). Today, Anatolia is the Asian part of modern Turkey.

It would not have been possible to establish the existence of the Indo-European language family if scholars had not compared the systematically recurring resemblances among European languages and Sanskrit, the oldest language of the Indian subcontinent that left many written documents. The common origin of European languages and Sanskrit was first proposed by Sir William Jones (1746-1794). Systematic comparisons between these languages by Franz Bopp supported this theory and laid the foundation for postulating that all Indo-European languages descended from a common ancestor, Proto-Indo-European (PIE), thought to have been spoken before 3,000 B.C. It then split into different branches which, in turn, split into different languages in the subsequent millennia.

Since PIE left no written records, historical linguists construct family trees, an idea pioneered by August Schleicher, on the basis of the comparative method. The comparative method takes shared features among languages and uses procedures to establish their common ancestry. It is not the only method available but is one that has been most widely used. The examples below show how this method actually works with some Indo-European languages.

PIE *dekm > Proto-Germanic *texun > Old English teon > Modern English ten
Proto-Italic *dekem > Latin decem > Modern Italian dieci
Old Church Slavonic desenti > Modern Bulgarian deset
Sanskrit dáça > Hindi/Urdu das
Greek deka
  • proto comes from the Greek word for ‘first’; here it marks the earliest reconstructed stage of a language
  • * means the form was reconstructed, not attested.
  • > means ‘became’
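The notation above can be read mechanically. The following is only a minimal sketch, assuming each stage is written as a language name followed by a single word form; the names Stage and parse_chain are hypothetical and introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    language: str        # e.g. "Proto-Germanic"
    form: str            # the word form, without the leading asterisk
    reconstructed: bool  # True if the form was marked with '*'


def parse_chain(chain: str) -> List[Stage]:
    """Split a derivation such as 'PIE *dekm > Old English teon' into stages."""
    stages = []
    for part in chain.split(">"):          # '>' means 'became'
        # Assumption: the last space-separated token is the word form.
        language, _, form = part.strip().rpartition(" ")
        stages.append(Stage(language, form.lstrip("*"), form.startswith("*")))
    return stages


chain = "PIE *dekm > Proto-Germanic *texun > Old English teon > Modern English ten"
for stage in parse_chain(chain):
    status = "reconstructed" if stage.reconstructed else "attested"
    print(f"{stage.language}: {stage.form} ({status})")
```

Forms marked with an asterisk come out as reconstructed, and the rest as attested, which mirrors the legend above.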


Indo-European languages are classified into 11 major groups, 2 of which are extinct, comprising 449 languages (Ethnologue).

The Baltic group is conservative and has preserved many archaic features thought to have been present in PIE. Some scholars think that Baltic languages share a common ancestral language with the Slavic languages. This hypothetical language is called Balto-Slavic.

Language     Number of speakers   Where spoken primarily
Latvian      1.5 million          Latvia
Lithuanian   3.1 million          Lithuania

Celtic languages were largely unknown until the modern period. They were once spread over Europe in the pre-Christian era. The oldest records of these languages date back to the 4th century AD.

Language                  Number of speakers   Where spoken primarily
Breton                    533,000              France
Irish                     355,000              Ireland
Scottish (Scots Gaelic)   62,175               Scotland
Welsh                     575,000              Wales


West Germanic
Language    Number of speakers   Where spoken primarily
Afrikaans   6 million            South Africa
Dutch       17 million           Netherlands
English     309 million          UK, US, Australia, Canada
German      95 million           Germany
Yiddish     50,000               Germany, Israel

North Germanic
Language    Number of speakers   Where spoken primarily
Danish      5.3 million          Denmark
Icelandic   240,000              Iceland
Norwegian   4.6 million          Norway
Swedish     8.8 million          Sweden

Romance (Italic)

Language     Number of speakers   Where spoken primarily
Catalan      6.7 million          Spain
French       65 million           France
Italian      61.5 million         Italy
Portuguese   178 million          Portugal, Brazil
Romanian     23.5 million         Romania
Spanish      322 million          Spain, Latin America


West Slavic
Language   Number of speakers   Where spoken primarily
Czech      11.5 million         Czech Republic
Polish     43 million           Poland
Slovak     5 million            Slovakia
Sorbian    70,000 to 110,000    Germany


East Slavic
Language     Number of speakers          Where spoken primarily
Belarusian   9 million                   Belarus
Russian      150 million (L1 speakers)   Russia
Ukrainian    37.1 million                Ukraine


South Slavic
Language     Number of speakers   Where spoken primarily
Bosnian      4 million            Bosnia & Hercegovina
Croatian     6.2 million          Croatia
Macedonian   1.6 million          Macedonia
Serbian      11.1 million         Serbia
Slovenian    2 million            Slovenia


Indo-Aryan (Indic)
Language    Number of speakers                                          Where spoken primarily
Balochi     1.8 million                                                 Pakistan
Bengali     100 million (1st language); 211 million (1st & 2nd language)   Bangladesh
Bhojpuri    26.6 million                                                India
Hindi       180.8 million                                               India
Gujarati    46.1 million                                                India
Kashmiri    4.6 million                                                 India
Marathi     68 million                                                  India
Nepali      17.2 million                                                Nepal
Maithili    24.8 million                                                India
Oriya       31.7 million                                                India
Punjabi     60.8 million                                                India
Romani      1.5 million                                                 Romania & elsewhere
Sanskrit    194,000 (2nd language speakers)                             India & elsewhere
Sindhi      21.3 million                                                Pakistan
Sinhalese   13.2 million                                                Sri Lanka
Urdu        60.5 million                                                Pakistan


Iranian
Language          Number of speakers   Where spoken primarily
Dari              7.6 million          Afghanistan
Farsi (Persian)   24.3 million         Iran
Kurdish           11 million           Iraq & elsewhere
Pashto            19 million           Afghanistan & elsewhere
Tajik             4.3 million          Tajikistan


Language   Number of speakers   Where spoken primarily
Albanian   5 million            Albania
Armenian   6.7 million          Armenia
Greek      12.3 million         Greece
Greek is the only surviving language of its group.


Tocharian (extinct)
Attested by texts dating to 500-1000 AD that were found in the early 20th century in Chinese Turkestan.
Anatolian (extinct)
Unknown until the 20th century when it was discovered during excavations in Turkey. Texts written in cuneiform date to 13th-7th centuries BC.

In addition to these main groups, there are fragmentary records of other Indo-European languages. These records, mostly in the form of inscriptions, do not provide sufficient material for the reconstruction of PIE.



Sound system
There have been numerous attempts to reconstruct the vowels and consonants of PIE, all of which encountered serious problems due to the uneven nature of the written records and to the huge differences in the age of the records. As a result, the reconstruction of PIE phonology continues to be a matter of scholarly debate and speculation. Among the most notable reconstructions are those by August Schleicher, Karl Brugmann, Winfred Lehmann, Oswald Szemerényi, and Jacob Grimm.

First Germanic Sound Shift (Grimm’s Law)
You probably know of Jacob Grimm as the author of fairy tales. But he was also one of the great linguists of the 19th century. He found evidence for the unity of all the modern Germanic languages in the phenomenon known as the First Germanic Sound Shift (also known as Grimm’s Law), which set the Germanic branch apart from the other branches of the Indo-European family. This shift occurred before the 7th century when records started to be kept. According to Grimm’s Law, the sounds that appear as /p, t, k/ in the classical Indo-European languages (Latin, Greek, and Sanskrit) became /f, θ, h/ in Germanic languages, where /θ/ is the th sound of English three. For example, Latin pater > English father, Latin cornu > English horn.
You can easily see the resemblances among four common words across English, Greek, Latin, and Sanskrit.

English   Greek     Latin    Sanskrit
father    pater     pater    pita
brother   phrater   frater   bhratar
foot      poda      pedem    pada
three     tris      tres     trí
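The voiceless-stop part of Grimm’s Law can be sketched as a simple lookup. This is only an illustration under stated assumptions: it shifts just the first sound of a word, it treats Latin spelling c as the sound /k/, and the names GRIMM_SHIFT, initial_sound, and shift_initial are hypothetical. Real sound change involves much more (vowel changes, endings, later shifts), so the output is not an actual English word.

```python
# Voiceless stops became fricatives in Germanic (one part of Grimm's Law).
# "θ" stands for the th sound of English "three".
GRIMM_SHIFT = {"p": "f", "t": "θ", "k": "h"}


def initial_sound(word: str) -> str:
    """Return the first sound of a word; Latin spelling 'c' stands for /k/."""
    first = word[0].lower()
    return "k" if first == "c" else first


def shift_initial(word: str) -> str:
    """Apply the shift to the initial sound only, leaving the rest unchanged."""
    sound = initial_sound(word)
    return GRIMM_SHIFT.get(sound, sound) + word[1:].lower()


for latin, english in [("pater", "father"), ("cornu", "horn"), ("tres", "three")]:
    print(f"Latin {latin} -> {shift_initial(latin)} (compare English {english})")
```

The shifted initial sound matches the first sound of the corresponding English word in each pair, which is the regular correspondence that Grimm’s Law describes.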

Centum-Satem division
The Centum-Satem division describes the evolution of PIE labiovelar, velar, and palatovelar consonants.

  • Labiovelar consonants include [kw, gw, xw, ngw] which are pronounced like [k, g, x, ng] but with rounded lips.
  • Velars are consonants articulated with the back part of the tongue (the dorsum) against the soft palate (the back part of the roof of the mouth, known also as the velum). They include [k, g, x, ng].
  • Palatovelar consonants are articulated with the back part of the tongue against the hard palate. They include [k’, g’, x’, ng’]. For example, [k’] is pronounced as the k in keen.

The terms centum and satem come from the words for ‘one hundred’ in representative languages of each group. Please note that not all languages fall neatly into these categories.

  • Satem languages include Baltic, Slavic, Albanian, Armenian, and Indo-Iranian languages. For example, Sanskrit satam, Lithuanian simtas, Russian sto.
  • Centum languages include Romance, Celtic, Germanic, and Greek. For example, Latin centum, Irish cead, English hundred, Greek hekaton.


It is believed that PIE had a pitch accent system. All words had only one accented syllable which received a high pitch. Stress could fall on any syllable of a word.

Unevenness of existing records and huge gaps in the chronology among Indo-European languages make the reconstruction of PIE grammar a difficult task. Discoveries of Hittite, Tocharian, and Mycenaean Greek in the 20th century changed the body of data on which the reconstruction of PIE is based, which in turn modified existing views of PIE.

Many of the older well-documented languages, such as Sanskrit, Greek, and Latin, have rich morphologies with clearly marked gender and number, as well as elaborately marked case systems for nouns, pronouns, and adjectives. Verbs in these languages also have elaborately marked systems of tense, aspect, mood, and voice, in addition to person, number, and gender. Reconstructed PIE is based on the assumption that it contained all the features found in attested languages. If a given language lacks a particular feature, it is assumed that the feature was lost or that it had merged with other features.

Indo-European languages reflect the rich morphology of PIE to various degrees. For instance, Sanskrit, Greek, Latin, Baltic, Slavic, Celtic, and Armenian have extremely rich morphologies. On the other hand, Germanic, Romance, Albanian, and Tocharian do not possess quite as many finely differentiated morphological features.

Nouns, pronouns and adjectives

  • Case
    Sanskrit had the most cases (8), followed by Old Church Slavonic, Lithuanian, and Old Armenian (7), Latin (6), Greek, Old Irish, Albanian (5), Germanic (5).
  • Gender
    The three genders (masculine, feminine, neuter) have survived in a number of Indo-European languages.
  • Number
    The three numbers (singular, dual, plural) survived in Sanskrit, Greek, and Old Irish. Vestiges of the dual number can be found in many other Indo-European languages.
  • Adjective-Noun agreement
    Adjective-noun agreement has survived in many Indo-European languages.

Reconstructed PIE verbs had different sets of endings for tense/aspect, voice, and mood, in addition to person and number:

  • Tense and aspect
    It is thought that the PIE verb system was aspect-based, although traditionally, aspect has been confused with tense. Although tense was not formally marked in PIE, most Indo-European languages define their verbal systems in terms of tense, rather than aspect.
  • Voice
    PIE had two voices: active (e.g., The child broke the glass) and medio-passive which combined reflexive and passive voices (e.g., The child washed himself and The child was washed by his mother). In addition to the active voice, various Indo-European languages use the middle or the passive voices.
  • Mood
    It is hypothesized that PIE had four moods: indicative, optative, subjunctive, and imperative. Most of these moods exist in all Indo-European languages.
  • Person and number
    PIE verbs were marked for person (1st, 2nd, 3rd) and number (singular, dual, plural).

Word order
Less is known about the syntax of PIE than about its morphology. What is said about PIE word order, therefore, is a subject of conjecture and debate. It is thought likely that word order in PIE sentences was Subject-Object-Verb. This word order is found in Latin, Hittite, Vedic Sanskrit, Tocharian, and to some extent in Greek.

The comparative method enables linguists to reconstruct a basic PIE vocabulary referring to many common elements of the culture of PIE speakers. This basic vocabulary is not uniformly attested across all Indo-European languages, which suggests that some words may have developed later or were borrowed from other languages. Among words that are reliably reconstructed are words for day, night, the seasons, celestial bodies (sun, moon, stars), precipitation (rain, snow), animals (sheep, horse, pig, bear, dog, wolf, eagle), kinship terms (father, mother, brother, sister, son, daughter), and tools (axe, yoke, arrow).


Written records for the various Indo-European languages begin at different dates. The table below shows when the first written records appeared, what writing system was used, and which writing systems are used by the languages today.

Branch       Earliest written records   Earliest writing system                 Current writing system(s)
Armenian     500 AD                     Armenian alphabet                       Armenian alphabet
Albanian     15th century AD            Greek alphabet                          Modified Latin alphabet
Greek        1,400 BC                   Linear B                                Greek alphabet
Celtic       4th century AD             Ogham alphabet                          Modified Latin alphabet
Baltic       16th century AD            Modified Latin alphabet                 Modified Latin alphabet
Romance      6th century BC             Latin alphabet, adapted from Etruscan   Modified Latin alphabet
Germanic     3rd century AD             runic Futhark                           Modified Latin alphabet
Slavic       9th century AD             Old Church Slavonic alphabet            Cyrillic and Latin alphabets
Indo-Aryan   3rd century BC             Brāhmī script                           Bengali, Devanāgarī, Gujarati, Oriya, Gurmukhi, Sinhala, Kaithi, modified Perso-Arabic
Iranian      9th century AD             Perso-Arabic script                     Modified Perso-Arabic, Arabic, modified Cyrillic, modified Latin
Tocharian    500-1,000 AD               Brāhmī script


Language Difficulty
How difficult is it to learn Indo-European languages?
Indo-European languages range from Category I to Category II in terms of difficulty for speakers of English.