Indo-European Language Family

Indo-European is a family of languages that first spread throughout Europe and many parts of South Asia, and later to every corner of the globe as a result of colonization. The term Indo-European is essentially geographical since it refers to the easternmost extension of the family from the Indian subcontinent to its westernmost reach in Europe. The family includes most of the languages of Europe, as well as many languages of Southwest, Central and South Asia. With over 2.6 billion speakers (or 45% of the world’s population), the Indo-European language family has the largest number of speakers of all language families as well as the widest dispersion around the world.

The cradle of the Indo-Europeans may never be known, but an ongoing scholarly debate about the original homeland of Proto-Indo-European (PIE) may some day shed light on the ancestor of all Indo-European languages as well as on the people who spoke it. There are two schools of thought:

Some scholars (e.g., Marija Gimbutas) propose that PIE originated in the steppes north of the Black and Caspian Seas (the Kurgan hypothesis). Kurgan is a Russian word of Turkic origin for a type of burial mound raised over a burial chamber. The Kurgan hypothesis combines archaeology with linguistics to trace the diffusion of kurgans from the steppes into southeastern Europe, providing support for the existence of a Kurgan culture that reflected an early presence of Indo-European people in the steppes and southeastern Europe from the 5th to the 3rd millennium BC.
Other scholars (e.g., Colin Renfrew) suggest that PIE originated around 7,000 BC in Anatolia, a stretch of land that lies between the Black and Mediterranean seas. It lies across the Aegean Sea to the east of Greece and is thus usually known by its Greek name, Anatolia (also called Asia Minor). Today, Anatolia is the Asian part of modern Turkey.

It would not have been possible to establish the existence of the Indo-European language family if scholars had not compared the systematically recurring resemblances among European languages and Sanskrit, the oldest language of the Indian subcontinent that left many written documents. The common origin of European languages and Sanskrit was first proposed by Sir William Jones (1746-1794). Systematic comparisons between these languages by Franz Bopp supported this theory and laid the foundation for postulating that all Indo-European languages descended from a common ancestor, Proto-Indo-European (PIE), thought to have been spoken before 3,000 BC. It then split into different branches which, in turn, split into different languages in the subsequent millennia.

Since PIE left no written records, historical linguists construct family trees, an idea pioneered by August Schleicher, on the basis of the comparative method. The comparative method takes shared features among languages and uses procedures to establish their common ancestry. It is not the only method available but is one that has been most widely used. The examples below show how this method actually works with some Indo-European languages.

PIE *dekm >

Proto-Germanic *texun > Old English teon > Modern English ten
Proto-Italic *dekem > Latin decem > Modern Italian dieci
Old Church Slavonic desenti > Modern Bulgarian deset
Sanskrit dáça > Hindi/Urdu das
Greek deka

proto- comes from the Greek for ‘first’ and marks a reconstructed ancestral language
* means the form was reconstructed, not attested.
> means ‘became’
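
The notation used in the examples above can be read mechanically. The following is a minimal Python sketch, not a standard linguistic tool: the `Stage` type and the `parse_chain` function are hypothetical names introduced here for illustration. It splits one derivation line at each `>` and records whether each form carries the `*` that marks a reconstruction.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    language: str        # e.g. "Old English"
    form: str            # the word itself, without the leading '*'
    reconstructed: bool  # True if the form was written with '*'


def parse_chain(line: str) -> list[Stage]:
    """Split 'Lang *form > Lang form > ...' into a list of stages."""
    stages = []
    for part in line.split(">"):
        # The last whitespace-separated token is the form; the rest is the language name.
        language, form = part.strip().rsplit(None, 1)
        stages.append(Stage(language, form.lstrip("*"), form.startswith("*")))
    return stages


# Line copied from the 'ten' example in this article.
chain = parse_chain("Proto-Germanic *texun > Old English teon > Modern English ten")
for stage in chain:
    status = "reconstructed" if stage.reconstructed else "attested"
    print(f"{stage.language}: {stage.form} ({status})")
```

Only the first stage of this chain is flagged as reconstructed, which matches the legend: Proto-Germanic is inferred by comparison, while the Old and Modern English forms are attested in writing.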

Indo-European languages are classified into 11 major groups, 2 of which are extinct, comprising 449 languages (Ethnologue).

Baltic
This conservative group has preserved many archaic features thought to have been present in PIE. Some scholars think that Baltic languages share a common ancestral language with the Slavic languages. This hypothetical language is called Balto-Slavic.

Language	Number of speakers	Where spoken primarily
Latvian	1.5 million	Latvia
Lithuanian	3.1 million	Lithuania

Celtic
Celtic languages were largely unknown until the modern period. They were once spread over Europe in the pre-Christian era. The oldest records of these languages date back to the 4th century AD.

Language	Number of speakers	Where spoken primarily
Breton	533,000	France
Irish	355,000	Ireland
Scottish (Scots Gaelic)	62,175	Scotland
Welsh	575,000	Wales

Germanic

West Germanic
Language	Number of speakers	Where spoken primarily
Afrikaans	6 million	South Africa
Dutch	17 million	Netherlands
English	309 million	UK, US, Australia, Canada
German	95 million	Germany
Yiddish	50,000	Germany, Israel

North Germanic
Language	Number of speakers	Where spoken primarily
Danish	5.3 million	Denmark
Icelandic	240,000	Iceland
Norwegian	4.6 million	Norway
Swedish	8.8 million	Sweden

Romance (Italic)

Language	Number of speakers	Where spoken primarily
Catalan	6.7 million	Spain
French	65 million	France
Italian	61.5 million	Italy
Portuguese	178 million	Portugal, Brazil
Romanian	23.5 million	Romania
Spanish	322 million	Spain, Latin America

Slavic

West Slavic
Language	Number of speakers	Where spoken primarily
Czech	11.5 million	Czech Republic
Polish	43 million	Poland
Slovak	5 million	Slovakia
Sorbian	70,000 to 110,000	Germany

East Slavic
Language	Number of speakers	Where spoken primarily
Belarusian	9 million	Belarus
Russian	150 million L1 speakers	Russia
Ukrainian	37.1 million	Ukraine

South Slavic
Language	Number of speakers	Where spoken primarily
Bosnian	4 million	Bosnia & Hercegovina
Croatian	6.2 million	Croatia
Macedonian	1.6 million	Macedonia
Serbian	11.1 million	Serbia
Slovenian	2 million	Slovenia

Indo-Iranian

Indo-Aryan (Indic)
Language	Number of speakers	Where spoken primarily
Bengali	100 million 1st language; 211 million 1st & 2nd language speakers	Bangladesh
Bhojpuri	26.6 million	India
Hindi	180.8 million	India
Gujarati	46.1 million	India
Kashmiri	4.6 million	India
Marathi	68 million	India
Nepali	17.2 million	Nepal
Maithili	24.8 million	India
Oriya	31.7 million	India
Punjabi	60.8 million	India
Romani	1.5 million	Romania & elsewhere
Sanskrit	194,000 2nd language speakers	India & elsewhere
Sindhi	21.3 million	Pakistan
Sinhalese	13.2 million	Sri Lanka
Urdu	60.5 million	Pakistan

Iranian
Language	Number of speakers	Where spoken primarily
Balochi	1.8 million	Pakistan
Dari	7.6 million	Afghanistan
Farsi (Persian)	24.3 million	Iran
Kurdish	11 million	Iraq & elsewhere
Pashto	19 million	Afghanistan & elsewhere
Tajik	4.3 million	Tajikistan

Language	Number of speakers	Where spoken primarily
Albanian	5 million	Albania
Armenian	6.7 million	Armenia
Greek (Hellenic; the only surviving language of this group)	12.3 million	Greece

Tocharian (extinct): Attested by texts dating to 500-1000 AD that were found in the early 20th century in Chinese Turkestan.
Anatolian (extinct): Unknown until the 20th century, when it was discovered during excavations in Turkey. Texts written in cuneiform date to the 13th-7th centuries BC.

In addition to these main groups, there are fragmentary records of other Indo-European languages. These records, mostly in the form of inscriptions, do not provide sufficient material for the reconstruction of PIE.

Structure

Sound system
There have been numerous attempts to reconstruct the vowels and consonants of PIE, all of which have encountered serious problems because the written records are uneven and differ greatly in age. As a result, the reconstruction of PIE phonology continues to be a matter of scholarly debate and speculation. Among the most notable reconstructions are those by August Schleicher, Karl Brugmann, Winfred Lehmann, Oswald Szemerényi, and Jacob Grimm.

First Germanic Sound Shift (Grimm’s Law)
You probably know of Jacob Grimm as a collector of fairy tales, but he was also one of the great linguists of the 19th century. He found evidence for the unity of all the modern Germanic languages in the phenomenon known as the First Germanic Sound Shift (also known as Grimm’s law), which set the Germanic branch apart from the other branches of the Indo-European family. This shift occurred before the 7th century, when records started to be kept. According to Grimm’s law, the sounds that appear as /p, t, k/ in the classical Indo-European languages (Latin, Greek, and Sanskrit) became /f, θ, h/ in Germanic languages, where /θ/ is the th of three. For example, Latin pater corresponds to English father, and Latin cornu to English horn.
You can easily see the resemblances among four common words across four Indo-European languages.

English	Greek	Latin	Sanskrit
father	pater	pater	pita
brother	phrater	frater	bhratar
foot	poda	pedem	pada
three	tris	tres	trí
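
The correspondences in the table can be checked with a few lines of code. The following is a rough Python sketch under stated assumptions: it works on ordinary spelling rather than on phonemes, it looks only at the first letter of each word, and the names `GRIMM_INITIAL` and `predicted_germanic_onset` are hypothetical helpers introduced here. The t to th entry follows the three/tres row, and Latin c is treated as the spelling of /k/.

```python
# Word-initial correspondences: sound in Latin -> expected spelling in English.
# Assumption: Latin 'c' spells /k/, and English 'th' spells the shifted /t/.
GRIMM_INITIAL = {"p": "f", "t": "th", "k": "h", "c": "h"}


def predicted_germanic_onset(word: str) -> str:
    """Return the English onset predicted from the first letter of a Latin word."""
    first = word[0].lower()
    # Letters outside the mapping are returned unchanged.
    return GRIMM_INITIAL.get(first, first)


# Latin/English pairs taken from this article.
PAIRS = [
    ("pater", "father"),
    ("pedem", "foot"),
    ("tres", "three"),
    ("cornu", "horn"),
]

for latin, english in PAIRS:
    onset = predicted_germanic_onset(latin)
    matches = english.startswith(onset)
    print(f"{latin} -> {english}: predicted onset '{onset}', match: {matches}")
```

The pair frater/brother is left out on purpose: it reflects a different series of consonants from the /p, t, k/ series discussed here, so this small mapping does not cover it.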

Centum-Satem division
The Centum-Satem division describes how the PIE labiovelar, velar, and palatovelar consonants developed in the daughter languages.

Labiovelar consonants include [kw, gw, xw, ngw] which are pronounced like [k, g, x, ng] but with rounded lips.
Velars are consonants articulated with the back part of the tongue (the dorsum) against the soft palate (the back part of the roof of the mouth, known also as the velum). They include [k, g, x, ng].
Palatovelar consonants are articulated with the back part of the tongue against the hard palate. They include [k’, g’, x’, ng’]. For example, [k’] is pronounced as the k in keen.

The terms centum-satem come from the words for ‘one hundred’ in representative languages of each group. Please note that not all languages fall neatly into these categories.

Satem languages include Baltic, Slavic, Albanian, Armenian, and Indo-Iranian languages. For example, Sanskrit satam, Lithuanian simtas, Russian sto.
Centum languages include Romance, Celtic, Germanic, and Greek. For example, Latin centum, Irish cead, English hundred, Greek hekaton.
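
The grouping above can be illustrated with a deliberately crude Python sketch. It assumes the romanized spellings used in this article and sorts each language by whether its word for ‘hundred’ begins with s. The names `HUNDRED` and `classify` are hypothetical, and the real classification rests on how each branch treated the PIE palatovelars, not on spelling.

```python
from collections import defaultdict

# Words for 'hundred' as cited in this article (romanized spellings).
HUNDRED = {
    "Sanskrit": "satam",
    "Lithuanian": "simtas",
    "Russian": "sto",
    "Latin": "centum",
    "Irish": "cead",
    "English": "hundred",
}


def classify(word: str) -> str:
    """Crude heuristic: an initial sibilant spelled 's' counts as satem."""
    return "satem" if word.lower().startswith("s") else "centum"


groups = defaultdict(list)
for language, word in HUNDRED.items():
    groups[classify(word)].append(language)

for label, languages in groups.items():
    print(f"{label}: {', '.join(languages)}")
```

English hundred lands in the centum group even though it does not begin with a k sound, because its initial h goes back to an earlier k under Grimm’s Law.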


Stress
It is believed that PIE had a pitch accent system. All words had only one accented syllable which received a high pitch. Stress could fall on any syllable of a word.

Grammar
Unevenness of existing records and huge gaps in the chronology among Indo-European languages make the reconstruction of PIE grammar a difficult task. Discoveries of Hittite, Tocharian, and Mycenaean Greek in the 20th century changed the database on which the reconstruction of PIE rests and, in turn, modified existing views of PIE.

Many of the older well-documented languages, such as Sanskrit, Greek, and Latin, have rich morphologies with clearly marked gender and number, as well as elaborately marked case systems for nouns, pronouns, and adjectives. Verbs in these languages also have elaborately marked systems of tense, aspect, mood, and voice, in addition to person, number, and gender. Reconstructed PIE is based on the assumption that it contained all the features found in attested languages. If a given language lacks a particular feature, it is assumed that the feature was lost or that it had merged with other features.

Indo-European languages reflect the rich morphology of PIE to various degrees. For instance, Sanskrit, Greek, Latin, Baltic, Slavic, Celtic, and Armenian have extremely rich morphologies. On the other hand, Germanic, Romance, Albanian, and Tocharian do not possess quite as many finely differentiated morphological features.

Nouns, pronouns and adjectives

Case
Sanskrit had the most cases (8), followed by Old Church Slavonic, Lithuanian, and Old Armenian (7), Latin (6), and Greek, Old Irish, Albanian, and Germanic (5).
Gender
The three genders (masculine, feminine, neuter) have survived in a number of Indo-European languages.
Number
The three numbers (singular, dual, plural) survived in Sanskrit, Greek, and Old Irish. Vestiges of the dual number can be found in many other Indo-European languages.
Adjective-Noun agreement
Adjective-noun agreement has survived in many Indo-European languages.

Verbs
Reconstructed PIE verbs had different sets of endings for tense/aspect, voice, and mood, in addition to person and number:

Tense and aspect
It is thought that the PIE verb system was aspect-based, although traditionally, aspect has been confused with tense. Although tense was not formally marked in PIE, most Indo-European languages define their verbal systems in terms of tense, rather than aspect.
Voice
PIE had two voices: active (e.g., The child broke the glass) and medio-passive which combined reflexive and passive voices (e.g., The child washed himself and The child was washed by his mother). In addition to the active voice, various Indo-European languages use the middle or the passive voices.
Mood
It is hypothesized that PIE had four moods: indicative, optative, subjunctive, and imperative. Most of these moods exist in all Indo-European languages.
Person and number
PIE verbs were marked for person (1st, 2nd, 3rd) and number (singular, dual, plural).

Word order
Less is known about the syntax of PIE than about its morphology. What is known about PIE word order, therefore, is a subject of conjecture and debate. It is thought likely that word order in PIE sentences was Subject-Object-Verb. This word order is found in Latin, Hittite, Vedic Sanskrit, Tocharian, and to some extent in Greek.

Vocabulary
The comparative method enables linguists to reconstruct a basic PIE vocabulary referring to many common elements of the culture of its speakers. This basic vocabulary is not uniformly attested across all Indo-European languages, which suggests that some words may have developed later or were borrowed from other languages. Among words that are reliably reconstructed are words for day, night, the seasons, celestial bodies (sun, moon, stars), precipitation (rain, snow), animals (sheep, horse, pig, bear, dog, wolf, eagle), kinship terms (father, mother, brother, sister, son, daughter), and tools (axe, yoke, arrow).

Writing

Written records for the various Indo-European languages begin at very different dates. The table below shows when the first written records appeared, what writing system was used, and which writing systems the languages use today.

Branch	Earliest written records	Earliest writing system	Current writing system(s)
Armenian	500 AD	Armenian alphabet	Armenian alphabet
Albanian	15th century AD	Greek alphabet	Modified Latin alphabet
Greek	1,400 BC	Linear B	Greek alphabet
Celtic	4th century AD	Ogham alphabet	Modified Latin alphabet
Baltic	16th century AD	Modified Latin alphabet	Modified Latin alphabet
Romance	6th century BC	Latin alphabet, adapted from Etruscan	Modified Latin alphabet
Germanic	3rd century AD	Runic Futhark	Modified Latin alphabet
Slavic	9th century AD	Old Church Slavonic alphabet	Cyrillic and Latin alphabets
Indo-Aryan	3rd century BC	Brāhmī script	Bengali, Devanāgarī, Gujarati, Oriya, Gurmukhi, Sinhala, Kaithi, modified Perso-Arabic
Iranian	9th century AD	Perso-Arabic script	Modified Perso-Arabic, Arabic, modified Cyrillic, modified Latin
Tocharian	500-1,000 AD	Brāhmī script	None (extinct)

Difficulty

How difficult is it to learn Indo-European languages?
Indo-European languages range from Category I to Category II in terms of difficulty for speakers of English.
