Austro-Asiatic Language Family

The Austro-Asiatic language family consists of 169 languages spoken in Southeast Asia, in countries located between China and Indonesia. A few are spoken to the west of this area in the Nicobar Islands and in India. The austro– part of the name comes from the Latin word for ‘south.’

It is not known where the speakers of Austro-Asiatic languages came from or when they migrated to this part of the world. It is generally conjectured that they originated in southern or southeastern China sometime between 2500 and 2000 BC and migrated south into the Indo-Chinese peninsula and west into India. Invasions by speakers of other languages split the Austro-Asiatic languages into several groups. The date of separation of the two main Austro-Asiatic subfamilies, Mon-Khmer and Munda, has never been definitively established, but it must have occurred in prehistoric times.

Because of their separation from each other and because they were surrounded by other languages, Austro-Asiatic languages exhibit great diversity. For instance, the Munda branch has been influenced by synthetic, non-tonal Indo-Aryan languages, while the Mon-Khmer branch has been influenced by the analytic, tonal languages of China. As a result, the two branches have evolved in different directions, which makes the reconstruction of their common ancestor extremely difficult.

Ethnologue divides the Austro-Asiatic language family into two main branches:

  • Mon-Khmer (147 languages)
    The Mon-Khmer languages are indigenous to Indo-China. For more than two millennia, these languages were the lingua francas of Southeast Asia. They are still spoken across China, Vietnam, Cambodia, Laos, Thailand, Burma, Malaysia, and India. The most significant Mon-Khmer languages are Khmer, with 7 million speakers, and Vietnamese, with 68 million speakers.
  • Munda (22 languages)
    The Munda languages are spoken by about 9 million people in the hilly and forested regions of eastern India and Bangladesh. Their origins are not known, though it is generally thought that, with a few exceptions, they are indigenous to eastern India. The most significant Munda languages are Santali, with close to 6 million speakers; Ho, with over 1 million speakers; Mundari, with over 2 million speakers; and Korku, with close to 500,000 speakers.

Many Austro-Asiatic languages are found only in isolated communities and are highly endangered. Only 24 (14%) of the 169 languages have over 50,000 speakers, and only 3 of them over 1 million speakers.

Number of speakers
  • 50,000-100,000
  • 100,000-200,000
  • 200,000-300,000
  • 300,000-1 million
  • Over 1 million: Khmer (7 million), Vietnamese (68 million)
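
The counts quoted above can be cross-checked with a little arithmetic. The following Python sketch uses only figures given in this article (147 Mon-Khmer and 22 Munda languages, 24 languages with over 50,000 speakers); the variable names are illustrative.

```python
# Counts as quoted in this article (attributed to Ethnologue).
mon_khmer = 147
munda = 22
total = mon_khmer + munda      # size of the whole family

over_50k = 24                  # languages with more than 50,000 speakers
share = over_50k / total       # fraction of the family

print(total)                   # 169
print(f"{share:.0%}")          # 14%
```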



Vietnamese and Khmer have official status in Vietnam and Cambodia, respectively. The rest of the languages are spoken by minority groups. Both Vietnamese and Khmer are taught in schools and used in all aspects of personal and public life. Speakers of most other Austro-Asiatic languages are under social, political, and economic pressure to become bilingual in the official languages of the country in which they live. Most groups are too small or too scattered to gain official recognition.


Most Austro-Asiatic languages have numerous dialects. In addition, because adequate descriptions are lacking, there are many varieties whose status as independent languages, as opposed to dialects of one language, has not been established.


Sound system

Most Austro-Asiatic languages do not have tones, Vietnamese being a notable exception, but they have a large variety of vowels. The sound systems of the Mon-Khmer and Munda branches have diverged considerably under the influence of Chinese and Indo-Aryan languages, respectively. Nevertheless, they share some common features:

  • Most words consist of a major syllable optionally preceded by one or more minor syllables.
  • Minor syllables have one consonant, one minor vowel, and an optional final consonant.
  • There is a great variety of two-consonant clusters at the beginning of major syllables, with fewer possible consonants at the end.
  • Most Austro-Asiatic languages have palatal consonants /c/ or /ɲ/ at the end of words.
  • Many languages distinguish between vowels pronounced with different voice qualities such as “breathy,” “creaky,” or “clear.”
  • Several Mon-Khmer languages have implosive stops /ɓ/ and /ɗ/ at the beginning of major syllables. Implosive consonants are pronounced with the air sucked inward rather than breathed out.
  • Several Mon-Khmer languages have aspirated stops /pʰ, tʰ, cʰ, kʰ/.
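
The word shape described in the first three points above can be sketched as a pattern over consonant (C) and vowel (V) slots. The Python below is a minimal illustration only: the vowel inventory, the function names, and the test forms are all made up for the example, each character is treated as one sound, and an initial consonant is assumed to be obligatory. It does not distinguish minor vowels from full vowels, and it ignores voice quality, long vowels, and diphthongs.

```python
import re

# Toy vowel inventory (an assumption for illustration, not a real phoneme set).
VOWELS = set("aeiouə")

def skeleton(word: str) -> str:
    """Map each symbol to C (consonant) or V (vowel)."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)

# Zero or more minor syllables (C V, optional final C), followed by
# one major syllable (one or two initial consonants, V, optional final C).
TEMPLATE = re.compile(r"(?:CVC?)*C{1,2}VC?")

def fits_template(word: str) -> bool:
    """Return True if the word's C/V skeleton matches the template."""
    return TEMPLATE.fullmatch(skeleton(word)) is not None

# Made-up forms, not words of any particular language.
print(fits_template("trak"))    # True: major syllable only
print(fits_template("kətrak"))  # True: minor syllable kə + major syllable trak
print(fits_template("trakst"))  # False: more than one final consonant
```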


There are significant differences between the grammatical features of Mon-Khmer and Munda languages.


Mon-Khmer languages share the following grammatical features:

  • Most have no suffixes, but infixes and prefixes are quite common.
  • The same infix or prefix can express different functions, depending on the noun or verb class to which it is attached.
  • There is a large number of onomatopoeic and other expressive words that refer to sounds, colors, sensations, emotions, etc.
  • Ergative constructions are quite common.
  • The normal word order is Subject – Verb – Object.



Munda languages were heavily influenced by the surrounding Indo-Aryan languages. They differ from Mon-Khmer languages in the complexity of their morphology:

  • Nouns are inflected for gender (animate and inanimate) and number (singular, dual, and plural).
  • Personal pronouns distinguish between exclusive and inclusive 1st person plural.
  • Verbs agree with their subjects and are inflected for a number of categories, including person, number, tense, negation, and mood.
  • The basic word order is Subject – Object – Verb, typical of Indo-Aryan and Dravidian languages of India.


The lexical stock of Austro-Asiatic languages reflects their history of contact with other languages and civilizations. For instance, Vietnamese has borrowed extensively from Chinese, while Khmer has many loanwords from Sanskrit and Pali. In addition, Austro-Asiatic languages have borrowed from their nearest majority languages.

Below are a few words and phrases in Khmer and Vietnamese.

Hello
  • Khmer: cum riep sue (greeting to a man); baat cum riep sue (response by a man)
  • Khmer: cum riep sue look qum (greeting to an older person); baat cum riep sue (response by an older person)
  • Khmer: cum riep sue (greeting to a woman); caa cum riep sue (response by a woman)
  • Vietnamese: xin chào
Good bye
  • Khmer: joom-reap leah
  • Vietnamese: tiếng chào nhau
Thank you
  • Khmer: qaa-kun
  • Vietnamese: lời cám ơn người nào
Yes
  • Khmer: baat
  • Vietnamese: dạ, vâng
No
  • Khmer: deh
  • Vietnamese: không


Below are numerals 1-10 in Khmer and Vietnamese.

Khmer
  • 6: pram muəj
  • 7: pram pii:
  • 8: pram bəi
  • 9: pram buən



Khmer and Mon

The Old Mon and Old Khmer scripts were the earliest writing systems of Southeast Asia. They are attested in a number of official inscriptions on monuments in Myanmar (Burma), Cambodia, and Thailand that date back to the 7th century AD. The two scripts were based on the Pallava script used by the Tamil people of South India, which, in turn, descended from the Brahmi script of India. Both scripts were modified to suit the phonology of their languages. Eventually, other peoples used the two scripts as a basis for their own writing systems: the Thai script is derived from Khmer letters, and the Burmese script from Mon letters.

  • Khmer script
    The Khmer script consists of thirty-three consonants, twenty-four dependent vowels, twelve independent vowels, and several diacritics. It is a syllabic alphabet in which each consonant belongs to one of two series, one with an inherent vowel /a/ (first series) and one with an inherent vowel /o/ (second series). Vowels are indicated by using either separate letters or diacritics written above, below, in front of, after, or around the consonants. The pronunciation of a vowel depends on whether the consonant it is attached to belongs to the first or the second series. All consonants have a subscript form that is used to write the second consonant of a cluster. There are no spaces between words. Spaces are used to indicate the end of a clause or sentence.


  • Mon script
    The Mon script is used to write Mon, the Karen languages, and Burmese, all spoken in Myanmar (Burma), although Burmese and the Karen languages belong to the Tibeto-Burman branch of the Sino-Tibetan language family. The Mon script was derived from the Brahmi script of India. The basic unit of the script is a consonant-based syllable with an inherent /a/ vowel, which is suppressed by a circular stroke above the character. It is written horizontally from left to right, and its basic set of symbols consists of 33 consonants and 14 vowels. Symbols for vowels may be written before, above, below, or to the right of the letter representing an initial consonant. The combinations of consonants and diacritic vowels are often represented by special ligatures. Spaces are used to separate phrases, not words. A single vertical bar marks a small break; a double vertical bar marks the end of a sentence. The alphabet consists almost entirely of circles or portions of circles used in various combinations because it evolved at a time when letters were etched on palm leaves with a stylus, and straight lines would have torn the leaves. Because of its rounded appearance, it resembles the Indic scripts.


  • Vietnamese alphabet
    Because Vietnam was a Chinese province for a thousand years, all official writing was originally done in Chinese. In the 8th century AD, a modified Chinese orthography was devised. In the mid-17th century, Portuguese missionaries devised a Latin-based alphabet for writing Vietnamese that included additional letters and diacritics to mark tones. Originally, this script was used for religious purposes, but it eventually spread to other contexts, and in 1910, it became the official script of the French colonial administration. It is used by all Vietnamese today.


  • Other Austro-Asiatic languages
    Most other Austro-Asiatic languages remained unwritten until the 20th century. With a few exceptions, literacy rates in these languages are very low, most lack grammars and dictionaries, and many have yet to be described.



Language Difficulty
How difficult is it to learn Austro-Asiatic languages?
Data is available only for Khmer and Vietnamese, both of which are considered to be somewhat more difficult than other Category II languages for speakers of English.