Learning Arabic, Persian, and Urdu

I’m writing this to share my impressions of learning Arabic, Persian, and Urdu; I’ve studied all three languages in some capacity. How the learning process compares across the three is a question that often pops up on language learning websites and forums. Interestingly, the three languages are not closely related to each other, and the reason they are grouped together has more to do with culture than linguistics. Although I am not from an Islamic background, I find the study of this civilization interesting and like to see the connections and borrowings between its various component cultures and languages.

(To clarify: I am the child of immigrants to the United States from India and English is my native language, while my parents’ native language is Telugu, a Dravidian language from South India. But other than English, the single language I was most exposed to growing up was Hindi, due to Hindi (Bollywood) movies and music, which were almost ubiquitous at my family house. I was familiar with spoken and written Hindi before starting Urdu, and then moving on to Persian and Arabic.)

Persian and Urdu are both Indo-European languages of the Indo-Iranian branch of that family, but belong to different subbranches that diverged over 3,000 years ago: Persian belongs to the Iranian branch, and Urdu belongs to the Indo-Aryan (Indic) branch. Arabic belongs to another language family altogether, the Semitic languages, and is not related to Urdu and Persian. Nonetheless, there is a long history of interaction between the three languages. They are the three most prominent languages written in the Arabic script (Turkish used to be, but switched in 1928 to the Latin script), and are the most important languages of Islamic civilization and literature, both secular and religious (poetry was traditionally the most important form of literature in all three languages). One reason I’m interested in these languages, regardless of their importance in understanding the Middle East and South Asia, is the script. Calligraphy in Arabic, Persian, and Urdu is beautiful and strongly appeals to my aesthetic sense. There are so many styles of calligraphy that it is hard to believe these all come from the same script.


(Indo-European languages, with the Indo-Iranian branch in blue)


(Distribution of Arabic)

Generally, people not familiar with these languages learn Arabic first, then Persian, and then Urdu, if they end up learning all three. This is because Arabic is the most accessible language for learners, in terms of resources and spread; it is official in over 20 countries and an official language of the United Nations. On the other hand, Persian is official in three countries (Iran where it is called Farsi, Afghanistan where it is called Dari, and Tajikistan where it is called Tajik) and Urdu in two countries (Pakistan and India, but this means a lot of people understand it and with its sister-language Hindi, it is a lingua franca for most of South Asia). FYI: In terms of speakers, the order is Arabic, Urdu, Persian, but if you count Hindi and Urdu as one language, then it is Hindi-Urdu (545 million), Arabic (422 million), Persian (110 million).

There’s a certain linguistic logic to this as well. Linguistic borrowing has largely flowed in this direction. Arabic words entered Persian en masse after the Islamic conquest of Iran in 636-651. Modern Persian began to be written in the Arabic script, especially during the Samanid Dynasty (819-999), and for the most part, the spelling of Arabic words was kept in Persian, though the pronunciation changed. Persian had a lot less of an impact on Arabic. In turn, after Islamic conquests in India, a lot of Persian (the lingua franca of the eastern Islamic world at that time) mixed with the dialect of Delhi (the Khadi Boli dialect of Hindi) and eventually gave rise to Urdu, which had emerged by the time of the Mughal Empire (1526-1858). Urdu and Hindi are the same language, descended from Sanskrit, with the same grammar and same basic vocabulary, but Urdu is written in the Persian variant of the Arabic script and Hindi in the Devanagari script (as a result, they are associated with Muslims and Hindus respectively); at their higher level, they use some different vocabulary. For example, the word for politics in Hindi is राजनीति rajneeti, derived from Sanskrit, while the Urdu word is سیاست siyasat, derived from Arabic.

Because of the direction of borrowing, words that originate in Arabic make the most sense in Arabic, both etymologically and in many aspects of their grammar and spelling. Arabic has 28 letters, all of which represent different sounds, whereas many of the more guttural sounds are conflated with each other when borrowed into Persian and Urdu. Four different Arabic sounds (ظ, ض, ذ, ز) are all pronounced like ز (z) in Urdu and Persian, but words originating in Arabic keep the original spelling.

As an example, let us take the Arabic word for book, kitaab, which derives from the Arabic root k-t-b, to write. A book, in grammatically proper Arabic, is كِتَابٌ, kitaabun, the un representing a grammatical case. The word is masculine in Arabic. In Persian and Urdu, however, the word cannot be traced to a root within the language, though it is a common word meaning book. In both Persian and Urdu, the word for book is kitaab, کتاب, where the grammatical ending un is unnecessary. The word pluralizes differently in all three languages, according to their rules. Arabic: kutub كُتُب, Persian: kitaab’ha کتاب‌ها, and Urdu: kitaben کتابیں किताबें. Additionally, the word is masculine in Arabic, genderless in Persian (Persian doesn’t have gender), and feminine in Urdu (because books are generally considered feminine in Indo-Aryan languages; for example, the Sanskrit-derived Hindi word for book, pustak पुस्तक, is feminine). Fun stuff.

However, one doesn’t need to learn the languages in the Arabic>Persian>Urdu order. In fact, as with any language, one can learn each language on its own, and take it on its own terms. For the same reason, it’s not necessary to learn French to learn English, although there are many French borrowings in English. Despite the borrowings, the majority of basic vocabulary and grammar in each language is unique to it. They share a script and literary references, but even those are not identical. Persian and Urdu tend to use different (more flowing) calligraphic styles than Arabic and have some letters not used in Arabic, like گ (g). Persian and Urdu tend to cluster together more than they do with Arabic, but they also have their differences. For example, the common Urdu word for marriage, شادی shadi, is a Persian borrowing, but it technically means happiness in the original Persian, which instead describes this idea using the word arosi عروسی. Therefore, learning one language does not automatically help with learning another, though there are connections.

Contrary to the order described before, I went through the whole process backward. I started with Urdu, then studied Persian, and finally Arabic. I don’t think it matters which order one does it in. To a large extent, having fun learning a language and picking it up are related to the connection one has with it. Urdu was never really a totally alien language for me because it is, as I mentioned above, essentially a variation of Hindi, and one that is perhaps more beautiful. I did take Urdu classes to learn the script and some of the finer points. Most of those finer points brought me straight to Persian. Almost all the Arabic in Urdu comes to that language through Persian and not directly; and there is a lot more direct Persian influence in Urdu than Arabic. Most “Urdu” words that distinguish it from Hindi are in fact Persian words used in Urdu. Finally, I recently started taking classes in Arabic out of personal interest and a desire to understand the roots of common words used in Urdu and Persian.

Here are some of my thoughts on learning all three languages.

In terms of ease, for an absolute beginner with no prior exposure to any of the three languages, Persian is by far the easiest, followed by Urdu, and then Arabic (I found Urdu easiest since I was partially exposed to it growing up as it is a variation of Hindi, then Persian, then Arabic). Both Persian and Urdu have grammars, verb systems, and some vocabulary that are familiar to speakers of European languages; after all, they belong to the same family as English, Spanish, and Russian.

Persian has an extremely regular and predictable grammar, no gender, and essentially no grammatical cases. It’s found a way to adapt the Arabic script in a very predictable manner to its sound system, including vowels. On the other hand, Urdu has a lot of vowels and must sometimes use the same Arabic letter, for example یِ, to indicate different sounds, which are differentiated in its sister language, Hindi: इ and ए. Persian has a lot of literature, but its main disadvantage is that it is spoken in countries hard to visit if you live in the West and there are not many easily accessible learning resources. In terms of aesthetics, it is often characterized as very sweet and soft. This is true. But like French, it can also come off as a bit constructed and artificial, partially designed to be spoken in a very elaborate way. After all, a synonym for Persian is Dari, the language of the [princely] court.

A bit on the literary history of Persian and the Islamic world. Modern Persian developed out of Middle Persian around the 9th century, and was the first language to be written after Arabic in the Islamic world. During the initial rise of that civilization, especially during the Umayyad (661-750) and early Abbasid caliphates (750-945), almost all writing was done in Arabic, even if the writers were Persians or Turks. This began to change by the 12th century. In the eastern Islamic world (Persia, Central Asia, India), the vast majority of writing was done in Persian, with Arabic relegated to a religious and legal role. Even in the Middle East proper, once the Ottoman Empire rose, much administrative and literary work was carried out in Turkish. It is interesting, then, that in the modern world, Persian has suffered such a decline of fortunes relative to its previous status as a lingua franca. There are many historical reasons for this. Persian culture is not the magnet it used to be (though Iranian films are among the best in the world): partially because it is now increasingly associated with an insular Shia-oriented nation-state, partially because it has since been absorbed by Urdu culture (and its role in administration taken by English) in South Asia and Russian in Central Asia. On the other hand, Arabic, despite its ups and downs, was spread out over a wider area, and never ceased to be used as a language of religion throughout the Islamic world, and thus always remained a major language, even when its literary output fell during the later Middle Ages.

Example of Persian, “Poem of the Butterflies,” by Farid ud-Din Attar (1145-1221), with English subtitles and the Persian text:

An example of spoken Modern Persian here:

Persian poetry is widely known and praised for its beauty, but Urdu poetry, also widely praised, especially in the subcontinent, feels a bit more natural in its themes and choice of words. In its most fancy avatar, Urdu can also feel very artificial, but as the language of the streets, in its colloquial Urdu/Hindi form, it is part of the very comforting lingua franca of the subcontinent. Urdu is slightly more complicated than Persian for several reasons: the Urdu-Arabic script is harder to adapt to its sounds; it is such a mixed language (Urdu comes from a Turkic root meaning camp, because it was spoken in military camps where people mixed near Delhi) that its roots are all over the place (Indic and Persian); it has two genders and two (and a partial third) cases; and it is somewhat more divergent from European languages than Persian. But it makes up for this by (at least in common speech) having enormous English influence and by being hugely accessible, at least in the spoken form. Urdu has a closer and deeper relation with the Persian words borrowed into it than Persian has with the Arabic words borrowed into that language, because Urdu and Persian share a closer literary culture. Nonetheless, words are pronounced differently in the two languages, even when they mean the same and are spelled the same. For example, آزاد, which comes from Persian and means “free,” is pronounced like “aazaad” in Urdu and “auzaud” in Persian (where the long a sound has become rounded).

Like Persian, Urdu’s grammar is pretty regular. Urdu/Hindi is spoken as a first or second language by hundreds of millions of people, and Bollywood music and videos are widely and easily accessible through the internet. It is also easy to get to India relative to Iran for a Westerner. For these reasons, I find Urdu the most relatable of these three languages. It also flows the most naturally from my tongue, due to exposure. As such, due to the familiarity, I enjoy it the most of the three languages. I can personally identify (because I’m of South Asian origin) with and enjoy the combination and alliteration of Sanskrit, Persian, and Arabic roots, and Hindu and Muslim themes, all wrapped up together.

Examples of Urdu/Hindi. A passage from the famous poet Mirza Ghalib (1797-1869), a qawwali (musical way of expressing poetry) from the 13th century recorded in modern Pakistan, and a song from Bollywood (India) said to be representative of “pure” Urdu:

Listen: Mirza Ghalib (مِرزاغالِؔب)

Urdu has a reputation of being very beautiful in South Asia. In the song, to emphasize this, the guy says: جس کی زباں اردو کی طرح (jiski zubaan urdu ki tarah), “her words are like Urdu.”

Learning Arabic is a very different experience from Urdu and Persian. It is the hardest of the three languages to learn (unless one knows Hebrew beforehand). Many of its dialects are essentially separate languages, while the modern form is mostly a written language.

While the sounds and grammar of the Urdu and Persian languages follow some patterns familiar to people from the West, learning Arabic is in many ways learning a new skill, not just a new language. For example, its grammar follows a Verb-Subject-Object order. It indicates moods and different ideas in verbs through complex patterns that play with the roots of verbs (there are over ten forms of verbs that give one different shades of meaning) unlike Persian and Urdu and English and French, all of which just conjugate endings of verbs. As a result, Arabic is a stunningly elegant (even if some do not find its sounds beautiful) language: as someone said on Quora, “the level of precision and complexity is astonishing, the subtle change in meaning that comes with simply adjusting one single [thing] is really remarkable.”

Arabic also contains some sounds that are found in relatively few other languages, particularly emphatic ones. These might give it a sort of guttural feeling that may be new for speakers of Western languages. Its vocabulary is pretty alien to a Westerner, and there are few cognates for comfort. For example, here are the numbers 1-5 in the three languages; see how different they are:

Persian: yek, do, se, chahar, panj

Urdu: ek, do, tin, char, panch 

Arabic: waHid, ithnayn, thalathah, arba’ah, khamsah

Obviously, Arabic, especially Classical Arabic, is a very important language because of its place in world history as the language of Islam, which makes it an important language for people to learn. Interestingly, despite this complexity and reputation of sacredness, Arabic feels a lot more “earthy” than Persian, because in its classical form, it was not the language of the courts, but the language of the Bedouin (nomads) in the Arabian desert. There is something very mysterious about it, as though it came from deep in the sands. Although Arabic only emerged in the 6th century, it feels ancient and timeless.

Example of Classical Arabic, poem by the pre-Islamic poet Imru’ al-Qais:

As most readers know, the Qur’an is also in Classical Arabic. Here is the opening:

The same thing in Persian:

The same thing in Urdu:

Very cool, right? To conclude, learning all three languages, or any of the three, is worth it. Even if one doesn’t learn them, it is really cool to think about the languages, their characteristics, and their connections.

