In the name of Allah, the Most Gracious, the Most Merciful.

All praise is due to Allah, and may blessings and peace be upon our Prophet Muhammad. To proceed:

1 Problem

Urdu and Persian are written in the Arabic script. Both languages have vowels that do not exist in Arabic, and the Arabic script offers no unambiguous way to write these additional vowels.

2 Vowels in Urdu and Classical Persian

Classical Persian and Urdu have all the vowels and diphthongs (albeit with different qualities) that Arabic does:

Vowel | Representation | Example
a | ـَ | شَب šab
i | ـِ | مِل mil
u | ـُ | بُت but
ā | ـَا | مَال māl
ī | ـِی | مِیل mīl
ū | ـُو | رُوح rūḥ
aw | ـَوْ | غَوْر ġawr
ay | ـَیْ | مَیْل mayl

However, Classical Persian and Urdu have two additional vowels that are not found in Arabic:

  1. ō: This is called واوِ مجهول wāw-i majhūl “unknown waw”, so called because the sound is unknown in Arabic. There is currently no standard way to represent it unambiguously in the Arabic script. It is written without any diacritical mark, for example شور šōr. This does not disambiguate: vowel marks are usually omitted from all words, so the same spelling could just as easily be read šawr or šūr.

  2. ē: This is called ياءِ مجهول yāʾ-i majhūl “unknown yeh”, so called because the sound is unknown in Arabic. Urdu has developed a way to write this vowel fairly unambiguously, but only at the end of a word, where it uses “big yeh” ے (U+06D2), for example لے lē. For the vowel ī at the end of a word, Urdu instead uses “small yeh” ی (U+06CC), for example لِی lī. The diphthong ay at the end of a word is also written with “big yeh”, e.g. مَےْ may. In the middle of a word there is no disambiguation, e.g. شیر šēr.
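The mid-word ambiguity described above can be illustrated with a minimal Python sketch. It assumes that "omitting the vowels" means dropping fatha, damma, kasra and sukun; the helper name `strip_harakat` is hypothetical and introduced only for this illustration.

```python
import re

# Fatha, damma, kasra and sukun: the marks that running text usually omits.
# (Assumption for this sketch; other marks are left untouched.)
HARAKAT = re.compile("[\u064E\u064F\u0650\u0652]")


def strip_harakat(word: str) -> str:
    """Return the word as it would appear with vowel marks omitted."""
    return HARAKAT.sub("", word)


shir_marked = "\u0634\u0650\u06CC\u0631"  # šīr, written with kasra
sher_plain = "\u0634\u06CC\u0631"         # šēr, which has no mark of its own

# Once the kasra is dropped, the two spellings are identical.
print(strip_harakat(shir_marked) == sher_plain)  # True

# At the end of a word, Urdu's two yeh letters keep ī and ē apart.
li_final = "\u0644\u0650\u06CC"  # lī, small yeh U+06CC
le_final = "\u0644\u06D2"        # lē, big yeh U+06D2
print(strip_harakat(li_final) == le_final)  # False
```

The comparison only shows that the conventional spelling carries no information that separates ē from ī in the middle of a word; it says nothing about how a reader resolves the word from context.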

3 Proposal

We propose using U+08F3 “Arabic Small High Waw” and U+06E7 “Arabic Small High Yeh” as diacritics to represent the above two vowels unambiguously. These characters were originally added to Unicode for Qurʾanic orthography, but they can be repurposed for Urdu and Classical Persian typesetting.

Vowel | Diacritic | Representation | Example
ō | U+08F3 Arabic Small High Waw | ـوࣳ | شوࣳر šōr
ē | U+06E7 Arabic Small High Yeh | ـیۧ , ـےۧ | شیۧر šēr , لےۧ

It is possible that U+08F3 Arabic Small High Waw ◌ࣳ could be confused with U+064F Arabic Damma ◌ُ, but this should not be a major issue, because a Damma on wāw is rare except after ʾalif, e.g. تَشَاوُر tašāwur.
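As a rough illustration of the proposed encoding, the following Python sketch looks up the two diacritics (and Damma, for comparison) in the Unicode database shipped with Python and builds the two example words by placing each mark directly after its carrier letter. The constant and variable names are assumptions made for this sketch.

```python
import unicodedata

SMALL_HIGH_WAW = "\u08F3"  # proposed mark for ō
SMALL_HIGH_YEH = "\u06E7"  # proposed mark for ē
DAMMA = "\u064F"           # the mark that Small High Waw might be confused with

# Print code point, character name and general category for each mark.
for mark in (SMALL_HIGH_WAW, SMALL_HIGH_YEH, DAMMA):
    print(f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.category(mark))

# The mark is encoded immediately after the letter it sits on.
shor = "\u0634\u0648" + SMALL_HIGH_WAW + "\u0631"  # šōr: sheen, waw + mark, reh
sher = "\u0634\u06CC" + SMALL_HIGH_YEH + "\u0631"  # šēr: sheen, yeh + mark, reh
print(shor, sher)
```

Although the two waw-shaped marks may look alike in some fonts, they are distinct code points, so the distinction is preserved in the stored text even where the rendering is hard to tell apart.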

Here are some examples with the proposed scheme:

  • شُورَیٰ šūrā
  • شوࣳر šōr
  • غَوْر ġawr
  • شِیر šīr
  • شیۧر šēr
  • غَیْر ġayr
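The six examples above can be told apart mechanically. The Python sketch below is a simplified illustration, not a full transliterator: it inspects only the first non-initial waw or yeh in a word, together with the marks immediately before and after it. The function name `first_long_vowel` and the rule order are assumptions made for this sketch.

```python
WAW = "\u0648"
YEHS = {"\u06CC", "\u06D2"}  # small yeh and big yeh
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
SMALL_HIGH_WAW, SMALL_HIGH_YEH = "\u08F3", "\u06E7"


def first_long_vowel(word: str):
    """Return the vowel carried by the first non-initial waw or yeh, or None."""
    for i in range(1, len(word)):
        ch = word[i]
        if ch != WAW and ch not in YEHS:
            continue
        prev = word[i - 1]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == WAW:
            if nxt == SMALL_HIGH_WAW:
                return "ō"
            if prev == FATHA and nxt == SUKUN:
                return "aw"
            if prev == DAMMA:
                return "ū"
        else:
            if nxt == SMALL_HIGH_YEH:
                return "ē"
            if prev == FATHA and nxt == SUKUN:
                return "ay"
            if prev == KASRA:
                return "ī"
        return None  # the marks present do not determine the vowel
    return None


examples = [
    "\u0634\u064F\u0648\u0631\u064E\u06CC\u0670",  # šūrā
    "\u0634\u0648\u08F3\u0631",                    # šōr
    "\u063A\u064E\u0648\u0652\u0631",              # ġawr
    "\u0634\u0650\u06CC\u0631",                    # šīr
    "\u0634\u06CC\u06E7\u0631",                    # šēr
    "\u063A\u064E\u06CC\u0652\u0631",              # ġayr
]
for word in examples:
    print(word, first_long_vowel(word))
```

For the unmarked spelling شیر the function returns None, which reflects the ambiguity the proposal is meant to remove.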

For words ending in ē, Urdu already uses U+06D2 “Arabic Letter Yeh Barree”. Classical Persian typesetters, however, may prefer to use U+06CC “Arabic Letter Farsi Yeh” for consistency. Even so, they should be able to place the proposed diacritic U+06E7 “Arabic Small High Yeh” on top of the final yeh.

Urdu typesetting | Classical Persian typesetting
مِی | مِی
وَلےۧ walē | وَلیۧ walē
مَےْ may | مَیْ may
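The correspondence between the two columns can be sketched as a small conversion step. The Python snippet below is one possible approach under stated assumptions: it swaps a word-final big yeh (U+06D2) for Farsi yeh (U+06CC) and leaves any marks on it in place. The function name `to_persian_final_yeh` is hypothetical, and the input is assumed to be already marked according to the proposed scheme.

```python
import re

# A big yeh followed only by vowel marks (U+064B..U+0652) or Small High Yeh
# up to the end of the word, i.e. a big yeh in word-final position.
FINAL_BIG_YEH = re.compile("\u06D2(?=[\u064B-\u0652\u06E7]*$)")


def to_persian_final_yeh(word: str) -> str:
    """Replace a word-final big yeh with Farsi yeh, keeping its marks."""
    return FINAL_BIG_YEH.sub("\u06CC", word)


wale_urdu = "\u0648\u064E\u0644\u06D2\u06E7"  # walē, Urdu style
may_urdu = "\u0645\u064E\u06D2\u0652"         # may, Urdu style
mi = "\u0645\u0650\u06CC"                     # same spelling in both styles

for word in (mi, wale_urdu, may_urdu):
    print(word, "->", to_persian_final_yeh(word))
```

Because the vowel is carried by the diacritic rather than by the letter shape, the swap loses no information as long as the input is marked. An unmarked Urdu final ے would need Small High Yeh or a sukun added first.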

4 Conclusion

We have proposed the use of two diacritics to represent واوِ مجهول wāw-i majhūl and ياءِ مجهول yāʾ-i majhūl. These may be useful for typesetting works where disambiguation is called for, like dictionaries and critical editions, without having to resort to transcription in the Latin script.