Arabic script
from Wikipedia

Arabic script
Script type
Period: 3rd century CE to the present[1]
Direction: Right-to-left
Official script Co-official script in:
Official script at regional level in:
4 sovereign states
Languages: See below
Related scripts
Parent systems
Child systems
N'Ko
Hanifi script
Persian alphabet
ISO 15924: Arab (160), Arabic
Unicode alias: Arabic
 This article contains phonetic transcriptions in the International Phonetic Alphabet (IPA). For an introductory guide on IPA symbols, see Help:IPA. For the distinction between [ ], / / and ⟨ ⟩, see IPA § Brackets and transcription delimiters.
Worldwide use of the Arabic script
Arabic alphabet world distribution
Countries where the Arabic script is:
 →  the sole official script
 →  official alongside other scripts
 →  official at a provincial level (China, India, Tanzania) or a recognized second script of the official language (Malaysia, Tajikistan)

The Arabic script is the writing system used for Arabic (Arabic alphabet) and several other languages of Asia and Africa. It is the second-most widely used alphabetic writing system in the world (after the Latin script),[2] the second-most widely used writing system in the world by number of countries using it, and the third-most by number of users (after the Latin and Chinese scripts).[3]

The script was first used to write texts in Arabic, most notably the Quran, the holy book of Islam. With the religion's spread, it came to be used as the primary script for many language families, leading to the addition of new letters and other symbols. Languages still written in it include Arabic, Persian (Farsi and Dari), Urdu, Uyghur, Kurdish, Pashto, Punjabi (Shahmukhi), Sindhi, Azerbaijani (Torki in Iran), Malay (Jawi), Javanese, Sundanese, Madurese and Indonesian (Pegon), Balti, Balochi, Luri, Kashmiri, Cham (Akhar Srak),[4] Rohingya, Somali, Mandinka, and Mooré, among others.[5] Until the 16th century, it was also used for some Spanish texts, and it was the writing system of Turkish until the script reform of 1928.[6]

The script is written from right to left in a cursive style, in which most of the letters take slightly different forms according to whether they stand alone or are joined to a following or preceding letter. The script is unicase and does not have distinct capital or lowercase letters.[7] In most cases, the letters transcribe consonants, or consonants and a few vowels, so most Arabic alphabets are abjads; the versions used for some languages, such as the Sorani dialect of Kurdish, Uyghur, Mandarin, and Serbo-Croatian, are true alphabets. The script is the basis for the tradition of Arabic calligraphy.
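The positional letter forms, right-to-left direction, and lack of case described above can be observed through Unicode character properties. The following Python sketch is only an illustration: it uses the letter bāʼ and the compatibility code points of the Arabic Presentation Forms-B block, which is an assumption about how to make the shapes visible, since ordinary text stores only the base letter and leaves shaping to the font. The names `POSITIONAL_FORMS` and `describe` are hypothetical.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the base letter stored in ordinary text

# Compatibility code points, one per positional shape of BEH.
POSITIONAL_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def describe(form):
    """Return the Unicode name of a shape and whether NFKC folds it to BEH."""
    folds_to_base = unicodedata.normalize("NFKC", form) == BEH
    return unicodedata.name(form), folds_to_base


for position, form in POSITIONAL_FORMS.items():
    name, folds_to_base = describe(form)
    print(position, name, folds_to_base)

# Bidirectional class "AL" marks a right-to-left Arabic letter.
print(unicodedata.bidirectional(BEH))

# Unicase: case mapping leaves the letter unchanged.
print(BEH.upper() == BEH and BEH.lower() == BEH)
```

Because all four shapes normalize to the same base letter, searching and sorting can operate on the base code point alone.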

History


The Arabic alphabet is derived either from the Nabataean alphabet[8][9] or (less widely believed) directly from the Syriac alphabet,[10] which are both derived from the Aramaic alphabet, which, in turn, descended from the Phoenician alphabet. The Phoenician script also gave rise to the Greek alphabet (and, therefore, both the Cyrillic alphabet and the Latin alphabet used in North and South America and most European countries).

Origins


In the 6th and 5th centuries BCE, northern Arab tribes emigrated and founded a kingdom centred around Petra, Jordan. This people (now named Nabataeans from the name of one of the tribes, Nabatu) spoke Nabataean Arabic, a dialect of the Arabic language. In the 2nd or 1st centuries BCE,[11][12] the first known records of the Nabataean alphabet were written in the Aramaic language (which was the language of communication and trade), but included some Arabic language features: the Nabataeans did not write the language which they spoke. They wrote in a form of the Aramaic alphabet, which continued to evolve; it separated into two forms: one intended for inscriptions (known as "monumental Nabataean") and the other, more cursive and hurriedly written and with joined letters, for writing on papyrus.[13] This cursive form influenced the monumental form more and more and gradually changed into the Arabic alphabet.

Overview

the Arabic alphabet
خ ح ج ث ت ب ا
khā’ ḥā’ jīm tha’ tā’ bā’ alif
ص ش س ز ر ذ د
ṣād shīn sīn zāy / zayn rā’ dhāl dāl
ق ف غ ع ظ ط ض
qāf fā’ ghayn ‘ayn ẓā’ ṭā’ ḍād
ي و ه ن م ل ك
yā’ wāw hā’ nūn mīm lām kāf
أ آ إ ئ ؠ ء
alif hamza↑ alif madda alif hamza↓ yā’ hamza↑ kashmiri yā’ hamza rohingya yā’
ى ٱ ی ە ً ٌ ٍ
alif maksura alif wasla farsi yā’ ae fathatan dammatan kasratan
َ ُ ِ ّ ْ ٓ ۤ
fatha damma kasra shadda sukun maddah madda
ں ٹ ٺ ٻ پ ٿ ڃ
nūn ghunna ttā’ ttāhā’ bāā’ pā’ tāhā’ nyā’
ڄ چ ڇ ڈ ڌ ڍ ڎ
dyā’ tchā’ tchahā’ ddāl dāhāl ddāhāl duul
ڑ ژ ڤ ڦ ک ڭ گ
rrā’ jā’ vā’ pāḥā’ kāḥā’ ng gāf
ڳ ڻ ھ ہ ة ۃ ۅ
gueh rnūn hā’ doachashmee hā’ goal tā’ marbuta tā’ marbuta goal kirghiz oe
ۆ ۇ ۈ ۉ ۋ ې ے
oe u yu kirghiz yu ve e yā’ barree
(see below for other alphabets)

The Arabic script has been adapted for use in a wide variety of languages aside from Arabic, including Persian, Malay and Urdu, which are not Semitic. Such adaptations may feature altered or new characters to represent phonemes that do not appear in Arabic phonology. For example, the Arabic language lacks a voiceless bilabial plosive (the [p] sound), so many languages add their own letter to represent [p] in the script, though the specific letter used varies from language to language. These modifications tend to fall into groups: Indian and Turkic languages written in the Arabic script tend to use the Persian modified letters, whereas the languages of Indonesia tend to imitate those of Jawi. Scholars call the modified version of the Arabic script originally devised for Persian the Perso-Arabic script.

When the Arabic script is used to write Serbo-Croatian, Sorani, Kashmiri, Mandarin Chinese, or Uyghur, vowels are mandatory. The Arabic script can, therefore, be used as a true alphabet as well as an abjad, although it is often strongly, if erroneously, connected to the latter due to it being originally used only for Arabic.

Use of the Arabic script in West African languages, especially in the Sahel, developed with the spread of Islam. To a certain degree the style and usage tend to follow those of the Maghreb (for instance the position of the dots in the letters fāʼ and qāf).[14][15] Additional diacritics have come into use to facilitate the writing of sounds not represented in the Arabic language. The term ʻAjamī, which comes from the Arabic root for "foreign", has been applied to Arabic-based orthographies of African languages.

Wikipedia in Arabic script of five languages

Table of writing styles

Script or style | Alphabet(s) | Language(s) | Region | Derived from | Comment
Naskh | Arabic, Pashto, & others | Arabic, Pashto, Sindhi, & others | Every region where Arabic scripts are used | | Sometimes refers to a very specific calligraphic style, but sometimes used to refer more broadly to almost every font that is not Kufic or Nastaliq.
Nastaliq | Urdu, Shahmukhi, Persian, & others | Urdu, Punjabi, Persian, Kashmiri, & others | Southern and Western Asia | Taliq | Used for almost all modern Urdu and Punjabi text, but only occasionally used for Persian. (The term "Nastaliq" is sometimes used by Urdu-speakers to refer to all Perso-Arabic scripts.)
Taliq | Persian | Persian | | | A predecessor of Nastaliq.
Kufic | Arabic | Arabic | Middle East and parts of North Africa | |
Rasm | Restricted Arabic alphabet | | Mainly historical | | Omits all diacritics including i'jam. Digital replication usually requires some special characters. See: ٮ ڡ ٯ‎ (links to Wiktionary).

Table of alphabets

Alphabet Letters Additional Characters Script or Style Languages Region Derived from: (or related to) Note
Arabic 28 ^(see above) Naskh, Kufi, Rasm, & others Arabic North Africa, West Asia Phoenician, Aramaic, Nabataean
Ajami script 33 ٻ تٜ تٰٜ Naskh Hausa, Yoruba, Swahili West Africa, East Africa Arabic Abjad | documented use likely between the 15th to 18th century for Hausa, Mande, Pulaar, Swahili, Wolof, and Yoruba Languages
Aljamiado 28 Maghrebi, Andalusi variant; Kufic Old Spanish, Andalusi Romance, Ladino, Aragonese, Valencian, Old Galician-Portuguese Southwest Europe Arabic 8th–13th centuries for Andalusi Romance, 14th–16th centuries for the other languages
Arebica 30 ڄ ە اٖى ي ڵ ںٛ ۉ ۆ Naskh Serbo-Croatian Southeastern Europe Perso-Arabic Latest stage has full vowel marking
Arwi alphabet 41 ڊ ڍ ڔ صٜ ۻ ڣ ڹ ݧ Naskh Tamil Southern India, Sri Lanka Perso-Arabic
Belarusian Arabic alphabet 32 Naskh Belarusian Eastern Europe Perso-Arabic 15th / 16th century
Balochi Standard Alphabet(s) 29 ٹ ڈ ۏ ݔ ے Naskh and Nastaliq Balochi South-West Asia Perso-Arabic, also borrows multiple glyphs from Urdu This standardization is based on the previous orthography. For more information, see Balochi writing.
Berber Arabic alphabet(s) 33 چ ژ ڞ ݣ ء Various Berber languages North Africa Arabic
Burushaski 53 ݳ ݴ ݼ څ ڎ ݽ ڞ ݣ ݸ ݹ ݶ ݷ ݺ ݻ (see note) Nastaliq Burushaski South-West Asia (Pakistan) Urdu Also uses the additional letters shown for Urdu (see below). Sometimes written with just the Urdu alphabet, or with the Latin alphabet.
Chagatai alphabet 32 ݣ Nastaliq and Naskh Chagatai Central Asia Perso-Arabic ݣ is interchangeable with نگ and ڭ.
Dobrujan Tatar 32 Naskh Dobrujan Tatar Southeastern Europe Chagatai
Galal 32 Naskh Somali Horn of Africa Arabic
Jawi 36 چ ڠ ڤ ݢ ڽ ۏ Naskh Malay Peninsular Malaysia, Sumatra and part of Borneo Arabic Since 1303 AD (Trengganu Stone)
Kashmiri 44 ۆ ۄ ؠ ێ Nastaliq Kashmiri South Asia Urdu This orthography is fully voweled. 3 out of the 4 (ۆ, ۄ, ێ) additional glyphs are actually vowels. Not all vowels are listed here since they are not separate letters. For further information, see Kashmiri writing.
Kazakh Arabic alphabet 35 ٵ ٶ ۇ ٷ ۋ ۆ ە ھ ى ٸ ي Naskh Kazakh Central Asia, China Chagatai In use since 11th century, reformed in the early 20th century, now official only in China
Khowar 45 ݯ ݮ څ ځ ݱ ݰ ڵ Nastaliq Khowar South Asia Urdu, however, borrows multiple glyphs from Pashto
Kyrgyz Arabic alphabet 33 ۅ ۇ ۉ ۋ ە ى ي Naskh Kyrgyz Central Asia Chagatai In use since 11th century, reformed in the early 20th century, now official only in China
Pashto 45 ټ څ ځ ډ ړ ږ ښ ګ ڼ ۀ ي ې ۍ ئ Naskh and occasionally, Nastaliq Pashto South-West Asia, Afghanistan and Pakistan Perso-Arabic ګ is interchangeable with گ. Also, the glyphs ی and ې are often replaced with ے in Pakistan.
Pegon script 35 چ ڎ ڟ ڠ ڤ ڮ ۑ Naskh Javanese, Sundanese, Madurese South-East Asia (Indonesia) Arabic
Persian 32 پ چ ژ گ Naskh and Nastaliq Persian (Farsi) West Asia (Iran etc.) Arabic Also known as Perso-Arabic.
Shahmukhi 41 ݪ ݨ Nastaliq Punjabi South Asia (Pakistan) Perso-Arabic
Saraiki 45 ٻ ڄ ݙ ڳ Nastaliq Saraiki South Asia (Pakistan) Urdu
Sindhi 52 ڪ ڳ ڱ گ ک پ ڀ ٻ ٽ ٿ ٺ ڻ ڦ ڇ چ ڄ ڃ ھ ڙ ڌ ڏ ڎ ڍ ڊ Naskh Sindhi South Asia (Pakistan) Perso-Arabic
Sorabe 28 Naskh Malagasy Madagascar Arabic
Soranî 33 ڕ ڤ ڵ ۆ ێ Naskh Kurdish languages Middle-East Perso-Arabic Vowels are mandatory, i.e. alphabet
Swahili Arabic script 28 Naskh Swahili Eastern and Southern Africa Arabic
İske imlâ 35 ۋ Naskh Tatar Volga region Chagatai Used prior to 1920.
Ottoman Turkish 32 ئە Ottoman Turkish Ottoman Empire Chagatai Official until 1928
Urdu 39+ (see notes) ٹ ڈ ڑ ں پ ھ چ ژ آ گ ے (see notes) Nastaliq Urdu South Asia Perso-Arabic 58 [citation needed] letters including digraphs representing aspirated consonants: بھ پھ تھ ٹھ جھ چھ دھ ڈھ کھ گھ
Uyghur 32 ئا ئە ھ ئو ئۇ ئۆ ئۈ ۋ ئې ئى Naskh Uyghur China, Central Asia Chagatai Reform of older Arabic-script Uyghur orthography that was used prior to the 1950s. Vowels are mandatory, i.e. alphabet
Wolofal 33 ݖ گ ݧ ݝ ݒ Naskh Wolof West Africa Arabic, however, borrows at least one glyph from Perso-Arabic
Xiao'erjing 36 ٿ س﮲ ڞ ي Naskh Sinitic languages China, Central Asia Chagatai Used to write Chinese languages by Muslims living in China such as the Hui people.
Yaña imlâ 29 ئا ئە ئی ئو ئۇ ئ ھ Naskh Tatar Volga region İske imlâ alphabet 1920–1927 replaced with Cyrillic
Huit 29 is dead ـع

Current use


Today Iran, Afghanistan, Pakistan, India, and China are the main non-Arabic speaking states using the Arabic alphabet to write one or more official national languages, including Azerbaijani, Baluchi, Brahui, Persian, Pashto, Central Kurdish, Urdu, Sindhi, Kashmiri, Punjabi and Uyghur.[citation needed]

An Arabic alphabet is currently used for the following languages:[citation needed]

Middle East and Central Asia

East Asia

South Asia

Southeast Asia
  • Malay in the Arabic script, known as Jawi. It can still be seen on the signboards of shops and market stalls, especially in rural or conservative areas of Malaysia, but it is no longer commonly used for everyday writing and is largely confined to religious studies. In Brunei in particular, Jawi is read and written in Islamic religious education at primary school, secondary school, college, and university level. In addition, some television programming uses Jawi, such as announcements, advertisements, news, social programs or Islamic programs.
  • Cham language in Cambodia and Vietnam, alongside the Western Cham script.

Europe

Africa

Former use

With the establishment of Muslim rule in the subcontinent, one or more forms of the Arabic script were incorporated among the assortment of scripts used for writing native languages.[37] In the 20th century, the Arabic script was generally replaced by the Latin alphabet in the Balkans, parts of Sub-Saharan Africa, and Southeast Asia, while in the Soviet Union, after a brief period of Latinisation,[38] use of Cyrillic was mandated. Turkey changed to the Latin alphabet in 1928 as part of an internal Westernizing revolution. After the collapse of the Soviet Union in 1991, many of the Turkic languages of the ex-USSR attempted to follow Turkey's lead and convert to a Turkish-style Latin alphabet. However, renewed use of the Arabic alphabet has occurred to a limited extent in Tajikistan, whose language's close resemblance to Persian allows direct use of publications from Afghanistan and Iran.[39]

Africa

Europe

Central Asia and Caucasus

South and Southeast Asia

Middle East

Unicode

As of Unicode 17.0, the following ranges encode Arabic characters:
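As a rough illustration of working with such ranges, the following Python sketch maps a character to the block that contains it. The block names and boundaries are supplied here for illustration, cover only the main Basic Multilingual Plane blocks, and should be checked against the Unicode `Blocks.txt` data file; `ARABIC_BLOCKS` and `arabic_block` are hypothetical names.

```python
# Assumed subset of Arabic blocks (inclusive start, inclusive end, name).
ARABIC_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0870, 0x089F, "Arabic Extended-B"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms-B"),
]


def arabic_block(ch):
    """Return the name of the listed block containing ch, or None."""
    code_point = ord(ch)
    for start, end, name in ARABIC_BLOCKS:
        if start <= code_point <= end:
            return name
    return None


print(arabic_block("\u0628"))  # a basic Arabic letter
print(arabic_block("\u0750"))  # a letter from the supplement block
print(arabic_block("A"))       # a Latin letter falls outside every range
```

A linear scan is enough for a table this small; a sorted list with binary search would suit a full block table.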

Additional letters used in other languages

Assignment of phonemes to graphemes
∅ = phoneme absent from language
Language family Austron. Dravid. Turkic Indo-European Niger–Con.
Language/script Pegon Jawi Arwi Azeri Kazakh Uyghur Uzbek Sindhi Punjabi Urdu Persian Pashto[a] Balochi Kurdish Swahili
/t͡ʃ/ چ
/ʒ/ ژ
/p/ ڤ ڣ پ
/g/ ؼ ݢ ق گ ڠ
/v/ ۏ و ۆ ۋ و ڤ
/ŋ/ ڠ ڭ نگ‎ ڱ ن نݝ
/ɲ/ ۑ ڽ ݧ ڃ ن نْي
/ɳ/ ڹ ڻ ݨ ن ڼ

Table of additional letters in other languages

Letter[A] Use & Pronunciation Unicode i'jam & other additions Shape Similar Arabic Letter(s)
U+ [B] [C] above below
Additional letters with additional marks
پ Pe, used to represent the phoneme /p/ in Persian, Pashto, Punjabi, Khowar, Sindhi, Urdu, Kurdish, Kashmiri; it can be used in Arabic to describe the phoneme /p/ otherwise it is written ب /b/. U+067E none 3 dots ٮ ب
ݐ used to represent the equivalent of the Latin letter Ƴ (palatalized glottal stop /ʔʲ/) in some African languages such as Fulfulde. U+0750 ﮳﮳﮳ none 3 dots (horizontal) ٮ ب
ٻ B̤ē, used to represent a voiced bilabial implosive /ɓ/ in Hausa, Sindhi and Saraiki. U+067B none 2 dots (vertically) ٮ ب
ڀ represents an aspirated voiced bilabial plosive // in Sindhi. U+0680 none 4 dots ٮ ب
ٺ Ṭhē, represents the aspirated voiceless retroflex plosive /ʈʰ/ in Sindhi. U+067A 2 dots (vertically) none ٮ ت
ټ Ṭē, used to represent the phoneme /ʈ/ in Pashto. U+067C ﮿ 2 dots ring ٮ ت
ٽ Ṭe, used to represent the phoneme (a voiceless retroflex plosive /ʈ/) in Sindhi. U+067D 3 dots (inverted) none ٮ ت
ٹ Ṭe, used to represent Ṭ (a voiceless retroflex plosive /ʈ/) in Punjabi, Kashmiri, Urdu. U+0679 ◌ؕ small ط none ٮ ت
ٿ Teheh, used in Sindhi and Rajasthani (when written in Sindhi alphabet); used to represent the phoneme /t͡ɕʰ/ (pinyin q) in Chinese Xiao'erjing. U+067F 4 dots none ٮ ت
ڄ represents the "c" voiceless dental affricate /t͡s/ phoneme in Bosnian. U+0684 none 2 dots (vertically) ح ج
ڃ represents the "ć" voiceless alveolo-palatal affricate /t͡ɕ/ phoneme in Bosnian. U+0683 none 2 dots ح ج
چ Che, used to represent /t͡ʃ/ ("ch"). It is used in Persian, Pashto, Punjabi, Urdu, Kashmiri and Kurdish. /ʒ/ in Egypt. U+0686 none 3 dots ح ج
څ Ce, used to represent the phoneme /t͡s/ in Pashto. U+0685 3 dots none ح خ
ݗ represents the "đ" voiced alveolo-palatal affricate /d͡ʑ/ phoneme in Bosnian. Also used to represent the letter X in Afrikaans. U+0757 2 dots none ح خ
ځ Źim, used to represent the phoneme /d͡z/ in Pashto. U+0681 ◌ٔ Hamza none ح خ
ڎ Used to represent the phoneme /ɖ/ in Somali U+068E 3 dots
ݙ used in Saraiki to represent a voiced alveolar implosive /ɗ̢/. U+0759 small ط 2 dots (vertically) د د
ڊ used in Saraiki to represent a voiced retroflex implosive //. U+068A none 1 dot د د
ڈ Ḍal, used to represent a Ḍ (a voiced retroflex plosive /ɖ/) in Punjabi, Kashmiri and Urdu. U+0688 ◌ؕ small ط none د د
ڌ Dhal, used to represent the phoneme /d̪ʱ/ in Sindhi U+068C 2 dots none د د
ډ Ḍal, used to represent the phoneme /ɖ/ in Pashto. U+0689 ﮿ none ring د د
ڑ Ṛe, represents a retroflex flap /ɽ/ in Punjabi and Urdu. U+0691 ◌ؕ small ط none ر ر
ړ Ṛe, used to represent a retroflex lateral flap in Pashto. U+0693 ﮿ none ring ر ر
ݫ used in Ormuri to represent a voiced alveolo-palatal fricative /ʑ/, as well as in Torwali. U+076B 2 dots (vertically) none ر ر
ژ Že / zhe, used to represent the voiced postalveolar fricative /ʒ/ in Persian, Pashto, Kurdish, Urdu, Punjabi and Uyghur. U+0698 3 dots none ر ز
ږ Ǵe / ẓ̌e, used to represent the phoneme /ʐ/ /ɡ/ /ʝ/ in Pashto. U+0696 1 dot 1 dot ر ز
ڕ used in Kurdish to represent rr /r/ in Soranî dialect. U+0695 ٚ none V pointing down ر ر
ݭ used in Kalami to represent a voiceless retroflex fricative /ʂ/, and in Ormuri to represent a voiceless alveolo-palatal fricative /ɕ/. U+076D 2 dots vertically none س س
ݜ used in Shina to represent a voiceless retroflex fricative /ʂ/. U+075C 4 dots none س ش
ښ X̌īn / ṣ̌īn, used to represent the phoneme /x/ /ʂ/ /ç/ in Pashto. U+069A 1 dot 1 dot س س
ڜ‎ Used in Wakhi to represent the phoneme /ʂ/. U+069C 3 dots 3 dots س ش
ڞ Used to represent the phoneme /tsʰ/ (pinyin c) in Chinese. U+069E 3 dots none ص ض
ڠ Nga /ŋ/ in the Jawi script and Pegon script. U+06A0 3 dots none ع غ
ڤ Ve, used in Kurdish to represent /v/, it can be used in Arabic to describe the phoneme /v/ otherwise it is written ف /f/. Pa, used in the Jawi script and Pegon script to represent /p/. U+06A4 3 dots none ڡ ف
ڥ Vi, used in Algerian Arabic and Tunisian Arabic when written in Arabic script to represent the sound /v/ if needed. U+06A5 none 3 dots ڡ ف
ڨ Ga, used to represent the voiced velar plosive /ɡ/ in Algerian and Tunisian. U+06A8 3 dots none ٯ ق
ڭ Ng, used to represent the /ŋ/ phone in Ottoman Turkish, Kazakh, Kyrgyz, and Uyghur. Used to represent /ɡ/ in Morocco and in many dialects of Algerian. U+06AD 3 dots none ك ك
ڬ Gaf, represents a voiced velar plosive /ɡ/ in the Jawi script of Malay. U+06AC 1 dot none ك ك
ݢ U+0762 1 dot none ک ك
گ Gaf, represents a voiced velar plosive /ɡ/ in Persian, Pashto, Punjabi, Somali, Kyrgyz, Kazakh, Kurdish, Uyghur, Mesopotamian Arabic, Urdu and Ottoman Turkish. U+06AF line horizontal line none ک ك
ګ Gaf, used to represent the phoneme /ɡ/ in Pashto. U+06AB ﮿ ring none ک ك
ؼ Gaf, represents a voiced velar plosive /ɡ/ in the Pegon script of Indonesian. U+063C none 3 dots ک ك
ڱ represents the velar nasal /ŋ/ phoneme in Sindhi. U+06B1 2 dots + horizontal line none ک ك
ڳ represents a voiced velar implosive /ɠ/ in Sindhi and Saraiki. U+06B3 horizontal line 2 dots ک ك
ݣ used to represent the phoneme /ŋ/ (pinyin ng) in Chinese. U+0763 none 3 dots ک ك
ݪ used in Marwari to represent a retroflex lateral flap /ɺ̢/, and in Kalami to represent a voiceless lateral fricative /ɬ/. U+076A line horizontal line none ل ل
ࣇ – or alternately typeset as لؕ (U+0644 U+0615) – is used in Punjabi to represent voiced retroflex lateral approximant /ɭ/[43] U+08C7 ◌ؕ small ط none ل ل
ڵ used in Kurdish to represent ll /ɫ/ in Soranî dialect. Represents the "lj" palatal lateral approximant /ʎ/ phoneme in Bosnian. U+06B5 ◌ٚ V pointing down none ل ل
ڼ represents the retroflex nasal /ɳ/ phoneme in Pashto. U+06BC ﮿ 1 dot ring ں ن
ڻ represents the retroflex nasal /ɳ/ phoneme in Sindhi. U+06BB ◌ؕ small ط none ں ن
ݨ used in Punjabi to represent /ɳ/ and Saraiki to represent /ɲ/. U+0768 1 dot + small ط none ں ن
ڽ Nya /ɲ/ in the Jawi script: ڽـ ـڽـ ڽ. The isolated ڽ‎ and final ـڽ‎ resemble the form ڽ, while the initial ڽـ‎ and medial ـڽـ‎ forms resemble the form پ. U+06BD 3 dots none ں ن
ݩ represents the "nj" palatal nasal /ɲ/ phoneme in Bosnian. U+0769 ◌ٚ 1 dot V pointing down none ں ن
ۅ Ö, used to represent the phoneme /ø/ in Kyrgyz. U+06C5 ◌̵ Strikethrough[D] none و و
ﻭٓ Uu, used to represent the phoneme // in Somali. ‎ + ◌ٓU+0648 U+0653 ◌ٓ Madda none و + ◌ٓ
ۏ Va in the Jawi script. U+06CF 1 dot none و و
ۋ represents a /v/ in Kyrgyz, Uyghur, and Old Tatar; and /w, ʊw, ʉw/ in Kazakh; also formerly used in Nogai. U+06CB 3 dots none و و
ۆ represents "o" // in Kurdish, "ü" /y/ in Azerbaijani, and /ø/ in Uyghur as part of the digraph ئۆ. It represents the "u" /u/ phoneme in Bosnian. U+06C6 ◌ٚ V pointing down none و و
ۇ U, used to represent the /u/ phoneme in Azerbaijani, Kazakh, Kyrgyz and Uyghur. U+06C7 ◌ُ Damma[E] none و و
ۉ represents the "o" /ɔ/ phoneme in Bosnian. Also used to represent /ø/ in Kyrgyz. U+06C9 ◌ٛ V pointing up none و و
ىٓ Ii, used to represent the phoneme // in Somali and Saraiki. U+0649 U+0653 ◌ٓ Madda none ى ي
ې Pasta Ye, used to represent the phoneme /e/ in Pashto and Uyghur. U+06D0 none 2 dots vertical ى ي
ۍ X̌əźīna ye Ye, used to represent the phoneme [əi] in Pashto. U+06CD line horizontal line none ى ي
ۑ Nya /ɲ/ in the Pegon script. U+06D1 none 3 dots ى ي
ێ represents ê // in Kurdish. U+06CE ◌ٚ V pointing down 2 dots (start + mid) ى ي
Additional letters with shape alteration
ک Khē, represents // in Sindhi. U+06A9 none none none ک ك
ڪ "Swash kāf" is a stylistic variant of ك ‎ in Arabic, but represents un- aspirated /k/ in Sindhi. U+06AA none none none ڪ ك
ھ ھ Do-chashmi he (two-eyed hāʼ), used in digraphs for aspiration /ʰ/ and breathy voice /ʱ/ in Punjabi and Urdu. Also used to represent /h/ in Kazakh, Sorani and Uyghur.[F] U+06BE none none none ھ ه ‎ / هـ
ە Ae, used to represent /æ/ and /ɛ/ in Kazakh, Sorani and Uyghur. U+06D5 none none none ه ه ‎ / هـ
ے Baṛī ye ('big yāʼ'), is a stylistic variant of ي in Arabic, but represents "ai" or "e" /ɛː/, // in Urdu and Punjabi. U+06D2 none none none ے ي
Additional Digraph letters
أو Oo, used to represent the phoneme // in Somali. U+0623 U+0648 ◌ٔ Hamza none او أ + و
اٖى represents the "i" /i/ phoneme in Bosnian. U+0627 U+0656 U+0649 ◌ٖ Alef none اى اٖ + ى
أي Ee, used to represent the phoneme // in Somali. U+0623 U+064A ◌ٔ Hamza 2 dots اى أ + ي
  1. ^ letter or digraph
  2. ^ Joined to the letter, closest to the letter, on the first letter, or above.
  3. ^ Further away from the letter, or on the second letter, or below.
  4. ^ A variant that ends in a loop also exists.
  5. ^ Although the letter is also known as Waw with Damma, some publications and fonts feature a filled Damma that looks similar to a comma.
  6. ^ Shown in Naskh (top) and Nastaliq (bottom) styles. The Nastaliq version of the connected forms are connected to each other, because the tatweel character U+0640 used to show the other forms does not work in many Nastaliq fonts.
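A table of this kind can be cross-checked mechanically, since each letter's code point and official character name are available from the Unicode Character Database. The following Python sketch is one hedged way to do so: the `EXPECTED` sample, `code_point`, and `mismatches` are illustrative names, and the character names are those returned by Python's `unicodedata` module rather than the transliterated names used in the table.

```python
import unicodedata

# A small sample of letters paired with their Unicode character names.
EXPECTED = {
    "\u067E": "ARABIC LETTER PEH",
    "\u0686": "ARABIC LETTER TCHEH",
    "\u0698": "ARABIC LETTER JEH",
    "\u06AF": "ARABIC LETTER GAF",
    "\u06B1": "ARABIC LETTER NGOEH",
    "\u06B3": "ARABIC LETTER GUEH",
    "\u06C5": "ARABIC LETTER KIRGHIZ OE",
}


def code_point(ch):
    """Format a single character in the table's U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def mismatches(table):
    """Return the entries whose actual Unicode name differs from the expected one."""
    return {
        code_point(ch): unicodedata.name(ch, "<unassigned>")
        for ch, expected in table.items()
        if unicodedata.name(ch, "<unassigned>") != expected
    }


for ch, name in EXPECTED.items():
    print(code_point(ch), name)
print(mismatches(EXPECTED))
```

Pasting a glyph from a table row into `code_point` shows at once whether the listed U+ value belongs to that glyph.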

Letter construction


Most languages that use alphabets based on the Arabic alphabet use the same base shapes. Most additional letters in languages that use alphabets based on the Arabic alphabet are built by adding (or removing) diacritics to existing Arabic letters. Some stylistic variants in Arabic have distinct meanings in other languages. For example, variant forms of kāf ك ک ڪ ‎ are used in some languages and sometimes have specific usages. In Urdu and some neighbouring languages, the letter Hā has diverged into two forms ھ dō-čašmī hē and ہ ہـ ـہـ ـہ gōl hē,[44] while a variant form of ي referred to as baṛī yē ے ‎ is used at the end of some words.[44]
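How a letter is built from a base shape plus additions is only partly visible in its encoding. In Unicode, a few letters (such as alif with madda) decompose canonically into a base letter and a combining mark, while most dotted additions (such as پ) are single code points whose construction appears, if at all, only in the character name. The Python sketch below illustrates both cases; `base_and_marks` and `name_parts` are hypothetical helpers, and splitting the name on " WITH " is a heuristic, not a Unicode property.

```python
import unicodedata


def base_and_marks(ch):
    """Split a letter into its base and combining marks via canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0], decomposed[1:]


def name_parts(ch):
    """Heuristically split a character name into base name and described addition."""
    base, _, addition = unicodedata.name(ch).partition(" WITH ")
    return base, addition


# Alif with madda decomposes into alif plus a combining madda.
print([f"U+{ord(c):04X}" for c in "".join(base_and_marks("\u0622"))])

# Pe is atomic: no decomposition, and its name does not mention the dots.
print(base_and_marks("\u067E") == ("\u067E", ""))

# For some letters the name records the addition to the base shape.
print(name_parts("\u068A"))
```

Software that needs the underlying base shape of an atomic letter therefore has to rely on a lookup table rather than on normalization.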

Table of letter components

See also

Explanatory notes

References
from Grokipedia
The Arabic script is an abjad comprising 28 letters, primarily used to write the Arabic language and adapted for numerous others such as Persian and Malay, characterized by its right-to-left flow and positional variations in letter forms. Originating from the Nabataean script around the 4th century CE, with additional influences from Syriac, it evolved through transitional "Nabateo-Arabic" phases and was standardized between the 5th and 6th centuries, as evidenced by early inscriptions like the Namāra epitaph from 328 CE. The script's development accelerated with the rise of Islam in the 7th century, particularly through its role in transcribing the Quran, moving from rudimentary Hijazi styles, marked by slanted, undotted letters, to more angular forms and later to fluid, rounded Naskh scripts for improved readability.

Key structural features include letters that connect in cursive chains, with most adopting 2 to 4 allographic shapes depending on their position (isolated, initial, medial, or final) in a word, and diacritical dots to differentiate similar forms, while short vowels are optionally indicated by diacritics and long vowels by specific letters. The script lacks uppercase or lowercase distinctions, and its flexibility allowed it to accommodate non-Semitic languages by adding letters or modifications, such as the four extra letters in Persian. Historically, it served not only linguistic but also artistic and mystical purposes, with letters assigned numerical values linked to lunar stations and talismanic significance in Islamic tradition.

In modern usage, the Arabic script remains the third-most widely employed globally, supporting over 400 million native speakers and additional users in other regions, though contemporary texts often omit vowel diacritics for brevity. Its calligraphic styles, including the everyday Ruqʿa, continue to influence digital fonts and cultural expressions, while adaptations persist in languages like Kurdish (Sorani) and Uyghur, reflecting its enduring adaptability across diverse linguistic and technological contexts.
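The optional nature of the vowel diacritics can be sketched in a few lines of Python: removing the marks from a fully vowelled word leaves the consonantal text as it is usually printed. The sketch assumes that the marks of interest are the eight code points U+064B to U+0652 (fathatan through sukun); the helper name `strip_tashkil` is hypothetical, and other Quranic annotation marks are deliberately left out of scope.

```python
# Assumed set of vowel and related marks: U+064B (fathatan) .. U+0652 (sukun).
TASHKIL = {chr(cp) for cp in range(0x064B, 0x0652 + 1)}


def strip_tashkil(text):
    """Remove short-vowel, nunation, shadda and sukun marks from Arabic text."""
    return "".join(ch for ch in text if ch not in TASHKIL)


vowelled = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kāf, tāʼ, bāʼ, each with fatha
plain = strip_tashkil(vowelled)

print(len(vowelled), len(plain))
print(plain == "\u0643\u062A\u0628")
```

Using an explicit range rather than every combining mark keeps letters such as alif with hamza intact even when the text is stored in decomposed form.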

History

Origins and early development

The Arabic script traces its distant origins to ancient Egyptian hieroglyphs through the Proto-Sinaitic script (ca. 1850–1500 BCE). Semitic-speaking peoples in the Sinai Peninsula and Egypt adapted select Egyptian hieroglyphs using the acrophonic principle (assigning the initial consonant sound of the Semitic word for the depicted object) to form the earliest known abjad, representing consonants in a simplified linear manner. This Proto-Sinaitic script evolved into Proto-Canaanite and subsequently the Phoenician alphabet (ca. 11th century BCE), which gave rise to the Aramaic alphabet. The Nabataean variant of late Aramaic then developed into the distinct Arabic script by the 4th century CE.

Although sharing this ancestral lineage, the Arabic script differs profoundly from Egyptian hieroglyphs in appearance and function. Hieroglyphs comprised a complex pictorial and logographic system with hundreds of symbols that could represent words, syllables, or sounds, whereas Arabic is a cursive, right-to-left abjad primarily denoting consonants with connected letter forms in fluid, joined writing.

The script emerged around the 4th century CE in the Arabian Peninsula and surrounding regions as a derivative of the Nabataean variant of the late Aramaic alphabet, which itself evolved from the earlier Phoenician script. This development occurred among Arab tribes in northern Arabia and the Levant, where the Nabataean kingdom (c. 320 BCE–106 CE) facilitated cultural and linguistic exchanges through trade routes connecting Petra, Syria, and the Hijaz. The script's emergence reflects a gradual adaptation of Aramaic's 22-letter consonantal system to accommodate Arabic phonetics, expanding to 28 letters by incorporating additional sounds unique to Arabic.

Key evidence of proto-Arabic forms appears in early inscriptions, such as the Namāra inscription from 328 CE. This funerary inscription for the Lakhmid king Imru' al-Qays is written in the Nabataean script but employs the Arabic language, marking it as one of the oldest dated attestations of written Arabic and demonstrating the script's transitional use for Arabic texts. Similarly, the Zabad inscription from 512 CE is a trilingual church dedication in Greek, Syriac, and Arabic, with the Arabic portion showcasing emerging cursive tendencies and terms like "al-ilah" (the god), highlighting the script's role in pre-Islamic religious and communal contexts. These artifacts, primarily funerary and dedicatory, illustrate the script's initial application in northern Arabian and Syrian territories influenced by nomadic and settled Arab communities.

The transition from angular, monumental forms to more fluid, cursive styles was driven by practical needs in trade, administration, and everyday writing on perishable materials during the pre-Islamic era. Nabataean inscriptions, often chiseled in stone for durability, featured rigid lines suited to that medium, but as Arabic speakers adopted the script for broader commercial exchanges along caravan routes, ligatures and connections between letters began to appear, smoothing the forms for quicker writing. This evolution is evident in the semi-cursive Nabataean examples from the 2nd century BCE onward, which prefigure the interconnected nature of mature Arabic writing.

A representative example of letter evolution is the Arabic alif (ا), which developed from the Phoenician aleph (𐤀), an ox-head symbol simplified in Aramaic to a vertical stroke and further adapted in Nabataean to a slanted or hooked form before straightening in proto-Arabic by the 4th century CE. Such changes preserved the consonantal value while aligning with Arabic's phonetic requirements, laying the groundwork for the script's later refinements.

Spread and evolution in the Islamic era

The rise of Islam in the 7th century facilitated the rapid dissemination of the Arabic script across vast territories through military conquests, transforming it from a regional script into a vehicle for religious, administrative, and cultural expression. Following the death of Prophet Muhammad in 632 CE, Arab Muslim armies expanded into Persia by 651 CE under the Rashidun caliphs and later the Umayyads, conquering the Sasanian Empire and introducing Arabic script for Quranic dissemination and governance. Similarly, conquests reached North Africa by the late 7th century, with Arab forces capturing Egypt in 642 CE and advancing westward to establish script use in official documents and Islamic texts, laying the groundwork for regional linguistic integrations. These expansions, which continued into the 8th century, prompted initial adaptations, such as the incorporation of Persian sounds into the script for administrative purposes in conquered regions.

Under the Umayyad Caliphate (661–750 CE), efforts to standardize the Arabic script intensified to support the empire's administrative needs and the accurate transmission of the Quran. Caliph ʿAbd al-Malik (r. 685–705 CE) played a pivotal role, commissioning the codification of the Kufic script between 684 and 692 CE, an angular, geometric style suited for monumental inscriptions like those on the Dome of the Rock in Jerusalem. This script, derived from earlier Hijazi forms, became the primary medium for Quranic manuscripts, emphasizing clarity and aesthetic rigidity without vowels or extensive diacritics in early versions. The Abbasid Caliphate (750–1258 CE) further refined this standardization in the 8th and 9th centuries, with Baghdad emerging as a center for script evolution; Kufic continued for religious texts while transitioning toward more cursive styles to enhance legibility in expanding literary and bureaucratic contexts.

To address ambiguities in consonant differentiation, particularly for non-Arabic speakers reciting the Quran, diacritical marks known as i'jam were introduced around 684 CE by the grammarian Abu al-Aswad al-Du'ali (d. 688 CE), a companion of ʿAli ibn Abi Talib. Commissioned by an Umayyad governor, al-Du'ali devised a system of dots placed above, below, or beside letters to distinguish similar forms, such as ب (bāʾ), ت (tāʾ), ث (thāʾ), ن (nūn), ي (yāʾ), and ى (alif maqṣūrah), thereby preventing misreadings in sacred texts. This innovation, initially applied to Quranic codices, marked a crucial step in script legibility during the early Islamic expansions.

Building on this foundation, vowel pointing or tashkil was innovated in the 8th century by the Basran scholar Khalil ibn Ahmad al-Farahidi (d. 786 CE) to indicate short vowels and phonetic nuances more precisely. Al-Farahidi replaced earlier colored dots with a refined set of symbols derived from letter shapes, such as fatḥah (a horizontal line for /a/), kasrah (a diagonal line for /i/), and ḍammah (a curl for /u/), enabling accurate recitation for diverse linguistic communities. His system, integrated into Quranic and grammatical works, became standard by the 11th century, supporting the script's adaptability in Persian and North African contexts where vowel systems differed from Arabic.

Modern standardization and reforms

The Tanzimat reforms in the Ottoman Empire from 1839 to 1876 marked a pivotal era for the Arabic script's adaptation to modern printing technologies, as the expansion of printing presses enabled the widespread dissemination of official edicts and educational materials in Arabic script, overcoming earlier restrictions on Muslim use of printing. This period also saw early proposals for orthographic simplification to address the script's difficulties in representing Turkish phonetics, notably the 1860s reform plan of the Iranian intellectual Mirza Malkum Khan, which advocated adding diacritics and new letters to enhance readability and literacy. These efforts laid the groundwork for later script reforms.

In Egypt, intellectual and journalistic movements of the 1920s pushed for consistency in the Arabic language to support the expansion of print media, culminating in the 1932 founding of the Academy of the Arabic Language (Majmaʿ al-Lughah al-ʿArabīyah) by King Fuad I, which focused on coining modern terms and resolving ambiguities in usage. Similar initiatives across Arab states aimed to adapt the language to contemporary needs in administration.

Reforms in non-Arab countries diverged more radically. In 1928, Turkey under Mustafa Kemal Atatürk enacted Law No. 1353, mandating a switch from the Arabic script to a Latin-based alphabet in order to boost literacy from around 10% to over 20% within a decade and to align with Western modernization, effectively severing ties to Ottoman-Islamic traditions. In Iran, Reza Shah Pahlavi's cultural policies of the 1930s included orthographic adjustments to the Perso-Arabic script, such as standardizing short vowel markings and reducing optional ligatures through the Academy of Iran founded in 1935, to better suit Persian without a full script change.
Contemporary efforts have emphasized digital adaptation and cultural preservation. In the 1980s, work on Arabic script standards for computing produced early font and encoding systems able to accommodate the script's contextual forms in word processors and databases, paving the way for global digital typography. Since the 2000s, projects to safeguard endangered Arabic script variants have included the digitization of Ajami manuscripts (texts in modified Arabic scripts for African languages such as Wolof and Hausa) through initiatives like the British Library's Endangered Archives Programme (EAP), whose pilot project EAP915 identified and cataloged 807 endangered Arabic manuscripts. More recently, as of 2023, the EAP has digitized over 100,000 pages of Ajami materials, including projects for the Mandinka and Wolof languages.

Core Features

Alphabet and basic letters

The Arabic script is fundamentally an abjad, a writing system that primarily represents consonants while vowels are either omitted in standard orthography or indicated through optional diacritics, setting it apart from full alphabets like the Latin alphabet, which write vowels with letters of their own. This consonantal focus facilitates concise writing but requires contextual knowledge for accurate reading. The standard Arabic alphabet comprises 28 letters, each with a distinct name and phonemic value, derived from the ancient Nabataean and Aramaic scripts but refined over centuries.

Unlike scripts with case distinctions, Arabic employs a unicase design, meaning there are no uppercase or lowercase forms; instead, letters adapt through four positional variants to suit the cursive flow of connected writing: initial (at the start of a word), medial (within a word), final (at the end of a word), and isolated (standalone or after a non-connecting letter). This positional flexibility is a core feature enabling the script's flowing appearance without altering the letter's essential identity.

The following table lists the 28 core letters in their isolated forms, in conventional alphabetical order, along with their names and approximate International Phonetic Alphabet (IPA) phonemic values. These phonemes represent standard pronunciations, though regional dialects may vary slightly.
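| Isolated form | Name | IPA phoneme |
| --- | --- | --- |
| ا | alif | /ʔ/ or /aː/ |
| ب | bāʾ | /b/ |
| ت | tāʾ | /t/ |
| ث | thāʾ | /θ/ |
| ج | jīm | /dʒ/ |
| ح | ḥāʾ | /ħ/ |
| خ | khāʾ | /x/ |
| د | dāl | /d/ |
| ذ | dhāl | /ð/ |
| ر | rāʾ | /r/ |
| ز | zāy | /z/ |
| س | sīn | /s/ |
| ش | shīn | /ʃ/ |
| ص | ṣād | /sˤ/ |
| ض | ḍād | /dˤ/ |
| ط | ṭāʾ | /tˤ/ |
| ظ | ẓāʾ | /ðˤ/ |
| ع | ʿayn | /ʕ/ |
| غ | ghayn | /ɣ/ |
| ف | fāʾ | /f/ |
| ق | qāf | /q/ |
| ك | kāf | /k/ |
| ل | lām | /l/ |
| م | mīm | /m/ |
| ن | nūn | /n/ |
| ه | hāʾ | /h/ |
| و | wāw | /w/ or /uː/ |
| ي | yāʾ | /j/ or /iː/ |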

Contextual forms and cursive nature

The Arabic script is fundamentally cursive, with letters designed to connect to one another within words, creating a continuous flow even in printed text. This connectivity is a core feature that distinguishes it from many other writing systems, facilitating efficient writing and aesthetic harmony in calligraphy. The script is written and read from right to left, with letters aligning along a horizontal baseline that serves as the primary anchor for placement, ensuring uniform visual structure across words and lines.

Letters in the Arabic script assume one of four contextual forms depending on their position within a word: isolated (when standing alone or not connecting), initial (at the beginning of a word, connecting only to the following letter), medial (in the middle, connecting to both preceding and following letters), and final (at the end of a word, connecting only to the preceding letter). These forms allow each of the 28 basic letters to adapt its shape dynamically, often resulting in significant visual differences from the isolated form. For instance, the letter beh (ب) appears as ب in isolated form, ﺑ in initial, ﺒ in medial, and ﺐ in final. This positional variation is governed by Unicode standards for rendering, under which font systems select the appropriate glyph based on the letter's joining behavior and its neighbors.

The cursive nature is further defined by specific joining rules: 22 letters are dual-joining, capable of connecting to both the preceding (right) and following (left) letters, while 6 letters are right-joining only, connecting solely to the preceding letter and leaving a gap to the left. Examples of right-joining letters include dal (د), which takes only an isolated or final form (د or ﺪ) and never links leftward, and reh (ر), which behaves the same way (ر or ﺮ). Non-joining elements, such as the hamza (ء), do not connect at all. These rules ensure readability by preventing ambiguous merges, particularly for letters with similar shapes.
In practice, these principles combine to create fluid sequences. For example, the word كتاب (kitāb, meaning "book") is rendered from right to left with the initial kāf (ﻛ) connecting to the medial tāʾ (ﺘ), which connects to the final alif (ﺎ); the beh (ب) then appears in isolated form to the left of the alif, since the right-joining alif terminates the connection without linking leftward. This example illustrates how dual-joining letters like kāf and tāʾ adapt across positions, while the right-joining alif prevents connection to the beh, maintaining the script's baseline alignment and integrity.
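The joining rules described above can be sketched in a few lines of Python. This is a minimal illustration, not a shaping engine: it assumes undiacritized text limited to the 28 basic letters plus hamza, and the names `RIGHT_JOINING`, `NON_JOINING`, and `contextual_forms` are hypothetical helpers introduced only for this example. Real renderers rely on the Unicode joining-type data and font tables instead.

```python
# Right-joining letters of the basic alphabet: they connect to the preceding
# letter but never to the following one (assumption: basic 28 letters only).
RIGHT_JOINING = {
    "\u0627",  # alif
    "\u062F",  # dal
    "\u0630",  # dhal
    "\u0631",  # reh
    "\u0632",  # zay
    "\u0648",  # waw
}
# Non-joining: the hamza on the line connects to nothing.
NON_JOINING = {"\u0621"}


def joins_forward(ch):
    """True if ch can connect to the letter that follows it."""
    return ch not in RIGHT_JOINING and ch not in NON_JOINING


def joins_backward(ch):
    """True if ch can connect to the letter that precedes it."""
    return ch not in NON_JOINING


def contextual_forms(word):
    """Return (letter, form) pairs in logical (reading) order."""
    result = []
    for i, ch in enumerate(word):
        linked_prev = i > 0 and joins_forward(word[i - 1]) and joins_backward(ch)
        linked_next = (
            i + 1 < len(word) and joins_forward(ch) and joins_backward(word[i + 1])
        )
        if linked_prev and linked_next:
            form = "medial"
        elif linked_prev:
            form = "final"
        elif linked_next:
            form = "initial"
        else:
            form = "isolated"
        result.append((ch, form))
    return result


# kitab: kaf, ta, alif, beh, stored in reading order
KITAB = "\u0643\u062A\u0627\u0628"
for letter, form in contextual_forms(KITAB):
    print(f"U+{ord(letter):04X} {form}")
```

For kitāb this reports the kāf as initial, the tāʾ as medial, the alif as final, and the beh as isolated, matching the description of the word given above.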

Diacritics, vowels, and orthographic conventions

The Arabic script employs a system of diacritical marks known as harakat (حَرَكَات) to indicate short vowels and other phonetic nuances, which are optional in most modern writing but essential for precise pronunciation. These marks are placed above or below consonants and include the fatha (ـَ), a short diagonal line above the letter representing the vowel sound /a/; the damma (ـُ), a small curl above the letter for /u/ as in "put"; and the kasra (ـِ), a short diagonal line below the letter for /i/ as in "bit". For example, the consonant ب (bāʾ) becomes بَ (ba), بُ (bu), and بِ (bi) respectively. Absence of a vowel is denoted by the sukun (ـْ), a small circle placed above the consonant, indicating that no short vowel follows it, as in بْ (b).

The shadda (ـّ), resembling a small "w" above the letter, signifies gemination or consonant doubling, where the consonant is held longer; it is equivalent to writing the consonant twice, first with a sukun and then with a vowel, as with the doubled dāl (دّ) in شَدَّة (shadda itself). Tanwin, or nunation, marks indefinite nouns with a final /n/ written as a doubled vowel mark, using fathatan (ـً) for /an/, dammatan (ـٌ) for /un/, and kasratan (ـٍ) for /in/, as seen in كِتَابٌ (kitābun, "a book"). These diacritics are encoded in Unicode as combining characters, such as U+064E for fatha and U+0651 for shadda, ensuring consistent digital representation.

Orthographic conventions in Arabic script govern the placement and form of elements like the hamza (ء), a glottal stop consonant that requires specific seating rules based on its position and surrounding vowels to maintain visual and phonetic clarity. In initial position, hamza is always seated on an alif (ا), forming أ (with fatha or damma) or إ (with kasra), as in أَب (ab, "father"). Medially, it sits on the carrier matching the neighboring vowel: a dotless yāʾ for kasra (ئ), a wāw for damma (ؤ), or an alif otherwise, as in سُؤَال (suʾāl, "question").
At the end of a word, it appears on the line (ء) after a long vowel or sukun, or seated on the appropriate carrier after a short vowel, such as نَشَأَ (nashaʾa, "he arose"). These rules prevent ambiguity and align with the script's cursive flow, though pronunciation of hamza may vary by dialect. Tajweed rules such as assimilation (idgham) and clear pronunciation (izhar) apply to the recitation of tanwin or nun sakinah before certain letters: idgham merges the sound before ي, ر, م, ل, و, ن, while izhar pronounces it clearly before the throat letters ء, ه, ع, ح, غ, خ. These rules affect pronunciation only and do not alter the written orthography or diacritic placement.

In modern usage, diacritics are fully employed in religious texts like the Quran for accurate tajweed recitation and in educational materials for learners, where they aid comprehension and reduce ambiguity in a script that otherwise relies on context for vowels. However, in everyday print media such as newspapers, they are largely omitted to save space and because proficient readers do not need them, appearing sporadically only for disambiguation, for example in proper names. Digital platforms and apps have seen increased optional use for clarity among non-native speakers, though full diacritization remains rare outside formal contexts; scholarly analyses note significant variation across genres, with religious and pedagogical texts showing near-complete marking compared to under 5% in general text.
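Because the harakat are separate combining characters, software can list or remove them without touching the consonantal skeleton. The sketch below is a hedged illustration using Python's standard `unicodedata` module; the `HARAKAT` table and the helper names `strip_harakat` and `describe_marks` are assumptions made for this example, and the table covers only the eight marks discussed above, not every Qur'anic annotation sign.

```python
import unicodedata

# The eight marks discussed above, keyed by their Unicode code points.
HARAKAT = {
    "\u064B": "fathatan",
    "\u064C": "dammatan",
    "\u064D": "kasratan",
    "\u064E": "fatha",
    "\u064F": "damma",
    "\u0650": "kasra",
    "\u0651": "shadda",
    "\u0652": "sukun",
}

# Each mark is a nonspacing combining character (general category "Mn").
assert all(unicodedata.category(mark) == "Mn" for mark in HARAKAT)


def strip_harakat(text):
    """Return text with the eight harakat removed, leaving the letters."""
    return "".join(ch for ch in text if ch not in HARAKAT)


def describe_marks(text):
    """List the harakat found in text as (code point, name) pairs."""
    return [(f"U+{ord(ch):04X}", HARAKAT[ch]) for ch in text if ch in HARAKAT]


# kitabun: kaf + kasra, ta + fatha, alif, beh + dammatan
KITABUN = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
print(describe_marks(KITABUN))
print(len(KITABUN), "code points ->", len(strip_harakat(KITABUN)), "letters")
```

Stripping marks this way is a common preprocessing step for search and comparison, since the same word may be stored with full, partial, or no vocalization.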

Styles and Variants

Calligraphic and typographic styles

The Arabic script has evolved through a rich tradition of calligraphic styles, each reflecting cultural, religious, and artistic influences across centuries. These styles originated in the early Islamic period and adapted to various media, from manuscripts to architectural inscriptions, building on the script's inherent cursiveness and contextual forms. Major styles like Kufic, Naskh, and Nastaliq emerged as foundational, balancing aesthetic elegance with readability, and later influenced typographic developments in print and digital typography.

Kufic, one of the earliest formal styles, developed in the late 7th century in Kufa, Iraq, and is characterized by angular, geometric letterforms with thick strokes and minimal curves. It flourished from the 7th to 10th centuries and was used primarily for Qur'anic manuscripts and architectural inscriptions, its bold, monumental appearance being well suited to stone, metal, and coinage. Examples include early Qur'ans on vellum and decorations on buildings like the Dome of the Rock in Jerusalem.

Naskh emerged in the 10th century as a more fluid, cursive alternative to angular scripts, designed for legibility in everyday and scholarly writing. It became the predominant bookhand by the 11th century, serving as the basis for copying Qur'ans, administrative documents, and literature across the Islamic world, with regional variations in Egypt, Iraq, and Syria. Its rounded, proportional letters facilitated widespread adoption and laid the groundwork for modern printed Arabic.

Nastaliq, a highly stylized and flowing script, originated in 14th-century Persia, blending elements of Naskh and Ta'liq for poetic expression. It gained dominance in the 15th century for Persian and Urdu literature, poetry, and official documents, prized for its diagonal slant, elongated horizontals, and rhythmic curves that evoke motion. Prominent in regions like Iran, Pakistan, and India, it remains a staple in South Asian manuscript traditions.
The transition to typography began with the introduction of movable type for Arabic script in the early 19th century, notably at Egypt's Bulaq Press established in 1815 under Ottoman rule, where naskh and nasta'liq typefaces were cast for books like dictionaries and Qur'ans. In the Ottoman Empire proper, widespread adoption occurred in the 1860s with innovations by typefounder Ohanis Mühendisoğlu, who adapted calligraphic proportions to metal type for naskh-based printing. This evolution addressed the script's cursive joining and contextual variants, enabling mass production of texts. By the 20th century, refined typefaces like that designed by Mohamed Bek Ja‘far in 1906 for Bulaq Press set standards for clarity in printed Qur'ans. Digital typography advanced in the 2010s with open-source fonts such as Amiri, a 2011 revival by Khaled Hosny of early 20th-century Bulaq naskh, optimized for book typesetting and Qur'anic text in software like LaTeX and web browsers.
| Style | Period | Key characteristics | Primary regions | Notable uses and examples |
| --- | --- | --- | --- | --- |
| Kufic | 7th–10th centuries | Angular, geometric, bold strokes | Iraq, Arabia | Qur'anic manuscripts; architectural inscriptions (e.g., Dome of the Rock) |
| Naskh | 10th century onward | Cursive, rounded, legible proportions | Broader Islamic world (regional variations in Egypt, Iraq, and Syria) | Books, documents; basis for modern print (e.g., medieval Qur'ans) |
| Nastaliq | 14th–15th centuries onward | Flowing, slanted, rhythmic curves | Persia (Iran), South Asia (Pakistan, India) | Poetry, literature (e.g., Persian divans, Urdu manuscripts) |

Regional and language-specific variants

The Arabic script exhibits notable regional variations in letter forms and orthographic conventions, primarily distinguishing between the Western (Maghribi) and Eastern (Mashriqi) traditions. The Maghribi script, prevalent in the Maghreb (North Africa), features rounded letterforms with curved vertical strokes for letters such as alif, lam, and ta, along with exaggerated horizontal extensions and open final curves that descend below the baseline. These characteristics, which evolved from early Kufic influences softened by sweeping curves, produce a fluid, cursive appearance adapted to local traditions and still in use today. In contrast, the Eastern or Mashriqi script, dominant in the eastern Arab world, including the Gulf states, employs sharper angles and more angular proportions in letters such as dhal, with less pronounced recurves and a straighter posture that aligns with styles like Naskh. This distinction, attested in medieval sources, reflects geographical and cultural divergences in scribal practices, with Mashriqi forms prioritizing precision in angular connections.

Language-specific adaptations further diversify the script, particularly in Southeast Asia. The Jawi script, used for Malay, modifies the standard Arabic alphabet by adding diacritics to six letters to accommodate Malay phonemes, such as extra dots for sounds absent in Arabic. This adaptation, introduced by Muslim traders and refined over centuries, combines the letters inherited from Arabic with six constructed ones, enabling representation of Malay's vowel and consonant inventory while maintaining right-to-left flow. Similarly, the Pegon script adapts the Arabic script for Javanese in Indonesia, employing 28 graphemes to denote 23 consonants through minimal modifications, such as added diacritics for Javanese-specific sounds like the ng phoneme, and requiring harakat for vowels to suit the language's syllabic structure. These adjustments, developed in Islamic scholarly contexts, preserve the script's core while aligning with local phonetics, though Pegon usage has waned with the rise of Latin-based orthographies.
Historical variants in the Indian subcontinent illustrate further evolution and decline. The Bihari script, a regional style from Bihar and surrounding areas, emerged in the 13th century as a blend of angular and cursive elements, featuring elongated horizontals and distinctive loops in letters such as sad. Primarily used for Qur'anic manuscripts, it persisted into the 19th century with over 137 known examples, many dated to the 15th century, but gradually declined following the Mughal promotion of Nastaliq, which standardized more fluid Persian-influenced forms across northern India. This shift marked the end of Bihari's prominence, confining it to a niche in pre-modern Islamic textual production.

Usage and Distribution

Current geographical and linguistic use

The Arabic script serves as the official script in the 22 member states of the Arab League, spanning North Africa and the Middle East, where it is used for government, education, media, and daily communication. Collectively, these regions are home to approximately 420 million speakers of Arabic dialects, making the script essential for literacy and cultural expression among speakers of the world's fifth-most spoken language.

Beyond the Arab world, the Arabic script is adapted for several major languages, supporting diverse linguistic communities. In Iran and Afghanistan, the Perso-Arabic variant is used for Persian (Farsi) and Dari, with around 80 million speakers, in official documents, literature, and signage. In Pakistan and India, the Nastaʿlīq style of the script writes Urdu, which has approximately 230 million speakers and serves as Pakistan's national language in administration, education, and print media. In Iraq and Iran, a modified Arabic script is used for Sorani Kurdish, supporting about 6–7 million speakers in official and cultural contexts. In Southeast Asia, the Jawi variant persists, particularly for religious texts, Islamic education, and regional signage, where it coexists with the Latin script.

As of 2025, the Arabic script's digital adoption continues to expand, notably for Uyghur in China's Xinjiang region, where the Uyghur Arabic alphabet remains the official standard for government publications and signage, reflecting efforts to integrate it into modern technology platforms. This growth underscores the script's resilience in bilingual digital environments despite competing Latin and Cyrillic systems.

UNESCO and World Bank data highlight the script's role in literacy across Arabic-script regions, with adult literacy rates in Arab states averaging around 75% as of recent assessments, though rates vary widely between countries with educational access and script-based instruction.
In non-Arab contexts, literacy tied to the script, like in Persian- and Urdu-speaking areas, exceeds 80% in urban centers, supported by widespread schooling in the adapted forms.

Adaptations for non-Arabic languages

The Arabic script has been adapted for various non-Arabic languages by incorporating additional characters or diacritics to represent phonemes absent in Arabic, particularly to accommodate Indo-Iranian and Bantu linguistic features. In Persian (Farsi), the script adds four letters, چ (che), پ (pe), ژ (zhe), and گ (gaf), to denote the sounds /tʃ/, /p/, /ʒ/, and /g/, which are not present in the standard Arabic alphabet. These modifications emerged during the Islamicization of Persia in the 7th–9th centuries, enabling the script to fully represent Persian while retaining the cursive, right-to-left structure.

Urdu, an Indo-Aryan language, extends the Perso-Arabic script with the letters ٹ (ṭe), ڈ (ḍāl), and ڑ (ṛe) for the retroflex consonants /ʈ/, /ɖ/, and /ɽ/, which are characteristic of its phonology influenced by Prakrit and Sanskrit substrates. These letters are marked by a small ṭāʾ (ط) written above the base form rather than by dots, and were standardized in the 19th century during the development of modern Urdu literature in British India. The same device yields ڻ (ṇūn) for the retroflex nasal /ɳ/, a letter used in Sindhi rather than in standard Urdu.

Swahili (also known as Kiswahili), a Bantu language, historically employed the Arabic script (known as Ajami), adapting it with extra diacritics and notations to capture its five-vowel system and syllable-timed structure, which differ from Arabic's consonant-heavy structure. This adaptation facilitated religious and trade-related writing along the East African coast until the 1930s, when colonial authorities mandated a shift to the Latin alphabet for standardization and education. The following table illustrates key phoneme-to-grapheme mappings in these adaptations:

| Language | Phoneme | Grapheme | Description |
| --- | --- | --- | --- |
| Persian | /p/ | پ | Modified bāʾ with three dots below, for the voiceless labial stop. |
| Persian | /tʃ/ | چ | Modified jīm with three dots in place of one, for the voiceless affricate. |
| Persian | /ʒ/ | ژ | Modified rāʾ with three dots above, for the voiced postalveolar fricative. |
| Persian | /g/ | گ | Modified kāf with an additional stroke above, for the voiced velar stop. |
| Urdu | /ʈ/ | ٹ | Ṭe: tāʾ with a small ṭāʾ above, for the retroflex stop. |
| Urdu | /ɖ/ | ڈ | Ḍāl: dāl with a small ṭāʾ above, for the retroflex stop. |
| Urdu | /ɽ/ | ڑ | Ṛe: re with a small ṭāʾ above, for the retroflex flap. |
| Sindhi | /ɳ/ | ڻ | Ṇūn: nūn with a small ṭāʾ above, for the retroflex nasal. |
| Swahili (Ajami) | /ɪ/, /ʊ/ | ِ, ُ (with extensions) | Short vowels marked by kasra and damma, often with additional dots or lines for Bantu contrasts. |
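The letters in the table above are ordinary Unicode characters in their own right, not base letters plus separate marks, which can be checked with Python's standard `unicodedata` module. The sketch below is illustrative only: the `EXTRA_LETTERS` grouping and the `describe_letter` helper are assumptions introduced here, and the formal Unicode names differ from the traditional letter names used in the prose (for example, TCHEH for che and JEH for zhe).

```python
import unicodedata

# Hypothetical grouping of some added letters by language.
EXTRA_LETTERS = {
    "Persian": ["\u067E", "\u0686", "\u0698", "\u06AF"],  # pe, che, zhe, gaf
    "Urdu": ["\u0679", "\u0688", "\u0691"],               # retroflex letters
}


def describe_letter(ch):
    """Return (code point, formal Unicode name) for a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


for language, letters in EXTRA_LETTERS.items():
    for ch in letters:
        code, name = describe_letter(ch)
        # Each added letter is a single code point with general category "Lo".
        assert unicodedata.category(ch) == "Lo"
        print(f"{language:8} {code} {name}")
```

Because each letter has its own code point, text in Persian or Urdu needs no special encoding beyond the standard Arabic block; only fonts and keyboards must provide the additional shapes.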

Historical and discontinued uses

In Central Asia, the Sogdian language and its Aramaic-derived script, widely used from the 4th to 8th centuries CE, began transitioning to the Arabic script following the Islamic conquests of the region in the 8th century, as Sogdian speakers adopted Islam and incorporated Arabic linguistic elements into their writings. This shift marked the integration of Sogdian into the broader Islamic scholarly tradition, with Arabic script facilitating religious and administrative texts among Sogdian communities in areas like Transoxiana. By the early 20th century, under Soviet influence, Arabic-based scripts for Turkic languages in Central Asia, including remnants of Sogdian-influenced systems, were phased out; a Latin alphabet was introduced in the late 1920s, followed by a full replacement with Cyrillic scripts in the late 1930s to promote Russification and literacy in Russian. This discontinuation effectively ended the use of Arabic script for local languages across Soviet Central Asia by the 1940s.

In Turkey, the Arabic script served as the basis for Ottoman Turkish, which was written in a Perso-Arabic variant from the 14th century until the 1928 alphabet reform under Mustafa Kemal Atatürk, when it was replaced by a Latin-based script to modernize education and reduce Islamic cultural ties. This reform discontinued the Ottoman script's use in official and literary contexts, leading to widespread literacy campaigns that rendered Arabic-script Turkish obsolete within a decade.

Additionally, in the Iberian Peninsula, Aljamiado, the practice of writing Romance languages like Spanish and Aragonese in Arabic script, emerged among Muslim communities (Mudejars and Moriscos) from the 15th to the 18th centuries, producing literature on religious, poetic, and moral themes to preserve Islamic identity under Christian rule. Following the expulsion of the Moriscos in 1609–1614 and subsequent cultural suppression, Aljamiado gradually ceased as a living tradition.
In Africa, the Ajami script, an adaptation of Arabic for local languages, was employed for Hausa in West Africa and Swahili in East Africa from the 16th century onward, enabling poetry, religious texts, and correspondence among Muslim populations. For Hausa speakers, who numbered in the tens of millions by the 19th century, Ajami facilitated widespread literacy outside formal education, with up to 80% proficiency in some communities. Similarly, Swahili Ajami supported literary epics and Islamic scholarship across coastal and inland regions. European colonial powers in the 19th and 20th centuries imposed Latin-based orthographies, such as "Boko" for Hausa under British rule in Nigeria, viewing Ajami as primitive and a barrier to administrative control; this led to its decline through school curricula, book burnings, and a redefinition of literacy that excluded non-Latin systems. By the mid-20th century, Ajami had largely been displaced by colonial scripts for these languages.

In South Asia, the Perso-Arabic script (Shahmukhi) for Punjabi, used primarily by Muslim communities, experienced a sharp decline in India following the 1947 partition, as mass migrations shifted the Muslim population to Pakistan, leaving Gurmukhi as the dominant script in Indian Punjab. Prior to partition, Shahmukhi coexisted with Gurmukhi in the region, but post-1947 demographic changes and the promotion of Gurmukhi for official Punjabi use in India marginalized Arabic-script publishing and education. This discontinuation aligned with broader linguistic standardization efforts, reducing Shahmukhi's role in Indian Sikh and Hindu Punjabi contexts to near obsolescence.

Technical Implementation

Unicode encoding and digital representation

The Arabic script is encoded in the Unicode Standard primarily within the Basic Multilingual Plane (BMP), with characters distributed across several dedicated blocks to accommodate core letters, diacritics, variants, and historical forms. The primary Arabic block, U+0600–U+06FF, encompasses 256 code points and includes the 28 basic Arabic letters, common diacritical marks such as the fatha (U+064E), kasra (U+0650), and shadda (U+0651), Qur'anic annotation signs, and the Arabic-Indic digits from ٠ (U+0660) to ٩ (U+0669). This block forms the foundation for standard orthography and supports the right-to-left (RTL) text directionality inherent to the script.

For extended letter variants used in specific languages or historical contexts, the Arabic Supplement block (U+0750–U+077F) provides 48 additional code points, introduced in Unicode 4.1 (2005). Examples include Arabic Letter Beh with Three Dots Horizontally Below (U+0750, ݐ) for certain African languages and Arabic Letter Kaf with Two Dots Above (U+077F, ݿ) for historical notations. Further expansions address historical and regional adaptations: the Arabic Extended-A block (U+08A0–U+08FF), introduced in Unicode 6.1 (2012), adds letters and diacritics for languages of Africa and the Caucasus, among others, and the Arabic Extended-B block (U+0870–U+089F), introduced in Unicode 14.0 (2021), provides 48 code points for obsolete or variant forms, such as Arabic Letter Alef with Attached Fatha (U+0870), used in certain non-Arabic orthographies including those of African languages. These additions, continuing up to Unicode 17.0 (2025), ensure comprehensive coverage of historical Arabic-derived scripts without altering core encoding principles.

Core Arabic letters are assigned specific code points within the primary block, independent of their contextual forms; for instance, the letter alif is encoded as U+0627 (ا), a single code point that is rendered variably based on its position in a word.
Similarly, ba' is U+0628 (ب), and the two join according to shaping rules during display. Unicode's design separates logical encoding from visual presentation, relying on font systems and rendering engines to handle joining behaviors.

The right-to-left nature of Arabic text requires the Unicode Bidirectional Algorithm (UAX #9), which determines display order for mixed-direction text in documents. The algorithm resolves embedding levels for RTL segments, such as Arabic words interspersed with LTR elements like numbers or Latin text, ensuring proper reordering in environments like HTML and CSS, where authors control it via the direction: rtl property and the unicode-bidi property. Rendering challenges arise in complex layouts, but these are addressed through standardized font features rather than encoding changes.

Unicode's encoding for Arabic aligns with the international standard ISO/IEC 10646, which defines the Universal Coded Character Set (UCS) and incorporates all Unicode characters, including the Arabic blocks, to facilitate global interoperability. This harmonization ensures that Arabic script data can be exchanged across systems without loss, with ISO/IEC 10646 specifying the same code points and properties for Arabic as Unicode version 17.0.
| Unicode block | Range | Key contents | Version introduced |
| --- | --- | --- | --- |
| Arabic | U+0600–U+06FF | Basic letters (e.g., U+0627 ا alif), diacritics, digits | 1.0 (1991) |
| Arabic Supplement | U+0750–U+077F | Letter variants (e.g., U+0750 ݐ Beh variant) | 4.1 (2005) |
| Arabic Extended-B | U+0870–U+089F | Historical forms (e.g., U+0870 Alef variant) | 14.0 (2021) |
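The separation between logical encoding and display can be made concrete by inspecting per-character properties. The following sketch, which assumes only Python's standard `unicodedata` module, looks up the block of a character from the ranges given in this section and reports its bidirectional class as used by the Bidirectional Algorithm. The `ARABIC_BLOCKS` list and the helpers `block_of` and `describe` are hypothetical names for this illustration, and the list is limited to the four blocks named above rather than every Arabic-related block.

```python
import unicodedata

# Block ranges as given in this section (inclusive bounds).
ARABIC_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0870, 0x089F, "Arabic Extended-B"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
]


def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for low, high, name in ARABIC_BLOCKS:
        if low <= cp <= high:
            return name
    return None


def describe(ch):
    """Return (code point, block, bidirectional class) for one character."""
    return f"U+{ord(ch):04X}", block_of(ch), unicodedata.bidirectional(ch)


# alif, Arabic-Indic digit zero, fatha, and a Latin letter for contrast
for sample in ["\u0627", "\u0660", "\u064E", "A"]:
    print(describe(sample))
```

In this output Arabic letters carry the class AL (Arabic letter), Arabic-Indic digits AN (Arabic number), combining marks NSM (nonspacing mark), and Latin letters L; these classes are the inputs from which the Bidirectional Algorithm derives display order.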

Challenges in digital typography and rendering

In the early days of digital document creation, particularly with PDF formats before the 2000s, Arabic script faced significant challenges in rendering cursive joining behaviors, often resulting in disjointed letters that disrupted the script's natural connectivity and aesthetic integrity. These failures stemmed from limited support in early typesetting systems, which treated Arabic characters as isolated forms rather than contextually linked glyphs, leading to poor legibility in printed and digital outputs. Solutions emerged through the adoption of OpenType font technology, specifically the Glyph Substitution (GSUB) table, which enables contextual substitutions for initial, medial, final, and isolated forms of Arabic letters to ensure proper cursive connections. By the mid-2000s, GSUB implementations in fonts like those developed for Microsoft Windows allowed for more accurate rendering across applications, marking a pivotal advancement in Arabic digital typography.

Despite these improvements, font availability remains a persistent issue, especially for specialized variants like Nastaliq, which is widely used in Urdu and Persian contexts but suffers from a shortage of high-quality digital implementations due to its complex, slanted structure. This scarcity has historically limited digital publishing in regions where Nastaliq is preferred, often forcing designers to rely on suboptimal Naskh-based alternatives that alter visual authenticity. Google's Noto font family, launched in the 2010s, addressed this gap by providing open-source fonts supporting multiple styles, including Nastaliq variants, to ensure consistent rendering across over 800 languages and eliminate "tofu" placeholders for unsupported characters. Noto's development, in collaboration with Monotype, emphasized comprehensive script coverage, significantly boosting digital adoption in Android and web environments.
Input methods for Arabic script continue to pose practical hurdles, particularly with keyboard layouts such as the standard Arabic 101-key configuration, which maps 28 letters plus diacritics onto a standard physical keyboard but often leads to inefficient typing due to frequent shifts for common characters and inconsistent support across operating systems. On mobile devices, autocorrect systems exacerbate these issues by poorly handling diacritics (harakat), frequently misplacing or omitting vowel marks during typing, even though these marks are critical for precise Quranic or poetic rendering. This results in error-prone input, especially for learners or non-native users, as mobile keyboards like those on iOS and Android struggle with the script's right-to-left directionality and contextual shaping.

As of 2025, emerging challenges include the need for AI-driven font generation to revive endangered Arabic styles, such as regional calligraphic variants at risk of disappearing, where machine learning models are being explored to automate design while preserving cultural nuances. Accessibility for screen readers remains a key concern, with tools often failing to properly interpret Arabic's cursive joins and diacritics, leading to fragmented audio output that hinders navigation for visually impaired users in web and eBook content. Ongoing W3C efforts highlight gaps in layout requirements, prioritizing solutions like enhanced text-to-speech engines tailored for Arabic script to improve inclusivity in digital interfaces.

Extensions

Additional letters and characters

The Arabic script has been extended beyond its core 28 letters to accommodate phonetic needs in numerous non-Arabic languages, particularly through modifications like added diacritics, new letter forms, and contextual variants encoded in Unicode blocks such as Arabic (U+0600–U+06FF), Arabic Supplement (U+0750–U+077F), and Arabic Extended-A (U+08A0–U+08FF). These extensions enable representation of sounds absent in Arabic, such as implosives, retroflexes, and specific vowels, supporting languages across Asia and Africa. Common additions include letters for Persian, Urdu, Pashto, and Kurdish, while rarer forms appear in African Ajami scripts for languages like Hausa and Fulfulde.

Among the most widely used extensions are those for Indo-Iranian languages. For instance, the Urdu noon ghunna (ں, U+06BA) marks nasalization of the preceding vowel, as in "kitābẽ" (books). In Kurdish, the letter ۆ (U+06C6) denotes the vowel /o/, distinguishing it from the standard waw /u/ or /w/, as in Sorani Kurdish orthography. Persian adds peh (پ, U+067E, /p/), cheh (چ, U+0686, /tʃ/), zhe (ژ, U+0698, /ʒ/), and gaf (گ, U+06AF, /ɡ/), which are crucial for native phonemes not present in Arabic. Pashto employs further letters such as ښ (U+069A, /ʂ/) and ږ (U+0696, /ʐ/) to capture retroflex fricatives.

Rarer extensions are prominent in African Ajami scripts, where the Arabic script was adapted for indigenous languages during Islamic expansion. In Hausa, additional forms like keheh with three dots above (ݣ, U+0763) represent labialized /kʷ/ or palatalized /kʲ/, while ghain with three dots above (ࣃ, U+08C3) denotes /ɡʷ/ or /ɡʲ/. For implosives in Hausa and related languages, characters such as beh with hamza above (ࢡ, U+08A1) indicate the implosive bilabial stop /ɓ/, and yeh with two dots below and hamza above (ࢨ, U+08A8) is used for the glottalized palatal approximant /ʝ/. These Ajami innovations, often using stacked diacritics or modified bases, allow expression of sounds unique to West and East African phonologies, as seen in Fulfulde and Wolof orthographies.

Standardization of these extensions has been advanced by international bodies since the 2010s, particularly through the Unicode Consortium's encoding proposals and the W3C Arabic Layout Task Force, which addresses rendering challenges for diverse variants in digital environments. The task force, established in 2015, collaborates with linguists to ensure consistent support for over 50 extended characters across browsers and fonts, drawing on input from language communities. Efforts like the 2018 Unicode proposal for Hausa-specific letters highlight ongoing work to encode underrepresented Ajami forms without disrupting existing Arabic typography. More recently, Unicode 17.0 (September 2025) extended the Arabic Extended-C block (U+10EC0–U+10EFF, a range of 64 code points) with additional Qur'anic annotation marks as well as letters for Indonesian languages.

The following table catalogs over 50 representative additional letters and characters, selected from Unicode encodings for key languages. It includes the character glyph (isolated form where possible), Unicode code point, formal name, approximate phoneme(s), and primary language(s) of use. This is not exhaustive but illustrates the diversity of extensions.
| Character | Unicode | Name | Phoneme(s) | Language(s) |
| --- | --- | --- | --- | --- |
| ٱ | U+0671 | Arabic Letter Alef Wasla | /a/ (elided) | Quranic Arabic |
| ٲ | U+0672 | Arabic Letter Alef with Wavy Hamza Above | /ʔa/ | Baluchi, Kashmiri |
| ٴ | U+0674 | Arabic Letter High Hamza | /ʔ/ (high) | Kazakh, Jawi |
| ٹ | U+0679 | Arabic Letter Tteh | /ʈ/ | Urdu |
| ٺ | U+067A | Arabic Letter Tteheh | /ʈʰ/ | Sindhi |
| ٻ | U+067B | Arabic Letter Beeh | /ɓ/ | Sindhi |
| ټ | U+067C | Arabic Letter Teh with Ring | /ʈ/ | Pashto |
| ٽ | U+067D | Arabic Letter Teh with Three Dots Above Downwards | /ʈ/ | Sindhi |
| پ | U+067E | Arabic Letter Peh | /p/ | Persian, Urdu |
| ٿ | U+067F | Arabic Letter Teheh | /tʰ/ | Sindhi |
| ݐ | U+0750 | Arabic Letter Beh with Three Dots Horizontally Below | /ɓ/ | African languages (e.g., Hausa) |
| ݑ | U+0751 | Arabic Letter Beh with Dot Below and Three Dots Above | /bʷ/ | Hausa |
| ݒ | U+0752 | Arabic Letter Beh with Three Dots Pointing Upwards Below | /ɓ/ | African Ajami |
| ݓ | U+0753 | Arabic Letter Beh with Three Dots Pointing Upwards Below and Two Dots Above | /bʲ/ | African languages |
| ݔ | U+0754 | Arabic Letter Beh with Two Dots Below and Dot Above | /ɗ/ | Saraiki |
| ݕ | U+0755 | Arabic Letter Beh with Inverted Small V Below | /ɓ/ | African Ajami |
| ݖ | U+0756 | Arabic Letter Beh with Small V | /v/ | Shina |
| ݗ | U+0757 | Arabic Letter Hah with Two Dots Above | /ħ/ | African languages |
| ݘ | U+0758 | Arabic Letter Hah with Three Dots Pointing Upwards Below | /ɣ/ | African Ajami |
| ݙ | U+0759 | Arabic Letter Dal with Two Dots Vertically Below and Small Tah | /d̪/ | Saraiki |
| ݚ | U+075A | Arabic Letter Dal with Inverted Small V Below | /ɖ/ | African languages |
| ݛ | U+075B | Arabic Letter Reh with Stroke | /ɽ/ | African Ajami |
| ݜ | U+075C | Arabic Letter Seen with Four Dots Above | /s/ (emphatic) | Shina |
| ݝ | U+075D | Arabic Letter Ain with Two Dots Above | /ʕ/ | African languages |
| ݞ | U+075E | Arabic Letter Ain with Three Dots Pointing Downwards Above | /ʕʷ/ | African Ajami |
| ݟ | U+075F | Arabic Letter Ain with Two Dots Vertically Above | /ʕʲ/ | African languages |
| ݠ | U+0760 | Arabic Letter Feh with Two Dots Below | /v/ | African Ajami |
| ݡ | U+0761 | Arabic Letter Feh with Three Dots Pointing Upwards Below | /ɸ/ | African languages |
| ݢ | U+0762 | Arabic Letter Keheh with Dot Above | /k/ | Jawi |
| ݣ | U+0763 | Arabic Letter Keheh with Three Dots Above | /kʷ/, /kʲ/ | Hausa, Amazigh |
| ݤ | U+0764 | Arabic Letter Keheh with Three Dots Pointing Upwards Below | /q/ | African Ajami |
| ݥ | U+0765 | Arabic Letter Meem with Dot Above | /mʲ/ | African languages |
| ݦ | U+0766 | Arabic Letter Meem with Dot Below | /ɱ/ | Maba |
| ݧ | U+0767 | Arabic Letter Noon with Two Dots Below | /ɲ/ | Arwi |
| ݨ | U+0768 | Arabic Letter Noon with Small Tah | /ɳ/ | Saraiki |
| ݩ | U+0769 | Arabic Letter Noon with Small V | /ɲ/ | Gojri |
| ݪ | U+076A | Arabic Letter Lam with Bar | /ɭ/ | African languages |
| ݫ | U+076B | Arabic Letter Reh with Two Dots Vertically Above | /ɽʒ/ | Torwali |
| ݬ | U+076C | Arabic Letter Reh with Hamza Above | /ʑ/ | Ormuri |
| ݭ | U+076D | Arabic Letter Seen with Two Dots Vertically Above | /ʃ/ | Kalami |
| ݮ | U+076E | Arabic Letter Hah with Small Arabic Letter Tah Below | /χ/ | Khowar |
| ݯ | U+076F | Arabic Letter Hah with Small Arabic Letter Tah and Two Dots | /ʁ/ | Khowar |
| ݰ | U+0770 | Arabic Letter Seen with Small Arabic Letter Tah and Two Dots | /sˤ/ | Khowar |
| ݱ | U+0771 | Arabic Letter Reh with Small Arabic Letter Tah and Two Dots | /ɹˤ/ | Khowar |
| ݲ | U+0772 | Arabic Letter Hah with Small Arabic Letter Tah Above | /ħʷ/ | Torwali |
| ࢠ | U+08A0 | Arabic Letter Beh with Small V Below | /bʷ/ | African languages |
| ࢡ | U+08A1 | Arabic Letter Beh with Hamza Above | /ɓ/ | Adamawa Fulfulde |
| ࢢ | U+08A2 | Arabic Letter Jeem with Two Dots Above | /d͡ʒ/ | African Ajami |
| ࢣ | U+08A3 | Arabic Letter Tah with Two Dots Above | /tʰ/ | African languages |
| ࢤ | U+08A4 | Arabic Letter Feh with Dot Below and Three Dots Above | /ɸ/ | African Ajami |
| ࢥ | U+08A5 | Arabic Letter Qaf with Dot Below | /ɢ/ | African languages |
| ࢦ | U+08A6 | Arabic Letter Lam with Double Bar | /ʎ/ | African Ajami |
| ࢨ | U+08A8 | Arabic Letter Yeh with Two Dots Below and Hamza Above | /ʝ/ | Adamawa Fulfulde |
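
The code points and formal names in a table like the one above can be cross-checked programmatically. The following Python sketch uses the standard `unicodedata` module to look up a character's formal name and a small hand-written list of block ranges (taken from the Unicode Standard's published block boundaries) to report its block. The helper names `block_of` and `describe` are hypothetical, and name lookups depend on the Unicode version bundled with the Python interpreter, so very recent characters may be reported as unknown.

```python
import unicodedata

# Arabic-related block ranges (inclusive), as published by Unicode.
ARABIC_BLOCKS = [
    ("Arabic", 0x0600, 0x06FF),
    ("Arabic Supplement", 0x0750, 0x077F),
    ("Arabic Extended-B", 0x0870, 0x089F),
    ("Arabic Extended-A", 0x08A0, 0x08FF),
    ("Arabic Extended-C", 0x10EC0, 0x10EFF),
]

def block_of(ch):
    """Return the Arabic block containing ch, or None if outside them."""
    cp = ord(ch)
    for name, start, end in ARABIC_BLOCKS:
        if start <= cp <= end:
            return name
    return None

def describe(ch):
    """Return (code point label, formal name, block) for one character."""
    label = f"U+{ord(ch):04X}"
    # The fallback string is returned when this Python's Unicode data
    # predates the character.
    name = unicodedata.name(ch, "<unknown to this Unicode version>")
    return label, name, block_of(ch)

for ch in "\u067E\u067A\u0763\u08A1":
    print(describe(ch))
```

Running `describe` over each row's character is a quick way to catch a mismatch between a listed name and the code point it is attached to.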

Numerals and their evolution

The Arabic numerals, also known as the Hindu-Arabic numeral system, originated in ancient India, where a decimal place-value system with nine symbols and a zero was developed. These Indian numerals reached the Islamic world through scholarly exchanges and were known in the region as early as 662 CE, as recorded by the Syriac scholar Severus Sebokht. In the 9th century, the Persian mathematician Muhammad ibn Musa al-Khwarizmi played a pivotal role in their adoption and dissemination by authoring a treatise on Indian calculation methods, titled On the Calculation with Hindu Numerals (c. 825 CE), which explained the system's arithmetic operations. Although the original Arabic text is lost, a 12th-century Latin translation, Algoritmi de numero Indorum, preserved its content and facilitated the system's spread to Europe.

Over time, the numeral forms diverged into Eastern Arabic variants (٠١٢٣٤٥٦٧٨٩), used in the eastern Islamic world including Persia, and Western Arabic variants (closer to modern 0–9), which emerged through local scribal adaptations. This split arose from independent evolutions in handwriting and regional mathematical texts, with the Western forms often called Gubar numerals after the Arabic word for "dust" (referring to dust-board calculations).

Regional variants further diversified the system. The Western forms evolved distinctly by the 10th century and were later transmitted to Europe, influencing the global standard. Persian numerals, a variant of the Eastern Arabic set, differ in digits 4 (۴), 5 (۵), and 6 (۶), reflecting regional calligraphic influences, while maintaining the same positional values.

In modern computing, the Eastern Arabic digits are encoded in Unicode as U+0660–U+0669 (Arabic-Indic digits: ٠ through ٩) for use in Arabic-script text, with extended variants at U+06F0–U+06F9 for Persian and similar forms. This standardization draws from ISO/IEC 8859-6, ensuring compatibility in international software, where both Eastern and Western forms coexist for multilingual applications.
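
Because the two digit ranges mentioned above are contiguous and ordered from zero to nine, converting between Western, Eastern Arabic, and Persian digit forms reduces to a character-for-character translation. The Python sketch below illustrates this under that assumption; the function names are hypothetical, and the final line relies on the fact that Python's built-in `int()` accepts any Unicode decimal digit.

```python
WESTERN = "0123456789"
# U+0660..U+0669: Arabic-Indic digits (Eastern Arabic forms).
EASTERN = "".join(chr(0x0660 + d) for d in range(10))
# U+06F0..U+06F9: Extended Arabic-Indic digits (Persian and similar forms).
PERSIAN = "".join(chr(0x06F0 + d) for d in range(10))

TO_EASTERN = str.maketrans(WESTERN, EASTERN)
TO_PERSIAN = str.maketrans(WESTERN, PERSIAN)
TO_WESTERN = str.maketrans(EASTERN + PERSIAN, WESTERN * 2)

def to_eastern(text):
    """Replace Western digits with Arabic-Indic digits."""
    return text.translate(TO_EASTERN)

def to_persian(text):
    """Replace Western digits with Extended Arabic-Indic digits."""
    return text.translate(TO_PERSIAN)

def to_western(text):
    """Replace either Eastern digit set with Western digits."""
    return text.translate(TO_WESTERN)

print(to_eastern("2025"), to_persian("2025"))
print(to_western(to_eastern("2025")))
# int() parses Unicode decimal digits directly, whichever set is used.
print(int(to_persian("42")) + int(to_eastern("8")))
```

Only the digit characters change; the positional value of each digit and the left-to-right order of the most significant digit stay the same across all three sets.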

Components

Basic graphemes and radicals

The Arabic script is built upon 18 basic graphemes, known as basic shapes or letter-shapes, which form the core structures from which the 28 letters of the alphabet are derived by adding diacritical marks such as dots. These fundamental elements originated in early forms of the script and include a variety of strokes that provide the skeletal framework for letter construction.

The basic shapes encompass diverse stroke types, including verticals, horizontals, diagonals, and curves, each contributing to the distinctive flow of the script. Vertical strokes, for example, appear as tall, straight lines in letters like alif (ا), serving as standalone elements or stems. Horizontal strokes form baseline extensions, as seen in the final form of nūn (ن), where they create a flat, connecting bar. Curves add fluidity, evident in sīn (س), which employs rounded arcs to form its serpentine shape. These strokes are executed with consistent pen angles in traditional calligraphy to maintain proportional harmony.

A key basic shape is the dot (nuqṭah), a small mark placed above or below base shapes to differentiate letters that would otherwise look identical. Representative examples include the single nuqṭah positioned below bā’ (ب) to distinguish it from tā’ (ت) and thā’ (ث), and the single nuqṭah above dhāl (ذ) to set it apart from dāl (د). Other basic shapes include loops, tails, and notches, which combine to form complex shapes while adhering to baseline alignment.
| Basic Shape/Stroke Type | Description | Example(s) |
| --- | --- | --- |
| Vertical stroke | Straight downward line, often tall and isolated | Alif (ا) |
| Horizontal stroke | Right-to-left line along the baseline | Final nūn (ن) |
| Curve (open/closed) | Arced or looped path for fluidity | Sīn (س), mīm (م) |
| Diagonal/notch | Slanted line or angled cut | Jīm (ج) initial |
| Dot (nuqṭah) | Small circle for differentiation | Bā’ (ب) below; dhāl (ذ) above |
| Tail (returning) | Extending curve folding under baseline | Final yā’ (ى) |
| Bowl/loop | Rounded enclosure below or above baseline | Final nūn (ن), ṣād (ص) |
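
The relationship between 18 shapes and 28 letters can be made concrete with a small lookup table. The Python sketch below groups the letters by their isolated-form skeleton and records how many dots each carries and where. The grouping is an assumption based on the conventional isolated forms (letters such as feh and qaf are counted separately here), and the names `SHAPE_GROUPS`, `letter`, and `shape_of` are hypothetical.

```python
import unicodedata

# Assumed grouping by isolated-form skeleton.
# Each entry: (Unicode name suffix, number of dots, dot position).
SHAPE_GROUPS = {
    "alef": [("ALEF", 0, None)],
    "beh": [("BEH", 1, "below"), ("TEH", 2, "above"), ("THEH", 3, "above")],
    "hah": [("JEEM", 1, "below"), ("HAH", 0, None), ("KHAH", 1, "above")],
    "dal": [("DAL", 0, None), ("THAL", 1, "above")],
    "reh": [("REH", 0, None), ("ZAIN", 1, "above")],
    "seen": [("SEEN", 0, None), ("SHEEN", 3, "above")],
    "sad": [("SAD", 0, None), ("DAD", 1, "above")],
    "tah": [("TAH", 0, None), ("ZAH", 1, "above")],
    "ain": [("AIN", 0, None), ("GHAIN", 1, "above")],
    "feh": [("FEH", 1, "above")],
    "qaf": [("QAF", 2, "above")],
    "kaf": [("KAF", 0, None)],
    "lam": [("LAM", 0, None)],
    "meem": [("MEEM", 0, None)],
    "noon": [("NOON", 1, "above")],
    "heh": [("HEH", 0, None)],
    "waw": [("WAW", 0, None)],
    "yeh": [("YEH", 2, "below")],
}

def letter(name):
    """Resolve a name suffix such as 'BEH' to the actual character."""
    return unicodedata.lookup(f"ARABIC LETTER {name}")

def shape_of(ch):
    """Return (shape, dots, position) for a basic letter, or None."""
    for shape, members in SHAPE_GROUPS.items():
        for name, dots, position in members:
            if letter(name) == ch:
                return shape, dots, position
    return None

total = sum(len(members) for members in SHAPE_GROUPS.values())
print(len(SHAPE_GROUPS), "shapes,", total, "letters")
print(shape_of(letter("THAL")), shape_of(letter("DAL")))
```

Looking up dāl and dhāl this way returns the same shape with zero and one dot respectively, which is the distinction the dot row of the table describes.
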
Non-basic shape elements, such as tashkīl (diacritical vowel marks), are superimposed on these graphemes to guide pronunciation and are positioned relative to the base letter. The fatḥah (َ, "a" sound) and ḍammah (ُ, "u" sound) sit above the letter, while the kasrah (ِ, "i" sound) appears below it; the sukūn (ْ, no vowel) is a small circle above. The shaddah (ّ, gemination) combines with these above the letter, and tanwīn (nunation, marking indefiniteness) doubles the marks in their respective positions. These elements do not alter the basic shapes but enhance phonetic clarity, particularly in educational or religious texts.
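
In Unicode text these marks are stored as separate combining characters that follow the base letter, which is why they can be added or removed without touching the letters themselves. The Python sketch below assumes a small hand-written table of the marks named above with the placements given in the text; `TASHKIL_POSITION`, `split_marks`, and `strip_tashkil` are hypothetical names.

```python
import unicodedata

# Placement relative to the base letter, following the description above.
TASHKIL_POSITION = {
    "\u064B": "above",  # fathatan (tanwin)
    "\u064C": "above",  # dammatan (tanwin)
    "\u064D": "below",  # kasratan (tanwin)
    "\u064E": "above",  # fatha
    "\u064F": "above",  # damma
    "\u0650": "below",  # kasra
    "\u0651": "above",  # shadda
    "\u0652": "above",  # sukun
}

def split_marks(text):
    """Pair each base character with the tashkil marks that follow it."""
    clusters = []
    for ch in text:
        if ch in TASHKIL_POSITION and clusters:
            clusters[-1][1].append(ch)
        else:
            clusters.append((ch, []))
    return clusters

def strip_tashkil(text):
    """Remove the marks, leaving the unvocalized letters."""
    return "".join(ch for ch in text if ch not in TASHKIL_POSITION)

KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf, teh, beh, each with fatha
for base, marks in split_marks(KATABA):
    names = [unicodedata.name(m) for m in marks]
    print(unicodedata.name(base), names)
print(len(strip_tashkil(KATABA)))
```

Stripping the marks is a common preprocessing step for search and comparison, since most everyday Arabic text is written without them.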

Letter construction and ligatures

The Arabic script's letters are assembled from a limited set of basic components, often referred to as basic shapes or graphemes, which include fundamental strokes like vertical lines, curves, loops, and diacritical marks such as dots. These elements typically combine in groups of 2 to 5 to form the 28 distinct letters, enabling efficient representation of phonetic variations through minimal additions rather than entirely new shapes. For instance, the letter غ (ghayn) is constructed from the base shape of ع (ʿayn), a looped curve opening to the right, augmented by a single dot above to distinguish its voiced fricative sound. This modular approach stems from the script's historical evolution, where early forms were simplified and differentiated using iʿjam (diacritical dots) to avoid ambiguity in undotted precursors.

Similarly, more complex letters incorporate additional basic shapes for tails or extensions. The letter ي (yāʾ) exemplifies this, built from a primary curved stem shape akin to the base of ب (bāʾ), extended by a descending tail curve that sweeps leftward, and finalized with two dots positioned below the stem to indicate its semi-vowel quality. Such constructions ensure that letters maintain cursive compatibility while allowing visual and phonetic distinction; for example, the tail in ي not only aids joining but also evokes the script's flowing aesthetic derived from pen strokes. Overall, the 18 core basic shapes underpin all 28 letters, with dots and marks applied systematically to create the full inventory without requiring unique forms for each.

Ligatures in the Arabic script involve the fusion of adjacent letters into unified glyphs, enhancing readability and stylistic harmony in writing. While the script's inherent connectivity makes most joins contextual, true ligatures, where letters transform into a single, indivisible form, are generally optional in modern printed text but mandatory in specific cases or traditional styles like naskh and ruqʿah. The most prominent mandatory ligature is لا (lām-alif), which merges ل (lām) and ا (alif) into a distinct upright form to represent the common "la," preventing awkward spacing and improving flow; this is universally applied across fonts. Beyond this, over 20 common optional ligatures exist, employed for aesthetic or space-saving purposes in particular contexts, such as religious phrases or decorative text. Examples include لله (lām-lām-hā, part of "Allāh"), where the two lāms and hā blend into a compact, elevated form, and sequences like شمس (shīn-mīm-sīn) that may fuse in calligraphic styles for smoother transitions. These optional forms vary by script style, being more prevalent in calligraphic traditions than in digital print, and are not obligatory, allowing writers flexibility while preserving the script's connectivity.

A key aspect of letter assembly involves stacking diacritics and modifiers, particularly for the hamza (ء), a glottal stop that requires precise positioning on carrier letters to align with phonetic rules and visual balance. The hamza is seated on one of three primary carriers: alif (أ or إ), wāw (ؤ), or yāʾ (ئ). The carrier is determined by the vowel context: alif for fatḥah (a), wāw for ḍammah (u), and yāʾ for kasrah (i). At the start of a word, hamza is always carried by alif, written above it (أ) for fatḥah or ḍammah and below it (إ) for kasrah. The independent form (ء) appears where no carrier applies, for example at the end of a word after a long vowel. On a carrier, the hamza stacks above or below without altering the base letter's joining behavior. This rule-based stacking prevents overlap and maintains the script's baseline integrity, as seen in words like أَكْل (akl, "eating, food"), where hamza crowns the alif.
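
Both the lām-alif ligature and the seating of hamza on a carrier have direct counterparts in Unicode normalization, which gives a compact way to see the "parts" behind each combined form. The Python sketch below assumes only the standard `unicodedata` module; the helper names are hypothetical. It expands the legacy lām-alif ligature character (U+FEFB) into its two letters with compatibility normalization, and composes a carrier letter with a combining hamza (U+0654 above, U+0655 below) using canonical composition.

```python
import unicodedata

ALEF, WAW, YEH = "\u0627", "\u0648", "\u064A"
HAMZA_ABOVE, HAMZA_BELOW = "\u0654", "\u0655"  # combining hamza marks
LAM_ALEF_LIGATURE = "\uFEFB"  # legacy presentation-form character

def expand_ligature(ch):
    """Split a presentation-form ligature into its underlying letters."""
    return unicodedata.normalize("NFKC", ch)

def seat_hamza(carrier, below=False):
    """Compose a carrier letter with a combining hamza into one character."""
    mark = HAMZA_BELOW if below else HAMZA_ABOVE
    return unicodedata.normalize("NFC", carrier + mark)

parts = expand_ligature(LAM_ALEF_LIGATURE)
print([unicodedata.name(c) for c in parts])

for carrier, below in [(ALEF, False), (ALEF, True), (WAW, False), (YEH, False)]:
    seated = seat_hamza(carrier, below)
    print(f"U+{ord(seated):04X}", unicodedata.name(seated))
```

Modern text normally stores lām and alif as two ordinary letters and leaves the ligature to the font, so the U+FEFB character is used here only because it exposes the decomposition; choosing which carrier a given word needs is an orthographic decision that this sketch does not attempt.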
