Data Coding Scheme
from Wikipedia

Data Coding Scheme is a one-octet field in Short Messages (SM) and Cell Broadcast Messages (CB) that carries basic information about how the recipient handset should process the received message. The information includes:

  • the character set or message coding, which determines the encoding of the message user data
  • the message class, which determines to which component of the Mobile Station (MS) or User Equipment (UE) the message should be delivered
  • the request to automatically delete the message after reading
  • the state of flags indicating presence of unread voicemail, fax, e-mail or other messages
  • the indication that the message content is compressed
  • the language of the cell broadcast message

The field is described in 3GPP TS 23.040 and 3GPP TS 23.038 under the name TP-DCS.

Message character sets

A special 7-bit encoding called the GSM 7-bit default alphabet was designed for the Short Message Service in GSM. The alphabet contains the most frequently used symbols of most Western European languages (and some Greek uppercase letters). Some ASCII characters and the euro sign did not fit into the default alphabet and must be encoded using two septets (an escape septet followed by the character); these characters form the GSM 7-bit default alphabet extension table. Support of the GSM 7-bit alphabet is mandatory for GSM handsets and network elements.[1]

Languages that use the Latin script but need characters missing from the GSM 7-bit default alphabet often replace the missing accented characters with their unaccented equivalents. This gives a less than satisfactory user experience, but is often accepted. To include the missing characters, the 16-bit UTF-16 encoding (called UCS-2 in GSM) can be used, at the price of reducing the length of a non-segmented message from 160 to 70 characters.
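The choice between the two text encodings can be sketched in Python. This is a minimal illustration, not a production encoder: it assumes the GSM 7-bit default alphabet and extension table as published in 3GPP TS 23.038, counts each extension-table character as two septets (escape plus character), and, when any character falls outside both tables, switches the whole message to UCS-2 and counts UTF-16 code units, as many implementations do. The names `GSM7_BASIC`, `GSM7_EXTENSION`, `gsm7_septets` and `choose_encoding` are hypothetical.

```python
from typing import Optional, Tuple

# GSM 7-bit default alphabet in code-point order 0x00..0x7F (per 3GPP TS 23.038).
# Position 0x1B is the escape to the extension table, not a printable character.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Characters reachable only through the escape septet (two septets each).
GSM7_EXTENSION = set("\f^{}\\[~]|€")


def gsm7_septets(text: str) -> Optional[int]:
    """Return the septets needed for text, or None if it is not GSM 7-bit encodable."""
    count = 0
    for ch in text:
        if ch in GSM7_EXTENSION:
            count += 2  # escape septet + character septet
        elif ch != "\x1b" and ch in GSM7_BASIC:
            count += 1
        else:
            return None
    return count


def choose_encoding(text: str) -> Tuple[str, int, int, int]:
    """Return (encoding, DCS, length in encoding units, single-message limit)."""
    septets = gsm7_septets(text)
    if septets is not None:
        return ("GSM 7-bit", 0x00, septets, 160)
    # Assumption: UCS-2 length approximated by UTF-16 code units (2 octets each).
    units = len(text.encode("utf-16-be")) // 2
    return ("UCS-2", 0x08, units, 70)
```

For example, `choose_encoding("Price: 5€")` stays in GSM 7-bit but uses 10 septets for 9 characters, while a single Cyrillic letter anywhere in a message moves all of it to UCS-2 and its 70-character limit.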

Messages in Chinese, Korean or Japanese must be encoded using UTF-16 (UCS-2). The same was also true for other languages using non-Latin scripts, such as Russian, Arabic, Hebrew and various Indian languages. Version 8.0.0 of 3GPP TS 23.038, published in 2008, introduced a new feature, the national language shift tables; by version 11.0.0, published in 2012, they cover Turkish, Spanish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu. The mechanism replaces the GSM 7-bit default alphabet table and/or the extension table with national tables, selected by special information elements in the User Data Header. A non-segmented message using national language shift tables may carry up to 155 (or 153) 7-bit characters.
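The 155-character figure can be reproduced with a little arithmetic. The sketch below assumes the 140-octet user-data limit of one SMS, counts the User Data Header (UDH) against that limit, and pads the header up to a septet boundary in 7-bit messages. A single national language shift information element gives a 4-octet header (header length octet, element identifier, element length, language code). `max_chars` is a hypothetical helper.

```python
import math

USER_DATA_OCTETS = 140  # maximum TP-UD length of one short message


def max_chars(encoding: str, udh_octets: int = 0) -> int:
    """Characters (or octets for 8-bit data) left after a udh_octets-long User Data Header."""
    if encoding == "gsm7":
        total_septets = USER_DATA_OCTETS * 8 // 7        # 160 septets
        header_septets = math.ceil(udh_octets * 8 / 7)    # header padded to a septet boundary
        return total_septets - header_septets
    if encoding == "8bit":
        return USER_DATA_OCTETS - udh_octets
    if encoding == "ucs2":
        return (USER_DATA_OCTETS - udh_octets) // 2       # two octets per character
    raise ValueError(f"unknown encoding: {encoding}")
```

With a 4-octet header, `max_chars("gsm7", 4)` returns 155. For comparison, a 6-octet header carrying the concatenation element with an 8-bit reference leaves 153 septets, the familiar per-segment limit of concatenated messages.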

GSM recognizes only two encodings for text messages and one encoding for binary messages:

  • GSM 7-bit default alphabet (including the use of national language shift tables)
  • UCS-2
  • 8-bit data

Message classes

Besides the character set, the TP-DCS octet has a complex syntax that carries other information; the most notable is the message class:

Message classes
Bits 1 and 0 / message class:
0 0: Class 0, flash messages
0 1: Class 1, ME-specific
1 0: Class 2, SIM/USIM-specific
1 1: Class 3, TE-specific

Flash messages are received by a mobile phone even when its memory is full. They are not stored in the phone; they are only shown on its display.
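To make the class bits concrete, here is a minimal Python sketch that assembles a DCS octet in the General Data Coding group from a character set, an optional message class and a compression flag. It assumes the bit layout visible in the DCS value tables of this article (bits 3-2 character set, bit 4 class present, bits 1-0 class, bit 5 compressed); `general_dcs` and `CHARSET_BITS` are hypothetical names.

```python
from typing import Optional

CHARSET_BITS = {"gsm7": 0b00, "8bit": 0b01, "ucs2": 0b10}  # bits 3-2


def general_dcs(charset: str, msg_class: Optional[int] = None, compressed: bool = False) -> int:
    """Build a DCS octet in the General Data Coding group (bits 7-6 = 00)."""
    dcs = CHARSET_BITS[charset] << 2
    if compressed:
        dcs |= 0x20  # bit 5: user data compressed
    if msg_class is not None:
        if msg_class not in (0, 1, 2, 3):
            raise ValueError("message class must be 0..3")
        dcs |= 0x10 | msg_class  # bit 4: class present; bits 1-0: the class
    return dcs
```

A flash message is therefore `general_dcs("gsm7", 0)` (0x10) in GSM 7-bit or `general_dcs("ucs2", 0)` (0x18) in UCS2; leaving `msg_class` out yields the plain values 0x00 and 0x08.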

Other features

Automatic deletion after reading

The handset should delete any message received with a TP-DCS value in the "Message Marked for Automatic Deletion" coding group after the user has read it.
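Values in this coding group have bits 7-6 set to 01, and the DCS value table for the group shows that the remaining six bits keep their General Data Coding meaning. A receiver could therefore test for the group and reuse its ordinary decoding by clearing bit 6, as in this sketch (`is_auto_delete` and `without_auto_delete` are hypothetical helpers):

```python
def is_auto_delete(dcs: int) -> bool:
    """True if the DCS is in the 'Message Marked for Automatic Deletion' group (bits 7-6 = 01)."""
    return (dcs & 0xC0) == 0x40


def without_auto_delete(dcs: int) -> int:
    """Map an automatic-deletion DCS to the General Data Coding value with the same bits 5-0."""
    if not is_auto_delete(dcs):
        raise ValueError("DCS is not in the automatic deletion group")
    return dcs & ~0x40  # clear bit 6
```

For example, 0x55 maps to 0x15 (8-bit data, Class 1), so the handset decodes it like 0x15 and then deletes it once it has been read.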

Message waiting indication

The Message Waiting Indication group of DCS values is used to set or reset flags indicating the presence of unread voicemail, fax, e-mail or other messages.
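The message waiting tables in this article follow a fixed pattern: bits 7-4 select the group (C discard, D store with GSM 7-bit text, E store with UCS2 text), bit 3 gives the state, bit 2 is reserved and bits 1-0 give the type. A hedged decoding sketch based only on that pattern; `MessageWaiting` and `decode_mwi` are hypothetical, and the discard group's character set is left as `None` because the table lists it as not defined.

```python
from typing import NamedTuple, Optional

MWI_TYPES = ("voicemail", "fax", "e-mail", "other")  # bits 1-0


class MessageWaiting(NamedTuple):
    store: bool             # False for group 1100 (discard), True for 1101/1110 (store)
    charset: Optional[str]  # text coding listed for the group; None for the discard group
    active: bool            # bit 3: True sets the indication, False clears it
    kind: str               # bits 1-0: voicemail, fax, e-mail or other
    bit2_set: bool          # bit 2 is reserved; a set bit makes the value reserved


def decode_mwi(dcs: int) -> Optional[MessageWaiting]:
    """Decode a DCS from the Message Waiting Indication groups, or return None otherwise."""
    group = dcs >> 4
    if group not in (0xC, 0xD, 0xE):
        return None
    return MessageWaiting(
        store=group != 0xC,
        charset={0xC: None, 0xD: "gsm7", 0xE: "ucs2"}[group],
        active=bool(dcs & 0x08),
        kind=MWI_TYPES[dcs & 0b11],
        bit2_set=bool(dcs & 0x04),
    )
```

For instance, 0xC8 decodes as an active voicemail indication in the discard group, and 0xD9 as an active fax indication whose message is stored.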

Data compression

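Certain DCS values (those with bit 5 set in the General Data Coding and automatic deletion groups) indicate that the message content is compressed, but this feature is apparently not used by any operator.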
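As the value tables show, the compression flag only exists where bit 7 is 0; in the groups with bit 7 set, bit 5 is simply part of the group code (0xE0, for example, is a message waiting value). A small sketch of that rule, with `is_compressed` and `COMPRESSED_VALUES` as hypothetical names:

```python
def is_compressed(dcs: int) -> bool:
    """Bit 5 flags compressed user data only in groups 00xx and 01xx (bit 7 = 0)."""
    return (dcs & 0x80) == 0 and (dcs & 0x20) != 0


# Every DCS value that announces compressed content: 0x20-0x3F and 0x60-0x7F.
COMPRESSED_VALUES = [v for v in range(0x100) if is_compressed(v)]
```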

DCS values

SMS data coding scheme

The values of TP-DCS are defined in GSM recommendation 03.38.[1]

Coding Group: General Data Coding
Columns: DCS (hex), DCS (dec), character set, message class, compressed (+ yes, - no), reason the value is reserved (if any)
00 0 GSM 7 bit Default -
01 1 GSM 7 bit Default - Bits 1 and 0 have value 1 but no message class present
02 2 GSM 7 bit Default - Bits 1 and 0 have value 2 but no message class present
03 3 GSM 7 bit Default - Bits 1 and 0 have value 3 but no message class present
04 4 8 bit data Default -
05 5 8 bit data Default - Bits 1 and 0 have value 1 but no message class present
06 6 8 bit data Default - Bits 1 and 0 have value 2 but no message class present
07 7 8 bit data Default - Bits 1 and 0 have value 3 but no message class present
08 8 UCS2 Default -
09 9 UCS2 Default - Bits 1 and 0 have value 1 but no message class present
0A 10 UCS2 Default - Bits 1 and 0 have value 2 but no message class present
0B 11 UCS2 Default - Bits 1 and 0 have value 3 but no message class present
0C 12 (reserved) Default - Reserved character set
0D 13 (reserved) Default - Reserved character set; bits 1 and 0 have value 1 but no message class present
0E 14 (reserved) Default - Reserved character set; bits 1 and 0 have value 2 but no message class present
0F 15 (reserved) Default - Reserved character set; bits 1 and 0 have value 3 but no message class present

10 16 GSM 7 bit Class 0 (Flash message) -
11 17 GSM 7 bit Class 1 (ME-specific) -
12 18 GSM 7 bit Class 2 (SIM/USIM-specific) -
13 19 GSM 7 bit Class 3 (TE-specific) -
14 20 8 bit data Class 0 (Flash message) -
15 21 8 bit data Class 1 (ME-specific) -
16 22 8 bit data Class 2 (SIM/USIM-specific) -
17 23 8 bit data Class 3 (TE-specific) -
18 24 UCS2 Class 0 (Flash message) -
19 25 UCS2 Class 1 (ME-specific) -
1A 26 UCS2 Class 2 (SIM/USIM-specific) -
1B 27 UCS2 Class 3 (TE-specific) -
1C 28 (reserved) Class 0 (Flash message) - Reserved character set
1D 29 (reserved) Class 1 (ME-specific) - Reserved character set
1E 30 (reserved) Class 2 (SIM/USIM-specific) - Reserved character set
1F 31 (reserved) Class 3 (TE-specific) - Reserved character set
20 32 GSM 7 bit Default +
21 33 GSM 7 bit Default + Bits 1 and 0 have value 1 but no message class present
22 34 GSM 7 bit Default + Bits 1 and 0 have value 2 but no message class present
23 35 GSM 7 bit Default + Bits 1 and 0 have value 3 but no message class present
24 36 8 bit data Default + Compression set but Character set can't be compressed
25 37 8 bit data Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 1 but no message class present
26 38 8 bit data Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 2 but no message class present
27 39 8 bit data Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 3 but no message class present

28 40 UCS2 Default + Compression set but Character set can't be compressed
29 41 UCS2 Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 1 but no message class present
2A 42 UCS2 Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 2 but no message class present
2B 43 UCS2 Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 3 but no message class present

2C 44 (reserved) Default + Reserved character set
2D 45 (reserved) Default + Reserved character set; bits 1 and 0 have value 1 but no message class present
2E 46 (reserved) Default + Reserved character set; bits 1 and 0 have value 2 but no message class present
2F 47 (reserved) Default + Reserved character set; bits 1 and 0 have value 3 but no message class present

30 48 GSM 7 bit Class 0 (Flash message) +
31 49 GSM 7 bit Class 1 (ME-specific) +
32 50 GSM 7 bit Class 2 (SIM/USIM-specific) +
33 51 GSM 7 bit Class 3 (TE-specific) +
34 52 8 bit data Class 0 (Flash message) + Compression set but Character set can't be compressed
35 53 8 bit data Class 1 (ME-specific) + Compression set but Character set can't be compressed
36 54 8 bit data Class 2 (SIM/USIM-specific) + Compression set but Character set can't be compressed
37 55 8 bit data Class 3 (TE-specific) + Compression set but Character set can't be compressed
38 56 UCS2 Class 0 (Flash message) + Compression set but Character set can't be compressed
39 57 UCS2 Class 1 (ME-specific) + Compression set but Character set can't be compressed
3A 58 UCS2 Class 2 (SIM/USIM-specific) + Compression set but Character set can't be compressed
3B 59 UCS2 Class 3 (TE-specific) + Compression set but Character set can't be compressed
3C 60 (reserved) Class 0 (Flash message) + Reserved character set
3D 61 (reserved) Class 1 (ME-specific) + Reserved character set
3E 62 (reserved) Class 2 (SIM/USIM-specific) + Reserved character set
3F 63 (reserved) Class 3 (TE-specific) + Reserved character set
Coding Group: Message Marked for Automatic Deletion
Columns: DCS (hex), DCS (dec), character set, message class, compressed (+ yes, - no), reason the value is reserved (if any)
40 64 GSM 7 bit Default -
41 65 GSM 7 bit Default - Bits 1 and 0 have value 1 but no message class present
42 66 GSM 7 bit Default - Bits 1 and 0 have value 2 but no message class present
43 67 GSM 7 bit Default - Bits 1 and 0 have value 3 but no message class present
44 68 8 bit data Default -
45 69 8 bit data Default - Bits 1 and 0 have value 1 but no message class present
46 70 8 bit data Default - Bits 1 and 0 have value 2 but no message class present
47 71 8 bit data Default - Bits 1 and 0 have value 3 but no message class present
48 72 UCS2 Default -
49 73 UCS2 Default - Bits 1 and 0 have value 1 but no message class present
4A 74 UCS2 Default - Bits 1 and 0 have value 2 but no message class present
4B 75 UCS2 Default - Bits 1 and 0 have value 3 but no message class present
4C 76 (reserved) Default - Reserved character set
4D 77 (reserved) Default - Reserved character set; bits 1 and 0 have value 1 but no message class present
4E 78 (reserved) Default - Reserved character set; bits 1 and 0 have value 2 but no message class present
4F 79 (reserved) Default - Reserved character set; bits 1 and 0 have value 3 but no message class present

50 80 GSM 7 bit Class 0 (Flash message) -
51 81 GSM 7 bit Class 1 (ME-specific) -
52 82 GSM 7 bit Class 2 (SIM/USIM-specific) -
53 83 GSM 7 bit Class 3 (TE-specific) -
54 84 8 bit data Class 0 (Flash message) -
55 85 8 bit data Class 1 (ME-specific) -
56 86 8 bit data Class 2 (SIM/USIM-specific) -
57 87 8 bit data Class 3 (TE-specific) -
58 88 UCS2 Class 0 (Flash message) -
59 89 UCS2 Class 1 (ME-specific) -
5A 90 UCS2 Class 2 (SIM/USIM-specific) -
5B 91 UCS2 Class 3 (TE-specific) -
5C 92 (reserved) Class 0 (Flash message) - Reserved character set
5D 93 (reserved) Class 1 (ME-specific) - Reserved character set
5E 94 (reserved) Class 2 (SIM/USIM-specific) - Reserved character set
5F 95 (reserved) Class 3 (TE-specific) - Reserved character set
60 96 GSM 7 bit Default +
61 97 GSM 7 bit Default + Bits 1 and 0 have value 1 but no message class present
62 98 GSM 7 bit Default + Bits 1 and 0 have value 2 but no message class present
63 99 GSM 7 bit Default + Bits 1 and 0 have value 3 but no message class present
64 100 8 bit data Default + Compression set but Character set can't be compressed
65 101 8 bit data Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 1 but no message class present
66 102 8 bit data Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 2 but no message class present
67 103 8 bit data Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 3 but no message class present

68 104 UCS2 Default + Compression set but Character set can't be compressed
69 105 UCS2 Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 1 but no message class present
6A 106 UCS2 Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 2 but no message class present
6B 107 UCS2 Default + Compression set but Character set can't be compressed; bits 1 and 0 have value 3 but no message class present

6C 108 (reserved) Default + Reserved character set
6D 109 (reserved) Default + Reserved character set; bits 1 and 0 have value 1 but no message class present
6E 110 (reserved) Default + Reserved character set; bits 1 and 0 have value 2 but no message class present
6F 111 (reserved) Default + Reserved character set; bits 1 and 0 have value 3 but no message class present

70 112 GSM 7 bit Class 0 (Flash message) +
71 113 GSM 7 bit Class 1 (ME-specific) +
72 114 GSM 7 bit Class 2 (SIM/USIM-specific) +
73 115 GSM 7 bit Class 3 (TE-specific) +
74 116 8 bit data Class 0 (Flash message) + Compression set but Character set can't be compressed
75 117 8 bit data Class 1 (ME-specific) + Compression set but Character set can't be compressed
76 118 8 bit data Class 2 (SIM/USIM-specific) + Compression set but Character set can't be compressed
77 119 8 bit data Class 3 (TE-specific) + Compression set but Character set can't be compressed
78 120 UCS2 Class 0 (Flash message) + Compression set but Character set can't be compressed
79 121 UCS2 Class 1 (ME-specific) + Compression set but Character set can't be compressed
7A 122 UCS2 Class 2 (SIM/USIM-specific) + Compression set but Character set can't be compressed
7B 123 UCS2 Class 3 (TE-specific) + Compression set but Character set can't be compressed
7C 124 (reserved) Class 0 (Flash message) + Reserved character set
7D 125 (reserved) Class 1 (ME-specific) + Reserved character set
7E 126 (reserved) Class 2 (SIM/USIM-specific) + Reserved character set
7F 127 (reserved) Class 3 (TE-specific) + Reserved character set
Coding Group: Reserved
Columns: DCS (hex), DCS (dec), character set, message class, compressed (+ yes, - no), reason the value is reserved (if any)
80 to BF 128 to 191 (not defined) Default - Reserved coding group (every value in this range)
Coding Group: Message Waiting Info: Discard Message
Columns: DCS (hex), DCS (dec), character set, message waiting type, indication state, compressed (- no), reason the value is reserved (if any)
C0 192 (not defined) Voicemail Inactive -
C1 193 (not defined) Fax Inactive -
C2 194 (not defined) E-mail Inactive -
C3 195 (not defined) Other Inactive -
C4 196 (not defined) Voicemail Inactive - Value of bit 2
C5 197 (not defined) Fax Inactive - Value of bit 2
C6 198 (not defined) E-mail Inactive - Value of bit 2
C7 199 (not defined) Other Inactive - Value of bit 2
C8 200 (not defined) Voicemail Active -
C9 201 (not defined) Fax Active -
CA 202 (not defined) E-mail Active -
CB 203 (not defined) Other Active -
CC 204 (not defined) Voicemail Active - Value of bit 2
CD 205 (not defined) Fax Active - Value of bit 2
CE 206 (not defined) E-mail Active - Value of bit 2
CF 207 (not defined) Other Active - Value of bit 2
Coding Group: Message Waiting Info: Store Message
Columns: DCS (hex), DCS (dec), character set, message waiting type, indication state, compressed (- no), reason the value is reserved (if any)
D0 208 GSM 7 bit Voicemail Inactive -
D1 209 GSM 7 bit Fax Inactive -
D2 210 GSM 7 bit E-mail Inactive -
D3 211 GSM 7 bit Other Inactive -
D4 212 GSM 7 bit Voicemail Inactive - Value of bit 2
D5 213 GSM 7 bit Fax Inactive - Value of bit 2
D6 214 GSM 7 bit E-mail Inactive - Value of bit 2
D7 215 GSM 7 bit Other Inactive - Value of bit 2
D8 216 GSM 7 bit Voicemail Active -
D9 217 GSM 7 bit Fax Active -
DA 218 GSM 7 bit E-mail Active -
DB 219 GSM 7 bit Other Active -
DC 220 GSM 7 bit Voicemail Active - Value of bit 2
DD 221 GSM 7 bit Fax Active - Value of bit 2
DE 222 GSM 7 bit E-mail Active - Value of bit 2
DF 223 GSM 7 bit Other Active - Value of bit 2
E0 224 UCS2 Voicemail Inactive -
E1 225 UCS2 Fax Inactive -
E2 226 UCS2 E-mail Inactive -
E3 227 UCS2 Other Inactive -
E4 228 UCS2 Voicemail Inactive - Value of bit 2
E5 229 UCS2 Fax Inactive - Value of bit 2
E6 230 UCS2 E-mail Inactive - Value of bit 2
E7 231 UCS2 Other Inactive - Value of bit 2
E8 232 UCS2 Voicemail Active -
E9 233 UCS2 Fax Active -
EA 234 UCS2 E-mail Active -
EB 235 UCS2 Other Active -
EC 236 UCS2 Voicemail Active - Value of bit 2
ED 237 UCS2 Fax Active - Value of bit 2
EE 238 UCS2 E-mail Active - Value of bit 2
EF 239 UCS2 Other Active - Value of bit 2
Coding Group: Data Coding/Message Class
Columns: DCS (hex), DCS (dec), character set, message class, compressed (- no), reason the value is reserved (if any)
F0 240 GSM 7 bit Class 0 (Flash message) -
F1 241 GSM 7 bit Class 1 (ME-specific) -
F2 242 GSM 7 bit Class 2 (SIM/USIM-specific) -
F3 243 GSM 7 bit Class 3 (TE-specific) -
F4 244 8 bit data Class 0 (Flash message) -
F5 245 8 bit data Class 1 (ME-specific) -
F6 246 8 bit data Class 2 (SIM/USIM-specific) -
F7 247 8 bit data Class 3 (TE-specific) -
F8 248 GSM 7 bit Class 0 (Flash message) - Value of bit 3
F9 249 GSM 7 bit Class 1 (ME-specific) - Value of bit 3
FA 250 GSM 7 bit Class 2 (SIM/USIM-specific) - Value of bit 3
FB 251 GSM 7 bit Class 3 (TE-specific) - Value of bit 3
FC 252 8 bit data Class 0 (Flash message) - Value of bit 3
FD 253 8 bit data Class 1 (ME-specific) - Value of bit 3
FE 254 8 bit data Class 2 (SIM/USIM-specific) - Value of bit 3
FF 255 8 bit data Class 3 (TE-specific) - Value of bit 3

The iDEN mobile standard uses the values F7 (hex) and F8 (hex) in a special way.
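Because the SMS tables follow a regular bit layout, they can be regenerated by a small decoder. The Python sketch below is written against these tables, not taken from 3GPP TS 23.038 itself: `Dcs` and `decode_dcs` are hypothetical names, the reason strings copy the wording used in the tables, and the Discard Message group's character set is left undefined, as the table lists it.

```python
from typing import NamedTuple, Optional, Tuple

CHARSETS = ("GSM 7 bit", "8 bit data", "UCS2", None)  # bits 3-2; None = reserved
CLASSES = ("Class 0", "Class 1", "Class 2", "Class 3")  # bits 1-0
MWI_TYPES = ("Voicemail", "Fax", "E-mail", "Other")     # bits 1-0 in the MWI groups


class Dcs(NamedTuple):
    group: str
    charset: Optional[str]             # None where the tables say "(reserved)"/"(not defined)"
    msg_class: Optional[str]           # None means "Default" (no class)
    indication: Optional[str]          # e.g. "Fax Active" in the MWI groups
    compressed: bool
    reserved_reasons: Tuple[str, ...]  # empty when the value is fully defined


def decode_dcs(dcs: int) -> Dcs:
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS must fit in one octet")
    reasons = []
    if dcs < 0x80:  # 00xx General Data Coding, 01xx Automatic Deletion: same bits 5-0
        group = "Automatic Deletion" if dcs & 0x40 else "General Data Coding"
        compressed = bool(dcs & 0x20)
        charset = CHARSETS[(dcs >> 2) & 0b11]
        class_present = bool(dcs & 0x10)
        msg_class = CLASSES[dcs & 0b11] if class_present else None
        if charset is None:
            reasons.append("Reserved character set")
        elif compressed and charset != "GSM 7 bit":
            reasons.append("Compression set but Character set can't be compressed")
        if not class_present and dcs & 0b11:
            reasons.append(f"Bits 1 and 0 have value {dcs & 0b11} but no message class present")
        return Dcs(group, charset, msg_class, None, compressed, tuple(reasons))

    group_code = dcs >> 4
    if group_code <= 0xB:  # 1000..1011
        return Dcs("Reserved", None, None, None, False, ("Reserved coding group",))

    if group_code <= 0xE:  # 1100..1110 Message Waiting Indication
        group = "MWI: Discard Message" if group_code == 0xC else "MWI: Store Message"
        charset = {0xC: None, 0xD: "GSM 7 bit", 0xE: "UCS2"}[group_code]
        state = "Active" if dcs & 0x08 else "Inactive"
        if dcs & 0x04:
            reasons.append("Value of bit 2")
        indication = f"{MWI_TYPES[dcs & 0b11]} {state}"
        return Dcs(group, charset, None, indication, False, tuple(reasons))

    # 1111 Data Coding/Message Class: bit 2 selects the character set, bit 3 is reserved
    charset = "8 bit data" if dcs & 0x04 else "GSM 7 bit"
    if dcs & 0x08:
        reasons.append("Value of bit 3")
    return Dcs("Data Coding/Message Class", charset, CLASSES[dcs & 0b11], None, False, tuple(reasons))
```

Iterating `decode_dcs` over `range(256)` reproduces one row per value; by this logic, 72 of the 256 values have no reserved reason at all.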

CBS data coding scheme

For the DCS values in Cell Broadcast Messages see GSM recommendation 03.38.[1]

References

from Grokipedia
A Data Coding Scheme (DCS) is a one-octet field defined in the Short Message Service (SMS) protocol for GSM, UMTS, LTE, and other mobile cellular networks. It specifies the encoding format of the message user data field (TP-UD) and may indicate additional attributes such as message class or compression. The field lets recipient devices interpret short messages and cell broadcast messages correctly by signaling the character set: the GSM 7-bit default alphabet (up to 160 characters), 8-bit binary data (up to 140 octets), or UCS2 (up to 70 characters). Standardized in 3GPP Technical Specification TS 23.038, the DCS keeps text decoding consistent across networks and devices, preventing garbled text. The octet divides into two 4-bit nibbles: the most significant nibble (bits 7-4) identifies the coding group, while the least significant nibble (bits 3-0) provides group-specific details such as the character set or message class. For instance, the general data coding group (bits 7-6 = 00) uses bit 5 to signal compression (0 for uncompressed, 1 for compressed per TS 23.042), bit 4 to signal that a message class is present, bits 3-2 to select the character set (00 for GSM 7-bit, 01 for 8-bit data, 10 for UCS2, and 11 reserved), and bits 1-0 to give the class when bit 4 is set (e.g., 00 for Class 0, immediate display; 01 for Class 1, ME storage). Other groups handle specialized cases: 01xx for messages marked for automatic deletion, 1100 to 1110 for message waiting indications (MWI), and 1111 for data coding/message class, while reserved values are treated as the GSM 7-bit default alphabet to maintain backward compatibility. In practice, the DCS also matters in protocols such as SMPP (Short Message Peer-to-Peer), where it tells gateways and handsets how the text is encoded so that international characters or binary payloads such as ringtones and logos can be carried.
Common values include DCS 0x00 for plain GSM 7-bit text without a message class and 0x08 for UCS2-encoded messages; in either case the user data must fit within the 140-octet TP-UD limit, less any User Data Header. The DCS has been carried forward as mobile standards evolved and is still used by legacy and modern applications, including over-the-air updates and precursors of multimedia messaging.

Fundamentals

Definition and Purpose

The Data Coding Scheme (DCS) is an 8-bit field within the Transfer Protocol Data Unit (TPDU) of the Short Message Service (SMS) that specifies how the user data is encoded, including the character set, language, and message handling attributes. The field, denoted TP-Data-Coding-Scheme (TP-DCS), appears in TPDUs such as SMS-DELIVER, SMS-SUBMIT, and SMS-STATUS-REPORT, and provides the metadata needed to interpret a message without prior negotiation between the sending and receiving entities. Its primary purpose is to let receiving devices, such as mobile stations (MS), decode and render message content correctly, supporting multilingual text display and specific delivery behaviors such as message classes and compression status. By indicating the alphabet used, such as the GSM 7-bit default alphabet or UCS2, it allows both plain text and binary content to be processed correctly in heterogeneous networks. It also enables behaviors such as immediate display of flash messages versus storage of normal ones. The DCS was introduced in the 1990s by the European Telecommunications Standards Institute (ETSI) in the GSM 03.38 standard on alphabets and data coding (now 3GPP TS 23.038), with the TP-DCS field itself specified in the contemporaneous GSM 03.40 standard, addressing the need for standardized handling as mobile networks expanded internationally. Over time it was extended to meet globalization demands, adding support for further encodings and features such as the Enhanced Messaging Service (EMS), introduced in 3GPP Release 99 and enhanced in Releases 4 and 5, while preserving backward compatibility with early implementations. Key benefits of the DCS include seamless interoperability between diverse devices and carriers, fewer garbled messages caused by encoding mismatches, and more efficient use of resources through options such as data compression.
These attributes have contributed to the widespread adoption of SMS, which carries billions of messages daily worldwide.

Bit Structure and Encoding

The Data Coding Scheme (DCS) is an 8-bit field used in Short Message Service (SMS) protocols to specify the encoding and handling attributes of the message user data. Its bits are numbered 7 (most significant) to 0 (least significant): bits 7 to 4 select the coding group, and bits 3 to 0 carry group-specific details such as the character set, message class, or indication type. The General Data Coding group (bits 7 and 6 set to 00) covers values 0x00 to 0x3F; the remaining ranges hold the automatic deletion, reserved, message waiting, and data coding/message class groups. In the General Data Coding group, bit 5 indicates whether compression is applied (0 for uncompressed, 1 for compressed), bit 4 signals the presence of a message class (0 for absent, 1 for present), bits 3 to 2 specify the character set (00 for the GSM 7-bit default alphabet, 01 for 8-bit data, 10 for 16-bit UCS2, 11 reserved), and bits 1 to 0 give the message class when bit 4 is 1 (00 for Class 0, 01 for Class 1 ME-specific, 10 for Class 2 SIM-specific, 11 for Class 3 TE-specific). When bit 7 is 0, bit 6 distinguishes the General Data Coding group (0) from the group for messages marked for automatic deletion (1), and bits 5 to 0 have the same meaning in both. When bit 7 is 1, bits 7 to 4 together identify the group: 1000 to 1011 are reserved, 1100 to 1110 are message waiting indication groups, and 1111 is the data coding/message class group. In the message waiting indication groups, bits 3 to 0 carry the indication state and type: bit 3 marks the indication as active or inactive, bit 2 is reserved, and bits 1 to 0 select voicemail, fax, e-mail, or other. These rules combine into standardized values that are compatible across implementations; for instance, values 0x00 to 0x0F (binary 0000xxxx) are uncompressed General Data Coding values without a message class.
Reserved values, such as the coding groups 1000 to 1011 and character set code 11 in the general groups, are kept for future extensions, and a receiving device treats reserved codings as the GSM 7-bit default alphabet. A common example is DCS value 0x00 (binary 00000000), which denotes uncompressed text in the GSM 7-bit default alphabet with no message class or additional indications.
Coding group (bits 7-4) | Bit 5 | Bit 4 | Bits 3-2 | Bits 1-0 | Example value
00xx (General data coding) | Compression (0/1) | Class present (0/1) | Character set (00 = GSM 7-bit, 01 = 8-bit, 10 = UCS2, 11 = reserved) | Class if bit 4 = 1 (00 = Class 0, etc.) | 0x00 (GSM 7-bit, no class)
01xx (Automatic deletion) | Same as 00xx | Same as 00xx | Same as 00xx | Same as 00xx | 0x55 (8-bit, Class 1)
1100 (MWI, discard) | N/A (part of group code) | N/A (part of group code) | Bit 3: indication active (1) or inactive (0); bit 2 reserved (0) | Type (00 = voicemail, 01 = fax, 10 = e-mail, 11 = other) | 0xC0 (inactive voicemail)
1101 (MWI, store, GSM 7-bit) | N/A | N/A | Same as 1100 | Same as 1100 | 0xD9 (active fax)
1110 (MWI, store, UCS2) | N/A | N/A | Same as 1100 | Same as 1100 | 0xEA (active e-mail)
1111 (Data coding/message class) | N/A | N/A | Bit 3 reserved (0); bit 2 character set (0 = GSM 7-bit, 1 = 8-bit) | Class (00 = Class 0, 01 = Class 1, 10 = Class 2, 11 = Class 3) | 0xF0 (GSM 7-bit, Class 0)
This table illustrates the bit allocations for key groups, highlighting how the structure supports varied encoding without overlapping functionalities.
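The group column of this table can be checked mechanically by looking only at the high nibble. A minimal sketch, assuming the group boundaries described in this section; `coding_group` and `nibbles` are hypothetical helpers and the group labels are informal.

```python
from typing import Tuple


def coding_group(dcs: int) -> str:
    """Name the DCS coding group selected by bits 7-4."""
    high = dcs >> 4
    if high <= 0x3:
        return "General data coding"  # 00xx
    if high <= 0x7:
        return "Automatic deletion"   # 01xx
    if high <= 0xB:
        return "Reserved"             # 1000-1011
    return {
        0xC: "MWI, discard",
        0xD: "MWI, store, GSM 7-bit",
        0xE: "MWI, store, UCS2",
        0xF: "Data coding/message class",
    }[high]


def nibbles(dcs: int) -> Tuple[str, str]:
    """Split a DCS octet into its high (group) and low (detail) nibbles as bit strings."""
    return f"{dcs >> 4:04b}", f"{dcs & 0x0F:04b}"
```

Running the example values through it, `nibbles(0xD9)` gives `("1101", "1001")`: group 1101, bit 3 set (active) and type 01 (fax).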

Character Sets and Message Classes

Supported Character Sets

The Data Coding Scheme (DCS) offers several character sets to accommodate diverse text representations in short message services, selected by bit combinations in the 8-bit field that indicate the alphabet or encoding of the user data. These encodings trade off efficiency, capacity, and multilingual support; in the General Data Coding group (bits 7-6 set to 00), the choice is made by bits 3 and 2. The GSM 7-bit default alphabet is the primary encoding for text messages, with a 128-character set optimized for Western European languages that includes basic Latin letters, digits, and common punctuation and currency symbols. Characters are packed as 7-bit units into 8-bit octets, allowing up to 160 characters per short message; an extension table provides additional symbols such as curly braces, but each such character takes two septets (an escape plus the character), so one fewer character fits for each one used. This encoding is mandatory for all mobile stations (MS) and service centers (SC). For non-textual content, the 8-bit data coding treats the message as an unstructured octet stream, suitable for applications such as ringtone downloads, operator logos, or other binary payloads, with a capacity of up to 140 octets per message. This scheme imposes no character interpretation, so the receiving application must know how to handle the data. UCS2, a 16-bit encoding based on ISO/IEC 10646 (Unicode), supports a much broader range of scripts, such as Arabic, Hebrew, Chinese, and Cyrillic, by representing each character with two octets. This reduces capacity to 70 characters per message but allows multilingual content to be exchanged worldwide. UCS2 is selected when bits 3-2 are set to 10 in the General Data Coding group.
Reserved codings, such as bits 3-2 set to 11 in the general group, are treated as the GSM 7-bit default alphabet for backward compatibility, so unsupported schemes do not prevent basic text delivery. National language variants, such as Turkish, are handled within the 7-bit encoding rather than through a dedicated DCS value: information elements in the User Data Header (IEI 0x24 for a single-shift table, 0x25 for a locking-shift table) select a national locking-shift table, which replaces the default alphabet, and/or a national single-shift table, which replaces the extension table reached through the escape character 0x1B. This covers regional character needs without the full UCS2 overhead. The specification has evolved through multiple 3GPP releases, with the latest version 19.0.0 (October 2025, Release 19) maintaining core character set definitions while incorporating updates to language-specific information as needed.
Coding group (bits 7-4) | Bits 3-2 | Character set
00xx (General) | 00 | GSM 7-bit default alphabet
00xx (General) | 01 | 8-bit binary data
00xx (General) | 10 | UCS2 (16-bit Unicode)
00xx (General) | 11 | Reserved (treated as GSM 7-bit)
National language variants (e.g., Turkish) use locking-shift and single-shift tables selected through the User Data Header in 7-bit messages, without dedicated coding groups.
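The 160-character figure comes from packing septets back to back: 160 × 7 bits = 1120 bits = 140 octets. The sketch below shows the usual SMS packing convention, in which each new septet is placed above the bits already pending so that octets fill from their least significant bit. It is an illustration rather than text from the specification; `pack_septets` is a hypothetical name, and mapping characters to GSM 7-bit codes is left out.

```python
from typing import Iterable


def pack_septets(septets: Iterable[int]) -> bytes:
    """Pack 7-bit character codes into octets, filling each octet from its least significant bit."""
    out = bytearray()
    acc = 0    # bit accumulator; new septets are placed above the bits already held
    nbits = 0  # number of pending bits in acc
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError(f"not a septet: {s}")
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:  # emit every complete octet
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:  # leftover bits go into a final, zero-padded octet
        out.append(acc)
    return bytes(out)
```

Lowercase letters have the same code points in the GSM 7-bit default alphabet as in ASCII, so `pack_septets(ord(c) for c in "hellohello")` yields the octets E8 32 9B FD 46 97 D9 EC 37: ten characters in nine octets.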

Message Class Types

The Data Coding Scheme (DCS) in SMS messaging places messages into four classes that determine how the receiving device handles, stores, and presents them. The class is indicated within the TP-DCS octet, allowing the Short Message Service Center (SMSC) or originating entity to specify how the message should be processed on delivery. In the General Data Coding and automatic deletion groups (bits 7-6 set to 00 or 01), the class is encoded in bits 1 and 0 when bit 4 is set to 1 (class present), while bits 3-2 continue to select the character set; in the data coding/message class group (bits 7-4 set to 1111), bits 1 and 0 always carry the class. The bit patterns for bits 1-0 are 00 for Class 0, 01 for Class 1, 10 for Class 2, and 11 for Class 3. A class, when present, determines the storage and display rules applied after the message has been decoded in its character set.
Class | Bits 1-0 | Description | Storage and handling
0 | 00 | Immediate display (flash message) | Displayed directly on the device screen without user interaction; normally not stored.
1 | 01 | Mobile Equipment (ME)-specific | Stored in the device's ME memory and accessible through the user interface (and potentially on the SIM, if so configured).
2 | 10 | SIM-specific | Stored on the SIM or USIM card.
3 | 11 | Terminal Equipment (TE)-specific | Forwarded to an external TE, such as a connected computer or peripheral, for storage and processing.
Class 0 messages are typically used for urgent, transient notifications such as network alerts or emergency broadcasts, where immediate visibility is critical but persistence is unnecessary. Class 1 corresponds to ordinary storage and retrieval in the device's message inbox. Class 2 is commonly used for SIM Application Toolkit applications, such as delivering configuration data or applets directly to the SIM for services like operator provisioning. Class 3 supports integration with external systems, often in enterprise or IoT scenarios where messages need to be routed beyond the handset. In every case, the character set signaled in the DCS is used to decode the message before the class handling rules are applied. The message class framework was originally defined in GSM 03.38 and has been carried forward unchanged in later releases, including those for 5G Non-Stand Alone (NSA) and Stand Alone (SA) architectures, for example via SMS over IP.
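A receiver needs the class from whichever group the DCS belongs to. A hedged sketch, assuming that in groups 00xx and 01xx the class is in bits 1-0 only when bit 4 is set, that group 1111 always carries a class, and that the reserved and message waiting groups carry none; `message_class` and the short `HANDLING` descriptions are hypothetical paraphrases of the table above, not specification text.

```python
from typing import Optional


def message_class(dcs: int) -> Optional[int]:
    """Return the message class 0-3 signalled by a DCS octet, or None if no class is given."""
    if dcs < 0x80:           # groups 00xx and 01xx
        return dcs & 0b11 if dcs & 0x10 else None  # bit 4 = class present
    if dcs >> 4 == 0xF:      # group 1111 always carries a class
        return dcs & 0b11
    return None              # reserved and message waiting groups carry no class


HANDLING = {
    0: "display immediately (flash)",
    1: "store in ME",
    2: "store on SIM/USIM",
    3: "forward to TE",
    None: "no class: normal handling",
}
```

For example, `HANDLING[message_class(0x18)]` selects the flash-message path for a UCS2 Class 0 message, while 0x01 (bits 1-0 set but bit 4 clear) falls back to normal handling.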

Indication and Control Features

Message Waiting Indication

Message Waiting Indication (MWI) is a feature within the Data Coding Scheme (DCS) that enables short message service centers to notify mobile devices of pending messages, such as voicemails or faxes, through dedicated messages. This mechanism uses specific DCS values to signal the presence or absence of waiting messages, allowing the receiving mobile equipment (ME) to display appropriate indicators like icons or play alert sounds without delivering full message content. Introduced as part of the ETSI specification in its version 5.1.0 released in March 1996, MWI has been a standard component of protocols for legacy circuit-switched networks. The DCS octet for MWI belongs to the indication group, where bits 7 to 4 are set to 1100 for discard mode (message processed but not stored) or 1101 for store mode (message stored in 7-bit alphabet, with an additional 1110 option for UCS2 storage). Bit 3 indicates the sense (1 for active/waiting, 0 for inactive/no waiting), bit 2 is (set to 0), and bits 1 to 0 specify the type: 00 for message waiting, 01 for , 10 for electronic , or 11 for other. For the "other" category, up to 14 additional subtypes can be distinguished via associated SIM/USIM storage records (EF_MWIS), enabling support for diverse services like short message waiting or alerts. Upon receipt, the ME interprets the DCS to update the device's visual or audible indicators and stores the status in the SIM/USIM's Message Waiting Indication Status (MWIS) elementary file, regardless of available memory; the originating address may also be retained if supported. For specifically, the basic DCS MWI sets the active/inactive flag, while new and deleted message counters are updated via separate messages employing the Special SMS Message Indication in the user data field, which conveys numerical counts and timestamps. This layered approach allows precise status reporting without overloading the core indication . 
The receiving device processes MWI irrespective of its storage capacity, ensuring reliable delivery of alerts. Although widely implemented in GSM and UMTS networks, MWI support is not universal, as it remains an optional feature that depends on network operator and device capabilities. In modern IP-based messaging systems such as Rich Communication Services (RCS), which leverage the IP Multimedia Subsystem (IMS), traditional SMS-based MWI has been largely supplanted by SIP-based mechanisms defined in TS 24.606 since Release 7 (2006), with later standards through 2023 favoring push notifications and integrated indicators over DCS signaling.
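The MWI bit layout described above can be made concrete with a small encoder and decoder. This is a minimal Python sketch, assuming only the three MWI coding groups (1100, 1101, 1110) and the bit meanings given in the text; the names `MwiIndication`, `decode_mwi` and `encode_mwi` are hypothetical, and the sketch ignores the reserved bit 2 on decode and does not model the EF_MWIS storage step.

```python
from dataclasses import dataclass
from typing import Optional

# Bits 1-0 of an MWI coding group: indication type.
MWI_TYPES = {0b00: "voicemail", 0b01: "fax", 0b10: "email", 0b11: "other"}

# Bits 7-4 -> (should the message be stored?, alphabet of the stored text)
MWI_GROUPS = {
    0b1100: (False, None),    # Discard Message
    0b1101: (True, "GSM7"),   # Store Message, GSM 7-bit default alphabet
    0b1110: (True, "UCS2"),   # Store Message, uncompressed UCS2
}


@dataclass(frozen=True)
class MwiIndication:
    store: bool
    alphabet: Optional[str]
    active: bool
    kind: str


def decode_mwi(dcs: int) -> Optional[MwiIndication]:
    """Return the MWI fields of a DCS octet, or None if it is not an MWI group."""
    group = (dcs >> 4) & 0x0F
    if group not in MWI_GROUPS:
        return None
    store, alphabet = MWI_GROUPS[group]
    active = bool(dcs & 0b1000)       # bit 3: indication sense
    kind = MWI_TYPES[dcs & 0b11]      # bits 1-0: indication type
    return MwiIndication(store, alphabet, active, kind)  # bit 2 (reserved) ignored


def encode_mwi(kind: str, active: bool, store: bool = False, ucs2: bool = False) -> int:
    """Build an MWI DCS octet; the reserved bit 2 is always left at 0."""
    if store:
        group = 0b1110 if ucs2 else 0b1101
    else:
        group = 0b1100
    type_bits = {name: bits for bits, name in MWI_TYPES.items()}[kind]
    return (group << 4) | (0b1000 if active else 0) | type_bits


print(hex(encode_mwi("voicemail", active=True)))   # 0xc8
print(decode_mwi(0xC0))                            # voicemail indicator cleared
```

Under these assumptions, setting and clearing a voicemail indicator in discard mode uses DCS values 0xC8 and 0xC0 respectively.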

Automatic Deletion Mechanisms

In the SMS Data Coding Scheme, the automatic deletion mechanism allows the short message (SM) originator to mark messages for automatic removal from the mobile equipment (ME) or universal subscriber identity module ((U)SIM) after the recipient has read them. This provides temporary storage without permanent retention, enhancing privacy or storage management in mobile networks. It was introduced in Release 4 via change request TP-000074 to support automatic removal of read SMS messages. The mechanism is encoded in the TP-Data-Coding-Scheme (TP-DCS) field of the Transfer Protocol Data Unit (TPDU), where bits 7-4 are set to 01xx (the Message Marked for Automatic Deletion group) and bits 5-0 follow the same coding as the General Data Coding group (00xx), defining the character set, data compression, and message class. This coding applies irrespective of the message class, ensuring consistent behavior for messages stored in the ME or (U)SIM. Once the message has been read, the receiving ME must delete it without intervention from the end user or any targeted application; how the deletion is carried out is manufacturer-specific. Manufacturers may optionally provide a user-accessible setting to prevent this automatic deletion, allowing recipients to retain flagged messages if needed. Unlike SMS, the Cell Broadcast Service (CBS) Data Coding Scheme has no equivalent automatic deletion group, as CBS messages are transient broadcasts without individual storage semantics. The bit structure of the TP-DCS field in the automatic deletion group is as follows:
Bits | Function | Values
7-6 | Coding group | 01 (message marked for automatic deletion)
5 | Data compression | 0 = uncompressed; 1 = compressed (per TS 23.042)
4 | Message class indication | 0 = no message class (bits 1-0 have no class meaning); 1 = message class present (bits 1-0 used)
3-2 | Character set | 00 = GSM 7-bit default alphabet; 01 = 8-bit data; 10 = UCS2 (16-bit); 11 = reserved
1-0 | Message class (if bit 4 = 1) | 00 = Class 0 (immediate display); 01 = Class 1 (ME-specific); 10 = Class 2 ((U)SIM-specific); 11 = Class 3 (TE-specific)
This structure maintains compatibility with standard SMS encoding while adding the deletion flag.
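Because bits 5-0 are shared with the General Data Coding group, one routine can decode both groups, differing only in the automatic-deletion flag. The following is a minimal Python sketch under that assumption; `GeneralDcs`, `decode_general` and `encode_general` are hypothetical names, and the sketch simply reports a reserved character set rather than applying any fallback.

```python
from dataclasses import dataclass
from typing import Optional

# Bits 3-2: character set, shared by groups 00xx and 01xx.
CHARSETS = {0b00: "GSM7", 0b01: "8BIT", 0b10: "UCS2", 0b11: "RESERVED"}


@dataclass(frozen=True)
class GeneralDcs:
    auto_delete: bool              # True for coding group 01xx
    compressed: bool               # bit 5
    charset: str                   # bits 3-2
    message_class: Optional[int]   # bits 1-0, only meaningful if bit 4 is set


def decode_general(dcs: int) -> Optional[GeneralDcs]:
    """Decode groups 00xx and 01xx; return None for any other coding group."""
    top = (dcs >> 6) & 0b11        # bits 7-6
    if top > 0b01:
        return None
    has_class = bool(dcs & 0x10)   # bit 4
    return GeneralDcs(
        auto_delete=(top == 0b01),
        compressed=bool(dcs & 0x20),
        charset=CHARSETS[(dcs >> 2) & 0b11],
        message_class=(dcs & 0b11) if has_class else None,
    )


def encode_general(charset: str, message_class: Optional[int] = None,
                   compressed: bool = False, auto_delete: bool = False) -> int:
    """Build a 00xx (or 01xx, if auto_delete) DCS octet from its fields."""
    codes = {name: bits for bits, name in CHARSETS.items() if name != "RESERVED"}
    dcs = codes[charset] << 2
    if auto_delete:
        dcs |= 0b01 << 6
    if compressed:
        dcs |= 0x20
    if message_class is not None:
        if not 0 <= message_class <= 3:
            raise ValueError("message class must be 0-3")
        dcs |= 0x10 | message_class
    return dcs


print(hex(encode_general("GSM7", auto_delete=True)))                   # 0x40
print(hex(encode_general("UCS2", message_class=0, auto_delete=True)))  # 0x58
```

Clearing the auto-delete flag in either example yields the corresponding General Data Coding value, which reflects how the 01xx group only prepends a deletion marker to the 00xx coding.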

Advanced Features

Data Compression Options

The Data Coding Scheme (DCS) supports compression to improve message efficiency, primarily for text-based content in SMS and related services. In the General Data Coding group (bits 7-4 = 00xx), bit 5 set to 1 indicates that the user data consists of compressed 7-bit, 8-bit, or UCS2 data, signaling the receiving device to decompress it using the algorithm specified in TS 23.042. Character sets such as UCS2 are eligible for compression, which mitigates their higher bit cost compared with 7-bit encoding. For example, compressed UCS2 without a message class uses DCS = 0x28 (binary 00101000). The standardized compression method relies on Huffman coding, which assigns shorter variable-length bit sequences to more frequent characters and so reduces payload size. Additional optional mechanisms include character group transitions (SEGS) for handling language-specific sets and dynamic updates of the code tree through weight swapping, which adapt to the message content without initial training. For UCS2 data, the algorithm signals row changes only when necessary, reducing the average encoding from 16 bits per character to approximately 10 bits per character in typical scenarios. This directly affects capacity within the 140-octet TP-User Data (TP-UD) field of a single SMS segment: after 7 octets of overhead for the compression header and footer, up to 133 octets remain for the compressed payload. Consequently, a compressed UCS2 message can carry approximately 100-110 characters, compared with the uncompressed limit of 70, depending on how compressible the text is. Despite these benefits, adoption of DCS compression remains limited: it is an optional feature that requires consistent support across networks and devices, which has proven difficult to achieve in practice.
Usage has further declined post-2010 with the proliferation of MMS and RCS alternatives that natively support extended message lengths without such optimizations.
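The principle behind the compression, and the capacity arithmetic quoted above, can be illustrated with a short sketch. Note that this builds a plain static Huffman code; it is not the TS 23.042 algorithm, which adds adaptive tree updates, character groups and other refinements. The function names are hypothetical, and the defaults of 133 payload octets and about 10 bits per UCS2 character are simply the figures given in the text.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(text: str) -> dict:
    """Return a static Huffman code length (in bits) for each distinct character."""
    freq = Counter(text)
    if len(freq) == 1:                 # degenerate case: a lone symbol still needs 1 bit
        return {ch: 1 for ch in freq}
    tie = count()                      # tie-breaker so the dicts are never compared
    heap = [(n, next(tie), {ch: 0}) for ch, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**left, **right}.items()}
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def compressed_bits(text: str) -> int:
    """Total bits needed to encode text with its own Huffman code."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)


def ucs2_capacity(payload_octets: int = 133, bits_per_char: float = 10.0) -> int:
    """Characters that fit in the payload at a given average cost per character."""
    return int(payload_octets * 8 // bits_per_char)


print(huffman_code_lengths("aaaabbc"))   # the frequent 'a' gets the shortest code
print(ucs2_capacity())                   # 106 (1064 bits / 10 bits per character)
print(ucs2_capacity(140, 16))            # 70, the uncompressed UCS2 limit
```

The estimate of 106 characters falls inside the 100-110 range stated above; actual capacity depends on how well a given text compresses.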

User-Defined Extensions

The Data Coding Scheme (DCS) octet in SMS and CBS messages leaves certain values reserved, which vendors and network operators can use for proprietary or user-defined extensions. According to TS 23.038, a receiving entity assumes that any reserved coding (for example, the coding groups with bits 7-4 = 10xx) uses the GSM 7-bit default alphabet, as for DCS 0x00, for fallback compatibility, while 8-bit data coding inherently allows user-defined binary payloads. Specific bit combinations marked as reserved support implementations such as over-the-air (OTA) configuration messages or hints for enhanced content handling, provided they respect fallback behaviors such as defaulting to 7-bit encoding. Mobile stations (MS) must store messages with reserved or unsupported DCS values without attempting to interpret them, treating them as opaque data, while service centers (SC) are permitted to reject such messages to maintain network integrity. This design supports vendor-specific applications, such as proprietary signaling for device configuration, but the standard explicitly discourages overuse of reserved values to prevent fragmentation. Interoperability risks arise when devices lack support for custom schemes, potentially leading to message loss or incorrect rendering; guidelines therefore recommend defaulting to the GSM 7-bit alphabet (DCS 0x00) for broad compatibility. For instance, in binary SMS formats common for OTA settings, 8-bit coding (e.g., DCS = 0x04, or 0xF6 for class 2) may signal vendor-specific protocols, ensuring delivery to class-specific storage (e.g., the (U)SIM for class 2) if the recipient recognizes them.
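A receiver's fallback handling can be sketched as a single dispatch over the coding groups. This minimal Python sketch assumes the SMS coding groups described in this article and applies the GSM 7-bit fallback to the reserved groups 1000-1011 and, by the same rule, to the reserved character-set value 11; `interpret_sms_dcs` is a hypothetical name, and the sketch does not model SC rejection or how an MS stores unsupported messages.

```python
from typing import Optional, Tuple


def interpret_sms_dcs(dcs: int) -> Tuple[str, Optional[int]]:
    """Return (alphabet, message class) for an SMS DCS octet.

    Reserved codings fall back to the GSM 7-bit default alphabet, as for 0x00.
    """
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS is a single octet")
    group = dcs >> 4
    if group <= 0b0111:                       # 00xx general / 01xx automatic deletion
        charset = (dcs >> 2) & 0b11
        alphabet = {0b00: "GSM7", 0b01: "8BIT", 0b10: "UCS2"}.get(charset, "GSM7")
        msg_class = (dcs & 0b11) if dcs & 0x10 else None
        return alphabet, msg_class
    if group <= 0b1011:                       # 1000-1011: reserved coding groups
        return "GSM7", None
    if group in (0b1100, 0b1101):             # MWI discard / store (7-bit text)
        return "GSM7", None
    if group == 0b1110:                       # MWI store (UCS2 text)
        return "UCS2", None
    # Group 1111: bit 2 selects 8-bit data, bits 1-0 always carry the class.
    return ("8BIT" if dcs & 0b100 else "GSM7"), dcs & 0b11


for value in (0x00, 0x04, 0xF6, 0x80):
    print(hex(value), interpret_sms_dcs(value))
```

The two binary examples from the text decode as 8-bit data, with 0xF6 additionally routed as class 2, while a reserved value such as 0x80 degrades to plain 7-bit text.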

Specific Implementations

SMS Data Coding Scheme

The SMS Data Coding Scheme (DCS) plays a central role in the Short Message Service (SMS) by specifying the encoding, alphabet, and handling instructions for the user data within short messages. In the SMS protocol, the TP-DCS (TP-Data-Coding-Scheme) field, a one-octet parameter of the Transfer Protocol Data Unit (TPDU) that is mandatory in SMS-SUBMIT and SMS-DELIVER and optional in the report TPDUs, carries this scheme and tells the receiving entity how to interpret the TP-UD (TP-User Data) field. It determines how the user data length is counted and which packing method is applied, ensuring that sender and receiver agree on character representation and message processing. The TP-DCS therefore directly determines the maximum user data length of an SMS message. With the GSM 7-bit default alphabet, a message holds up to 160 characters, packed into 140 octets to save bandwidth on GSM/UMTS networks. In contrast, 8-bit data coding allows up to 140 octets of binary data, suitable for non-text applications, while UCS2 (16-bit) coding limits messages to 70 characters, because each character occupies two of the 140 octets. These limits apply to single-part messages and govern segmentation in multi-part transmissions. During message submission, the originating mobile station (MS) or application sets the TP-DCS value in the TPDU before forwarding it to the Short Message Service Center (SMSC) via the mobile-originated (MO) procedure. The SMSC stores the message and, upon delivery to the destination MS, includes the unchanged TP-DCS in the SMS-DELIVER TPDU, enabling the receiving MS to decode the user data accordingly. For concatenated long messages that exceed single-part limits, each segment carries the same TP-DCS value, and concatenation information in the User Data Header (UDH) lets the receiver reassemble the full message. Applying the DCS uniformly across parts ensures consistent decoding. The core specification for the SMS DCS is 3GPP TS 23.040, whose version 18.0.0 (released May 2024) incorporates enhancements for modern networks.
Notably, these include support for SMS delivery over the Non-Access Stratum (NAS) protocol, extending DCS applicability to IP-based and New Radio (NR) environments while maintaining backward compatibility with earlier releases. These updates address evolving requirements for SMS across access technologies without altering the fundamental TP-DCS mechanics.
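The single-part limits and the effect of the UDH on segmentation can be checked with a little arithmetic. This is a minimal Python sketch, assuming a 6-octet concatenation UDH (the common form with an 8-bit reference number, not stated in the text) and counting GSM 7-bit length in septets, so characters from the extension table would count twice; the function names are hypothetical.

```python
import math

SINGLE_LIMIT = {"GSM7": 160, "8BIT": 140, "UCS2": 70}   # single-part limits from the text
TP_UD_OCTETS = 140


def chars_per_segment(alphabet: str, udh_octets: int = 0) -> int:
    """Characters (octets for 8-bit data) left in TP-UD after a UDH."""
    if udh_octets == 0:
        return SINGLE_LIMIT[alphabet]
    if alphabet == "GSM7":
        # The UDH is padded with fill bits up to the next septet boundary.
        udh_septets = math.ceil(udh_octets * 8 / 7)
        return SINGLE_LIMIT["GSM7"] - udh_septets
    free = TP_UD_OCTETS - udh_octets
    return free if alphabet == "8BIT" else free // 2


def segments_needed(length: int, alphabet: str, concat_udh_octets: int = 6) -> int:
    """Number of segments; every segment repeats the same TP-DCS value."""
    if length <= SINGLE_LIMIT[alphabet]:
        return 1
    return math.ceil(length / chars_per_segment(alphabet, concat_udh_octets))


print(chars_per_segment("GSM7", 6), chars_per_segment("UCS2", 6))   # 153 67
print(segments_needed(161, "GSM7"), segments_needed(71, "UCS2"))    # 2 2
```

Because the UDH consumes part of the 140 octets in every segment, a message one character over the single-part limit already needs two segments, each with slightly reduced capacity.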

CBS Data Coding Scheme

The Cell Broadcast Service (CBS) uses the Data Coding Scheme (DCS) as a one-octet field in the message header to specify the character set, coding method, and language of broadcast messages delivered simultaneously to all mobile stations within a geographic cell or group of cells. This enables efficient, unacknowledged dissemination of short text content, such as public announcements or alerts, without the point-to-point delivery model of SMS. In CBS, the DCS is octet 5 in the GSM format and octet 6 in the UMTS, LTE, and NR formats, and it directly determines how receiving devices decode and display the message. As in SMS, the CBS DCS supports the 7-bit default alphabet, 8-bit binary data, and UCS2 (16-bit Unicode) encodings, though it omits SMS-specific features such as message classes and message waiting indications in most coding groups, in keeping with broadcast efficiency. The design targets text and binary content with a fixed structure: each page carries 82 octets, accommodating up to 93 characters in 7-bit mode, 82 octets in 8-bit mode, or 41 characters in UCS2, and a message can span up to 15 pages for a total of 1,230 octets. DCS bits 7-4 denote the coding group (e.g., 0000 for languages using the GSM 7-bit default alphabet), while bits 3-0 indicate the language or handling, such as 0000 for German or 1111 for language unspecified, so that the message is rendered appropriately without interactive features such as acknowledgments or deletion. For example, DCS value 0x00 (binary 00000000) specifies German text in the 7-bit default alphabet, commonly used for warning messages in regions that require that language. The DCS also plays a critical role in emergency alert systems built on cell broadcast, such as the Earthquake and Tsunami Warning System (ETWS), where it defines the coding of primary and secondary notifications to ensure rapid, language-appropriate delivery without device-specific filtering of mandatory alerts.
This is standardized in 3GPP TS 23.041 (version 18.7.0, October 2025) for the architecture and TS 23.038 (version 18.0.0, May 2024) for the DCS details, with implementations such as the U.S. Commercial Mobile Alert System (CMAS) and Europe's EU-Alert relying on these schemes for geo-targeted warnings using predefined message identifiers (e.g., 4352-4356 for ETWS and 4370 for the CMAS Presidential Alert). In 5G networks, Release 18 (as of 2025) builds on Release 17's evolved Public Warning System (ePWS) to enhance DCS support for language-independent content, such as Unicode pictograms, while multimedia broadcasting is addressed separately through Multicast-Broadcast Services (MBS) rather than by extending traditional text limits.
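The CBS language coding and page arithmetic described above can be sketched as follows. This is a minimal Python illustration, not a CBS decoder: only the German (0000) and unspecified (1111) entries come from the text, the remaining group-0000 language codes are our reading of TS 23.038 and should be verified against the specification, and the names `cbs_language` and `page_capacity` are hypothetical.

```python
from typing import Optional

# Coding group 0000: bits 3-0 select a language (our reading of TS 23.038; verify).
CBS_GROUP0_LANGUAGES = [
    "German", "English", "Italian", "French", "Spanish", "Dutch", "Swedish",
    "Danish", "Portuguese", "Finnish", "Norwegian", "Greek", "Turkish",
    "Hungarian", "Polish", "Unspecified",
]
PAGE_OCTETS = 82   # user data per CBS page, from the text
MAX_PAGES = 15


def cbs_language(dcs: int) -> Optional[str]:
    """Language for CBS coding group 0000 (GSM 7-bit text), else None."""
    if (dcs >> 4) != 0b0000:
        return None
    return CBS_GROUP0_LANGUAGES[dcs & 0x0F]


def page_capacity(alphabet: str) -> int:
    """Characters (octets for 8-bit data) that fit in one 82-octet page."""
    if alphabet == "GSM7":
        return PAGE_OCTETS * 8 // 7     # 656 bits -> 93 whole septets
    if alphabet == "8BIT":
        return PAGE_OCTETS
    if alphabet == "UCS2":
        return PAGE_OCTETS // 2
    raise ValueError(f"unknown alphabet: {alphabet}")


print(cbs_language(0x00), cbs_language(0x0F))                   # German Unspecified
print({a: page_capacity(a) for a in ("GSM7", "8BIT", "UCS2")})
print(MAX_PAGES * PAGE_OCTETS)                                  # 1230 octets per message
```

The computed values reproduce the figures in the text: 93 septets, 82 octets or 41 UCS2 characters per page, and 1,230 octets for a 15-page message.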